Overview

The fifth project is to build a simplified implementation of memcached, which is an in-memory key-value store that is used for quick access to frequently used computations (ie. a cache). To support concurrent access to the cache, your library must use the C++ thread library to ensure thread-safe behavior.

To use the memcached programming paradigm, we first declare a memcached object, in this case a templated HTCache as follows:

// Declare a HTCache object
HTCache<int64_t, bool> cache(addrlen, pagesize, policy, handler);

This object takes the size of the address space (in terms of the number of bits), the size of each page (in terms of the number of entries), the eviction policy (either FIFO, Random, LRU, or Clock, and the handler (the function to call when a requested item is missing from the cache). It maps an int64_t key to a bool value.

Once we have this HTCache object, we can retrieve a value from the cache with following command:

// Retrieve value associated with key
auto value = cache.get(key);

If the value associated with the key is not in the HTCache then the provided handler will be called to generate or retrieve the missing value, which is then stored in the cache and returned to the caller.

Likewise, we can also store values into the cache by doing the following:

// Store value associated with key
cache.put(key, value);

If the HTCache is full, that is it has no free space for another value, then we must perform an eviction. To do so, we will use the specified eviction policy to select a currently existing value for replacement and then store the new value in the place of the evicted object.

In this way, the use of a fault handler and an eviction policy means that HTCache is similar to the paging and virtual memory systems we have discussed in class for the last few weeks.

Hash Tree

Internally, the HTCache uses a modified form of a hashed array tree, which combines facets of hash tables with binary trees as shown below:

Unlike in typical hash tables, our hashed array tree splits up the array of buckets into a tree of HTPages. Rather than using the result of the hash function to map to a specific bucket, we will use the hash function to convert a key into a virtual address that we can then use to navigate our hashed array tree in a manner similar to how we translate virtual address to physical addresses with page tables.

This means that inside HTCache, we have a PageTable, which is actually just a std::vector of HTPages (ie. the root or directory). Likewise, each HTPage itself contains a std::vector of HTPageEntry objects which at the very least keep track of the key and value pair associated with the HTPageEntry.

To locate an item in the HTCache, we must translate the key into an address by using the std::hash function, and then use that to find the appropriate key-value pair within our hashed array tree.

For example, the following code corresponds to the diagram above:

HTCache<int, bool> is_odd(4, 4, EVICT_FIFO, [](const int &key){ return key % 2; });

cout << is_odd.get(3) << endl;
cout << is_odd.get(8) << endl;

The example code creates a HTCache called is_odd with an address length of 4 bits as specified by the first argument to the HTCache constructor. This means that our total address space is 2^4 = 16 addresses where each address corresponds to a HTPageEntry.

The second argument, 4 specifies the number of entries per HTPage. Because each HTPage has 4 HTPageEntries, we need 2^4 Entries / 2^2 Entries Per Page = 2^2 = 4 HTPages in our PageTable std::vector.

To perform is_odd.get(3) (that is retrieve the value of 3 from our HTCache), we must do the following:

  1. First, use std::hash to convert the key into an integer virtual address.

    In this example, let's assume the result of std::hash(3) is simply 3.

  2. Next, extract the virtual page number or VPN from the integer address.

    Since the virtual address is 3 in this case, the bit representation is 0011. Therefore the VPN is 00 or the first two bits of the virtual address.

  3. Afterwards, extract the offset into the HTPage by taking the remaining bits of the virtual address.

    In this case, the final bits are 11 and therefore the offset is 11

In summary, we hash the key to generate a virtual address and then extract the VPN and the offset from this virtual address:

Hash(3) ->  0 0 1 1
            ___ ___
            VPN Offset

Once we have the VPN, we can then use this to select the appropriate HTPage from the PageTable. Likewise, we can then access the HTPageEntry inside the selected HTPage by using the computed offset.

To perform is_odd.get(8) (that is retrieve the value of 8 from our HTCache), we follow the same process and get a VPN of 10 and an offset of 00. This means we will need to lookup the first HTPageEntry of the third HTPage of our PageTable to find the value for the key 8.

In this way, the HTCache's internal structure resembles a single-level page table and uses a similar mechanism for translating virtual addresses to physical address as we've discussed in class.

More details about this project and your deliverables are described below.

Memcached

Many websites such as Wikipedia and Wordpress utilize memcached to speed up access to their websites. In fact the notion of a fast distributed key-value store is a core component of the NoSQL movement currently sweeping the data storage and processing industry.

For our project, we will create a simplified version of memcached that acts uses an embedded library model rather than a client/server framework. In this way, our htcache is akin to [sqlite], which is an embeddable database engine that many applications such as Firefox and Chrome utilize internally.

Deliverables

Working in groups of one or two people, you are to create a library that implements the HTCache class described above by midnight on Tuesday, November 28, 2017. Additionally, you are to write a fibonacci application that utilizes your library, and perform experiments to analyze your project by Thursday, November 30, 2017.

For this project, you must use C++ (not C) as the implementation language. Any test scripts or auxillary tools can be written in any reasonable scripting language.

Furthermore, you should use the C++ thread library rather than plain POSIX threads. In particular, you will need to use std::mutex and std::lock_guard to perform mutual exclusion an ensure thread-safe concurrent access to the HTCache.

Timeline

Here is a timeline of events related to this project:

Date Event
Thursday, November 16 Project description and repository are available.
Tuesday, November 21 Test scripts / outputs are available from TAs (after design is checked off).
Tuesday, November 28 Library is due (pushed to GitLab).
Thursday, November 30 Demonstrations of library and application must be completed.

Repository

To start this project, one group member must fork the Project 05 repository on GitLab:

https://gitlab.com/nd-cse-30341-fa17/cse-30341-fa17-project05

Once this repository has been forked, follow the instructions from Reading 00 to:

  1. Make the repository private.

  2. Configure access to the repository.

    Make sure you add all the members of the team in addition to the instructional staff.

Source Code

As you can see, the base Project 05 repository contains a README.md file and the following folder hierarchy:

project05
    \_  Makefile            # This is the project Makefile
    \_  bin                 # This contains the application executables and scripts
    \_  include             # This contains the HTCache library header files
        \_  htcache
            \_  htcache.h   # This contains the HTCache source code
            \_  htpage.h    # This contains the HTPage source code
    \_  src                 # This contains the application source code
    \_  tests               # This contains the test source code

You must maintain this folder structure for your project and place files in their appropriate place.

HTCache

The include/htcache/htcache.h file contains the implementation of the HTCache class. You are to complete this class based on the design described above and with the hints provided by the TODOs in the file. In particular, you will need to do the following:

  1. The constructor must determine the Addresses, Pages, VPNShift, VPNMask, and OffsetMask and then initialize the PageTable based on these values.

  2. The get method must determine the virtual address, VPN, and offset based on the key, and then use this information to retrieve the value from the appropriate HTPage.

    It must take care of the case where the data is missing (ie. a std::out_of_range exception with be thrown). In this situation, it must call the Handler function to compute the appropriate value and then store that into the cache before returning this new value.

  3. The put method must also determine the virtual address, VPN, and offset based on the key, and then use this information to store the value into the appropriate HTPage.

To keep track of the cache hits and misses, you must update the Hits and Misses instance variables appropriately. All of the information that you determine and track will be dumped to a stream by the provided stats methods.

Because multiple threads can access the HTCache concurrently, you are to use the provided Lock judiciously to protect values that need to be updated atomically.

HTPage

The include/htcache/htpage.h file contains the implementation of the HTPage class. You are to complete this class based on the design described above and with the hints provided by the TODOs in the file. In particular, you will need to do the following:

  1. Add additional variables to perform the bookkeeping necessary to implement the various eviction policies: FIFO, Random, LRU, and Clock.

  2. Implement each of the eviction policies.

  3. The constructor must initialize the std::vector of HTPageEntry objects.

  4. The copy constructor must copy the individual instance variables.

  5. The get method must use linear probing to locate the appropriate HTPageEntry. If it is not found, then it should throw a std::out_of_range exception if it is not found.

  6. The put method must also use linear probing to locate the appropriate HTPageEntry. If no HTPageEntry is found, then it should add a new HTPageEntry. If the HTPage is full, an existing HTPageEntry must be selected based on the eviction Policy and then used for replacement.

Once again, since multiple threads can access each HTPage concurrently, you must use the Lock to implement monitor-style protection.

Applications

Under the src directory, we have provided the following example and testing applications:

  1. bs.cpp: This performs binary search on an array.

  2. bubble.cpp: This performs bubble sort on an array.

  3. insertion.cpp: This performs insertion sort on an array.

  4. prime.cpp: This computes and adds primes.

  5. scan.cpp: This performs a scan of an array.

  6. selection.cpp: This performs selection sort on an array.

You may use each of these programs to test and analyze your HTCache implementation:

# Build applications
$ make

# Scan Usage
$ ./bin/scan
Usage: ./bin/scan AddressLength PageSize EvictionPolicy Threads

# Run scan example
$ ./bin/scan 8 64 0 1
Addresses : 256
Pages     : 4
VPNShift  : 6
VPNMask   : 0xC0
OffsetMask: 0x3F
Hits      : 0
Misses    : 10240

Each application takes four arguments:

  1. AddressLength: Number of bits in address space.

  2. PageSize: Number of entries per page.

  3. EvictionPolicy: Which eviction policy to use (FIFO = 0, Random = 1, LRU = 2, Clock = 3).

  4. Threads: Number of threads to utilize.

To test the application without any caching, you can simply set AddressLength to 0:

# Run scan example without cache
$ ./bin/scan 0 0 0 1
Addresses : 0
Pages     : 0
VPNShift  : 0
VPNMask   : 0x0
OffsetMask: 0x0
Hits      : 0
Misses    : 0

Note: The sorting examples (bubble.cpp, insertion.cpp, and selection.cpp) are not thread-safe and should only be used with one thread.

Fibonacci

In addition to these programs, you are also given a src/fibonacci.cpp. You are to modify this program to use your HTCache library to memoize the computation of all fibonacci numbers from 0 to 99.

# Run fibonacci
$ ./bin/fibonacci 8 64 0 1
 0. Fibonacci( 0) = 0
 0. Fibonacci( 1) = 1
 0. Fibonacci( 2) = 1
 0. Fibonacci( 3) = 2
 0. Fibonacci( 4) = 3
 0. Fibonacci( 5) = 5
 0. Fibonacci( 6) = 8
 0. Fibonacci( 7) = 13
 0. Fibonacci( 8) = 21
 0. Fibonacci( 9) = 34
 ...
 0. Fibonacci(93) = 12200160415121876738
 0. Fibonacci(94) = 1293530146158671551
 0. Fibonacci(95) = 13493690561280548289
 0. Fibonacci(96) = 14787220707439219840
 0. Fibonacci(97) = 9834167195010216513
 0. Fibonacci(98) = 6174643828739884737
 0. Fibonacci(99) = 16008811023750101250
Addresses : 256
Pages     : 4
VPNShift  : 6
VPNMask   : 0xC0
OffsetMask: 0x3F
Hits      : 194
Misses    : 100

K.I.S.S.

While the exact organization of the project code is up to you, keep in mind that you will be graded in part on coding style, cleaniness, and organization. This means your code should be consistently formatted, not contain any dead code, have reasonable comments, and appropriate naming among other things:

Please refer to these Coding Style slides for some tips and guidelines on coding style expectations.

Testing

As you develop your library, you should create separate unit tests in the tests directory that ensure your HTCache library works correctly. These tests should cover the following:

You do not need to provide additional functional tests as those are provided to you already.

Experiments

Once you have a working implementation of your HTCache library, you should use the provided applications along with your fibonacci program to explore the following questions:

You must create scripts or additional programs that perform these experiments and collect the data.

Demonstration

As part of your grade, you will need to present your library to a TA where you will demonstrate the correctness of your HTCache library and fibonacci application along with an analysis of the experiments above.

Presentation

As part of your demonstration, you must provide a presentation (between 5 - 10 slides) with the following content:

  1. Design: An overview of your library implementation.

  2. Testing: A discussion on how you tested your library.

  3. Experiments: A discussion on what experiments you ran to explore the questions above and what the results were.

  4. Analysis:: A discussion of the questions above. You should use your experimental results as evidence for your choices.

  5. Summary: A summary of what you learned about caching and virtual memory.

Note, you must incorporate images, graphs, diagrams and other visual elements as part of your presentation where reasonable.

Please upload these slides to Google Drive and place a link in your README.md.

Be prepared to be asked about different aspects of your project, as the TA may ask you questions to probe your understanding of the material and your work.

Documentation

As noted above, the Project 05 repository includes a README.md file with the following sections:

  1. Members: This should be a list of the project members.

  2. Design: This is a list of design questions that you should answer before you do any coding as they will guide you towards the resources you need.

  3. Demonstration: This is where you should provide a Google Drive link to your demonstration slides.

  4. Errata: This is a section where you can describe any deficiencies or known problems with your implementation.

You must complete this document report as part of your project.

Design and Implementation

You should look at the design questions in the README.md file of the Project 05 repository. The questions there will help you design and plan your process queue implementation.

Feel free to ask the TAs or the instructor for some guidance on these questions before attempting to implement this project.

Extra credit

Grading

Your project will be graded on the following metrics:

Metric Points
Source Code
  1. General
    • Builds and cleans without warnings or errors
    • Manages resources such as memory and files appropriately
    • Is consistent, readable, and organized
  2. Library
    • HTCache - Determines Addresses correctly
    • HTCache - Determines Pages correctly
    • HTCache - Determines VPNShift correctly
    • HTCache - Determines VPNMask correctly
    • HTCache - Determines OffsetMask correctly
    • HTCache - Tracks Hits correctly
    • HTCache - Tracks Misses correctly
    • HTCache - Implements get correctly
    • HTCache - Implements put correctly
    • HTCache - Provides proper synchronization
    • HTPage - Implements FIFO eviction correctly
    • HTPage - Implements Random eviction correctly
    • HTPage - Implements LRU eviction correctly
    • HTPage - Implements Clock eviction correctly
    • HTPage - Implements get correctly
    • HTPage - Implements put correctly
    • HTPage - Provides proper synchronization
  3. Fibonacci
    • Utilizes library appropriately
    • Computes fibonacci numbers accurately
  4. Testing
    • Provides unit tests
  5. Experiments
    • Provides scripts that automate experiments
16.0
  1. 3.0
    • 1.0
    • 1.0
    • 1.0
  2. 9.0
    • 0.50
    • 0.50
    • 0.50
    • 0.50
    • 0.25
    • 0.25
    • 0.25
    • 1.5
    • 0.5
    • 0.5
    • 0.25
    • 0.25
    • 0.5
    • 0.5
    • 1.0
    • 1.0
    • 0.25
  3. 1.5
    • 0.5
    • 1.0
  4. 1.5
    • 1.5
  5. 1.0
    • 1.0
Demonstration
  1. Slides
  2. Testing
  3. Experiments
  4. Analysis
  5. Summary
6.0
  1. 1.0
  2. 1.0
  3. 2.0
  4. 1.0
  5. 1.0
Documentation
  1. Design
2.0
  1. 2.0