The fifth project is to build a simplified implementation of memcached, which is an in-memory key-value store that is used for quick access to frequently used computations (ie. a cache). To support concurrent access to the cache, your library must use the C++ thread library to ensure thread-safe behavior.
To use the memcached programming paradigm, we first declare a memcached
object, in this case a templated HTCache as follows:
// Declare a HTCache object HTCache<int64_t, bool> cache(addrlen, pagesize, policy, handler);
This object takes the size of the address space (in terms of the number
of bits), the size of each page (in terms of the number of
entries), the eviction policy (either FIFO, Random, LRU, or
Clock, and the handler (the function to call when a requested item is
missing from the cache). It maps an int64_t key to a bool value.
Once we have this HTCache object, we can retrieve a value from the
cache with following command:
// Retrieve value associated with key auto value = cache.get(key);
If the value associated with the key is not in the HTCache then the
provided handler will be called to generate or retrieve the missing
value, which is then stored in the cache and returned to the caller.
Likewise, we can also store values into the cache by doing the following:
// Store value associated with key
cache.put(key, value);
If the HTCache is full, that is it has no free space for another value,
then we must perform an eviction. To do so, we will use the specified
eviction policy to select a currently existing value for replacement
and then store the new value in the place of the evicted object.
In this way, the use of a fault handler and an eviction policy means
that HTCache is similar to the paging and virtual memory systems we
have discussed in class for the last few weeks.
Internally, the HTCache uses a modified form of a hashed array tree,
which combines facets of hash tables with binary trees as shown below:
Unlike in typical hash tables, our hashed array tree splits up the
array of buckets into a tree of HTPages. Rather than using the
result of the hash function to map to a specific bucket, we will use the
hash function to convert a key into a virtual address that we can then
use to navigate our hashed array tree in a manner similar to how we
translate virtual address to physical addresses with page tables.
This means that inside HTCache, we have a PageTable, which is actually
just a std::vector of HTPages (ie. the root or directory). Likewise,
each HTPage itself contains a std::vector of HTPageEntry objects
which at the very least keep track of the key and value pair
associated with the HTPageEntry.
To locate an item in the HTCache, we must translate the key into an
address by using the std::hash function, and then use that to find the
appropriate key-value pair within our hashed array tree.
For example, the following code corresponds to the diagram above:
HTCache<int, bool> is_odd(4, 4, EVICT_FIFO, [](const int &key){ return key % 2; }); cout << is_odd.get(3) << endl; cout << is_odd.get(8) << endl;
The example code creates a HTCache called is_odd with an address
length of 4 bits as specified by the first argument to the HTCache
constructor. This means that our total address space is 2^4 = 16
addresses where each address corresponds to a HTPageEntry.
The second argument, 4 specifies the number of entries per HTPage.
Because each HTPage has 4 HTPageEntries, we need 2^4 Entries / 2^2
Entries Per Page = 2^2 = 4 HTPages in our PageTable std::vector.
To perform is_odd.get(3) (that is retrieve the value of 3 from our
HTCache), we must do the following:
First, use std::hash to convert the key into an integer virtual address.
In this example, let's assume the result of std::hash(3) is simply
3.
Next, extract the virtual page number or VPN from the integer address.
Since the virtual address is 3 in this case, the bit representation
is 0011. Therefore the VPN is 00 or the first two bits of the
virtual address.
Afterwards, extract the offset into the HTPage by taking the
remaining bits of the virtual address.
In this case, the final bits are 11 and therefore the offset is 11
In summary, we hash the key to generate a virtual address and then extract the VPN and the offset from this virtual address:
Hash(3) -> 0 0 1 1 ___ ___ VPN Offset
Once we have the VPN, we can then use this to select the appropriate
HTPage from the PageTable. Likewise, we can then access the
HTPageEntry inside the selected HTPage by using the computed
offset.
To perform is_odd.get(8) (that is retrieve the value of 8 from our
HTCache), we follow the same process and get a VPN of 10 and an
offset of 00. This means we will need to lookup the first
HTPageEntry of the third HTPage of our PageTable to find the
value for the key 8.
In this way, the HTCache's internal structure resembles a single-level
page table and uses a similar mechanism for translating virtual addresses
to physical address as we've discussed in class.
More details about this project and your deliverables are described below.
Many websites such as Wikipedia and Wordpress utilize memcached to speed up access to their websites. In fact the notion of a fast distributed key-value store is a core component of the NoSQL movement currently sweeping the data storage and processing industry.
For our project, we will create a simplified version of memcached that
acts uses an embedded library model rather than a client/server framework.
In this way, our htcache is akin to [sqlite], which is an embeddable
database engine that many applications such as Firefox and Chrome
utilize internally.
Working in groups of one or two people, you are to create a
library that implements the HTCache class described above by midnight
on Tuesday, November 28, 2017. Additionally, you are to write a
fibonacci application that utilizes your library, and perform experiments
to analyze your project by Thursday, November 30, 2017.
For this project, you must use C++ (not C) as the implementation language. Any test scripts or auxillary tools can be written in any reasonable scripting language.
Furthermore, you should use the C++ thread library rather than plain
POSIX threads. In particular, you will need to use std::mutex and
std::lock_guard to perform mutual exclusion an ensure thread-safe
concurrent access to the HTCache.
Here is a timeline of events related to this project:
| Date | Event |
|---|---|
| Thursday, November 16 | Project description and repository are available. |
| Tuesday, November 21 | Test scripts / outputs are available from TAs (after design is checked off). |
| Tuesday, November 28 | Library is due (pushed to GitLab). |
| Thursday, November 30 | Demonstrations of library and application must be completed. |
To start this project, one group member must fork the Project 05 repository on GitLab:
https://gitlab.com/nd-cse-30341-fa17/cse-30341-fa17-project05
Once this repository has been forked, follow the instructions from Reading 00 to:
Make the repository private.
Configure access to the repository.
Make sure you add all the members of the team in addition to the instructional staff.
As you can see, the base Project 05 repository contains a README.md
file and the following folder hierarchy:
project05
\_ Makefile # This is the project Makefile
\_ bin # This contains the application executables and scripts
\_ include # This contains the HTCache library header files
\_ htcache
\_ htcache.h # This contains the HTCache source code
\_ htpage.h # This contains the HTPage source code
\_ src # This contains the application source code
\_ tests # This contains the test source code
You must maintain this folder structure for your project and place files in their appropriate place.
The include/htcache/htcache.h file contains the implementation of the
HTCache class. You are to complete this class based on the design
described above and with the hints provided by the TODOs in the file. In
particular, you will need to do the following:
The constructor must determine the Addresses, Pages, VPNShift,
VPNMask, and OffsetMask and then initialize the PageTable based on
these values.
The get method must determine the virtual address, VPN, and
offset based on the key, and then use this information to retrieve
the value from the appropriate HTPage.
It must take care of the case where the data is missing (ie. a
std::out_of_range exception with be thrown). In this situation, it
must call the Handler function to compute the appropriate value
and then store that into the cache before returning this new value.
The put method must also determine the virtual address, VPN, and
offset based on the key, and then use this information to store the
value into the appropriate HTPage.
To keep track of the cache hits and misses, you must update the Hits
and Misses instance variables appropriately. All of the information that
you determine and track will be dumped to a stream by the provided
stats methods.
Because multiple threads can access the HTCache concurrently, you are to
use the provided Lock judiciously to protect values that need to be
updated atomically.
The include/htcache/htpage.h file contains the implementation of the
HTPage class. You are to complete this class based on the design
described above and with the hints provided by the TODOs in the file. In
particular, you will need to do the following:
Add additional variables to perform the bookkeeping necessary to implement the various eviction policies: FIFO, Random, LRU, and Clock.
Implement each of the eviction policies.
The constructor must initialize the std::vector of HTPageEntry objects.
The copy constructor must copy the individual instance variables.
The get method must use linear probing to locate the appropriate
HTPageEntry. If it is not found, then it should throw a
std::out_of_range exception if it is not found.
The put method must also use linear probing to locate the
appropriate HTPageEntry. If no HTPageEntry is found, then it should
add a new HTPageEntry. If the HTPage is full, an existing
HTPageEntry must be selected based on the eviction Policy and then used
for replacement.
Once again, since multiple threads can access each HTPage concurrently,
you must use the Lock to implement monitor-style protection.
Under the src directory, we have provided the following example and
testing applications:
bs.cpp: This performs binary search on an array.
bubble.cpp: This performs bubble sort on an array.
insertion.cpp: This performs insertion sort on an array.
prime.cpp: This computes and adds primes.
scan.cpp: This performs a scan of an array.
selection.cpp: This performs selection sort on an array.
You may use each of these programs to test and analyze your HTCache
implementation:
# Build applications $ make # Scan Usage $ ./bin/scan Usage: ./bin/scan AddressLength PageSize EvictionPolicy Threads # Run scan example $ ./bin/scan 8 64 0 1 Addresses : 256 Pages : 4 VPNShift : 6 VPNMask : 0xC0 OffsetMask: 0x3F Hits : 0 Misses : 10240
Each application takes four arguments:
AddressLength: Number of bits in address space.
PageSize: Number of entries per page.
EvictionPolicy: Which eviction policy to use (FIFO = 0, Random =
1, LRU = 2, Clock = 3).
Threads: Number of threads to utilize.
To test the application without any caching, you can simply set
AddressLength to 0:
# Run scan example without cache $ ./bin/scan 0 0 0 1 Addresses : 0 Pages : 0 VPNShift : 0 VPNMask : 0x0 OffsetMask: 0x0 Hits : 0 Misses : 0
Note: The sorting examples (bubble.cpp, insertion.cpp, and
selection.cpp) are not thread-safe and should only be used with one
thread.
In addition to these programs, you are also given a src/fibonacci.cpp.
You are to modify this program to use your HTCache library to memoize
the computation of all fibonacci numbers from 0 to 99.
# Run fibonacci $ ./bin/fibonacci 8 64 0 1 0. Fibonacci( 0) = 0 0. Fibonacci( 1) = 1 0. Fibonacci( 2) = 1 0. Fibonacci( 3) = 2 0. Fibonacci( 4) = 3 0. Fibonacci( 5) = 5 0. Fibonacci( 6) = 8 0. Fibonacci( 7) = 13 0. Fibonacci( 8) = 21 0. Fibonacci( 9) = 34 ... 0. Fibonacci(93) = 12200160415121876738 0. Fibonacci(94) = 1293530146158671551 0. Fibonacci(95) = 13493690561280548289 0. Fibonacci(96) = 14787220707439219840 0. Fibonacci(97) = 9834167195010216513 0. Fibonacci(98) = 6174643828739884737 0. Fibonacci(99) = 16008811023750101250 Addresses : 256 Pages : 4 VPNShift : 6 VPNMask : 0xC0 OffsetMask: 0x3F Hits : 194 Misses : 100
While the exact organization of the project code is up to you, keep in mind that you will be graded in part on coding style, cleaniness, and organization. This means your code should be consistently formatted, not contain any dead code, have reasonable comments, and appropriate naming among other things:
Break long functions into smaller functions.
Make sure each function does one thing and does it well.
Abstract, but don't over do it.
Please refer to these Coding Style slides for some tips and guidelines on coding style expectations.
As you develop your library, you should create separate unit tests in the
tests directory that ensure your HTCache library works correctly.
These tests should cover the following:
You should have tests that verify the behavior of the constructor, put,
and get methods of the HTCache class.
You should have tests that verify the behavior of the constructor, put,
and get methods of the HTPage class.
You should have tests that verify the behavior of the eviction policies
of the HTPage class.
You do not need to provide additional functional tests as those are provided to you already.
Once you have a working implementation of your HTCache library, you
should use the provided applications along with your fibonacci program to
explore the following questions:
How does the size of the cache (ie. AddressLength and PageSize)
impact the performance?
Which eviction policy is provides the best performance?
How does having multiple threads impact the cache behavior?
You must create scripts or additional programs that perform these experiments and collect the data.
As part of your grade, you will need to present your library to a
TA where you will demonstrate the correctness of your HTCache
library and fibonacci application along with an analysis of the
experiments above.
As part of your demonstration, you must provide a presentation (between 5
- 10 slides) with the following content:
Design: An overview of your library implementation.
Testing: A discussion on how you tested your library.
Experiments: A discussion on what experiments you ran to explore the questions above and what the results were.
Analysis:: A discussion of the questions above. You should use your experimental results as evidence for your choices.
Summary: A summary of what you learned about caching and virtual memory.
Note, you must incorporate images, graphs, diagrams and other visual elements as part of your presentation where reasonable.
Please upload these slides to Google Drive and place a link in your
README.md.
Be prepared to be asked about different aspects of your project, as the TA may ask you questions to probe your understanding of the material and your work.
As noted above, the Project 05 repository includes a README.md file
with the following sections:
Members: This should be a list of the project members.
Design: This is a list of design questions that you should answer before you do any coding as they will guide you towards the resources you need.
Demonstration: This is where you should provide a Google Drive link to your demonstration slides.
Errata: This is a section where you can describe any deficiencies or known problems with your implementation.
You must complete this document report as part of your project.
You should look at the design questions in the README.md file of the
Project 05 repository. The questions there will help you design and plan
your process queue implementation.
Feel free to ask the TAs or the instructor for some guidance on these questions before attempting to implement this project.
Your project will be graded on the following metrics:
| Metric | Points |
|---|---|
Source Code
|
16.0
|
Demonstration
|
6.0
|
Documentation
|
2.0
|