The fifth project is to build a simplified implementation of memcached, which is an in-memory key-value store that is used for quick access to frequently used computations (ie. a cache). To support concurrent access to the cache, your library must use the C++ thread library to ensure thread-safe behavior.
To use the memcached programming paradigm, we first declare a memcached
object, in this case a templated HTCache
as follows:
// Declare a HTCache object HTCache<int64_t, bool> cache(addrlen, pagesize, policy, handler);
This object takes the size of the address space (in terms of the number
of bits), the size of each page (in terms of the number of
entries), the eviction policy (either FIFO, Random, LRU, or
Clock, and the handler (the function to call when a requested item is
missing from the cache). It maps an int64_t
key to a bool
value.
Once we have this HTCache
object, we can retrieve a value from the
cache with following command:
// Retrieve value associated with key auto value = cache.get(key);
If the value associated with the key is not in the HTCache
then the
provided handler will be called to generate or retrieve the missing
value, which is then stored in the cache and returned to the caller.
Likewise, we can also store values into the cache by doing the following:
// Store value associated with key
cache.put(key, value);
If the HTCache
is full, that is it has no free space for another value,
then we must perform an eviction. To do so, we will use the specified
eviction policy to select a currently existing value for replacement
and then store the new value in the place of the evicted object.
In this way, the use of a fault handler and an eviction policy means
that HTCache
is similar to the paging and virtual memory systems we
have discussed in class for the last few weeks.
Internally, the HTCache
uses a modified form of a hashed array tree,
which combines facets of hash tables with binary trees as shown below:
Unlike in typical hash tables, our hashed array tree splits up the
array of buckets into a tree of HTPages
. Rather than using the
result of the hash function to map to a specific bucket, we will use the
hash function to convert a key into a virtual address that we can then
use to navigate our hashed array tree in a manner similar to how we
translate virtual address to physical addresses with page tables.
This means that inside HTCache
, we have a PageTable
, which is actually
just a std::vector of HTPages
(ie. the root or directory). Likewise,
each HTPage
itself contains a std::vector of HTPageEntry
objects
which at the very least keep track of the key and value pair
associated with the HTPageEntry
.
To locate an item in the HTCache
, we must translate the key into an
address by using the std::hash function, and then use that to find the
appropriate key-value pair within our hashed array tree.
For example, the following code corresponds to the diagram above:
HTCache<int, bool> is_odd(4, 4, EVICT_FIFO, [](const int &key){ return key % 2; }); cout << is_odd.get(3) << endl; cout << is_odd.get(8) << endl;
The example code creates a HTCache
called is_odd
with an address
length of 4
bits as specified by the first argument to the HTCache
constructor. This means that our total address space is 2^4 = 16
addresses where each address corresponds to a HTPageEntry
.
The second argument, 4
specifies the number of entries per HTPage
.
Because each HTPage
has 4
HTPageEntries
, we need 2^4 Entries / 2^2
Entries Per Page = 2^2 = 4
HTPages
in our PageTable
std::vector.
To perform is_odd.get(3)
(that is retrieve the value of 3
from our
HTCache
), we must do the following:
First, use std::hash to convert the key into an integer virtual address.
In this example, let's assume the result of std::hash(3)
is simply
3
.
Next, extract the virtual page number or VPN from the integer address.
Since the virtual address is 3
in this case, the bit representation
is 0011
. Therefore the VPN is 00
or the first two bits of the
virtual address.
Afterwards, extract the offset into the HTPage
by taking the
remaining bits of the virtual address.
In this case, the final bits are 11
and therefore the offset is 11
In summary, we hash the key to generate a virtual address and then extract the VPN and the offset from this virtual address:
Hash(3) -> 0 0 1 1 ___ ___ VPN Offset
Once we have the VPN, we can then use this to select the appropriate
HTPage
from the PageTable
. Likewise, we can then access the
HTPageEntry
inside the selected HTPage
by using the computed
offset.
To perform is_odd.get(8)
(that is retrieve the value of 8
from our
HTCache
), we follow the same process and get a VPN of 10
and an
offset of 00
. This means we will need to lookup the first
HTPageEntry
of the third HTPage
of our PageTable
to find the
value for the key 8
.
In this way, the HTCache
's internal structure resembles a single-level
page table and uses a similar mechanism for translating virtual addresses
to physical address as we've discussed in class.
More details about this project and your deliverables are described below.
Many websites such as Wikipedia and Wordpress utilize memcached to speed up access to their websites. In fact the notion of a fast distributed key-value store is a core component of the NoSQL movement currently sweeping the data storage and processing industry.
For our project, we will create a simplified version of memcached that
acts uses an embedded library model rather than a client/server framework.
In this way, our htcache
is akin to [sqlite], which is an embeddable
database engine that many applications such as Firefox and Chrome
utilize internally.
Working in groups of one or two people, you are to create a
library that implements the HTCache
class described above by midnight
on Tuesday, November 28, 2017. Additionally, you are to write a
fibonacci application that utilizes your library, and perform experiments
to analyze your project by Thursday, November 30, 2017.
For this project, you must use C++ (not C) as the implementation language. Any test scripts or auxillary tools can be written in any reasonable scripting language.
Furthermore, you should use the C++ thread library rather than plain
POSIX threads. In particular, you will need to use std::mutex and
std::lock_guard to perform mutual exclusion an ensure thread-safe
concurrent access to the HTCache
.
Here is a timeline of events related to this project:
Date | Event |
---|---|
Thursday, November 16 | Project description and repository are available. |
Tuesday, November 21 | Test scripts / outputs are available from TAs (after design is checked off). |
Tuesday, November 28 | Library is due (pushed to GitLab). |
Thursday, November 30 | Demonstrations of library and application must be completed. |
To start this project, one group member must fork the Project 05 repository on GitLab:
https://gitlab.com/nd-cse-30341-fa17/cse-30341-fa17-project05
Once this repository has been forked, follow the instructions from Reading 00 to:
Make the repository private.
Configure access to the repository.
Make sure you add all the members of the team in addition to the instructional staff.
As you can see, the base Project 05 repository contains a README.md
file and the following folder hierarchy:
project05 \_ Makefile # This is the project Makefile \_ bin # This contains the application executables and scripts \_ include # This contains the HTCache library header files \_ htcache \_ htcache.h # This contains the HTCache source code \_ htpage.h # This contains the HTPage source code \_ src # This contains the application source code \_ tests # This contains the test source code
You must maintain this folder structure for your project and place files in their appropriate place.
The include/htcache/htcache.h
file contains the implementation of the
HTCache
class. You are to complete this class based on the design
described above and with the hints provided by the TODOs
in the file. In
particular, you will need to do the following:
The constructor must determine the Addresses
, Pages
, VPNShift
,
VPNMask
, and OffsetMask
and then initialize the PageTable
based on
these values.
The get
method must determine the virtual address, VPN, and
offset based on the key, and then use this information to retrieve
the value from the appropriate HTPage
.
It must take care of the case where the data is missing (ie. a
std::out_of_range exception with be thrown). In this situation, it
must call the Handler
function to compute the appropriate value
and then store that into the cache before returning this new value.
The put
method must also determine the virtual address, VPN, and
offset based on the key, and then use this information to store the
value into the appropriate HTPage
.
To keep track of the cache hits and misses, you must update the Hits
and Misses
instance variables appropriately. All of the information that
you determine and track will be dumped to a stream
by the provided
stats
methods.
Because multiple threads can access the HTCache
concurrently, you are to
use the provided Lock
judiciously to protect values that need to be
updated atomically.
The include/htcache/htpage.h
file contains the implementation of the
HTPage
class. You are to complete this class based on the design
described above and with the hints provided by the TODOs
in the file. In
particular, you will need to do the following:
Add additional variables to perform the bookkeeping necessary to implement the various eviction policies: FIFO, Random, LRU, and Clock.
Implement each of the eviction policies.
The constructor must initialize the std::vector of HTPageEntry
objects.
The copy constructor must copy the individual instance variables.
The get
method must use linear probing to locate the appropriate
HTPageEntry
. If it is not found, then it should throw a
std::out_of_range exception if it is not found.
The put
method must also use linear probing to locate the
appropriate HTPageEntry
. If no HTPageEntry
is found, then it should
add a new HTPageEntry
. If the HTPage
is full, an existing
HTPageEntry
must be selected based on the eviction Policy
and then used
for replacement.
Once again, since multiple threads can access each HTPage
concurrently,
you must use the Lock
to implement monitor-style protection.
Under the src
directory, we have provided the following example and
testing applications:
bs.cpp
: This performs binary search on an array.
bubble.cpp
: This performs bubble sort on an array.
insertion.cpp
: This performs insertion sort on an array.
prime.cpp
: This computes and adds primes.
scan.cpp
: This performs a scan of an array.
selection.cpp
: This performs selection sort on an array.
You may use each of these programs to test and analyze your HTCache
implementation:
# Build applications $ make # Scan Usage $ ./bin/scan Usage: ./bin/scan AddressLength PageSize EvictionPolicy Threads # Run scan example $ ./bin/scan 8 64 0 1 Addresses : 256 Pages : 4 VPNShift : 6 VPNMask : 0xC0 OffsetMask: 0x3F Hits : 0 Misses : 10240
Each application takes four arguments:
AddressLength
: Number of bits in address space.
PageSize
: Number of entries per page.
EvictionPolicy
: Which eviction policy to use (FIFO = 0
, Random =
1
, LRU = 2
, Clock = 3
).
Threads
: Number of threads to utilize.
To test the application without any caching, you can simply set
AddressLength
to 0
:
# Run scan example without cache $ ./bin/scan 0 0 0 1 Addresses : 0 Pages : 0 VPNShift : 0 VPNMask : 0x0 OffsetMask: 0x0 Hits : 0 Misses : 0
Note: The sorting examples (bubble.cpp
, insertion.cpp
, and
selection.cpp
) are not thread-safe and should only be used with one
thread.
In addition to these programs, you are also given a src/fibonacci.cpp
.
You are to modify this program to use your HTCache
library to memoize
the computation of all fibonacci numbers from 0
to 99
.
# Run fibonacci $ ./bin/fibonacci 8 64 0 1 0. Fibonacci( 0) = 0 0. Fibonacci( 1) = 1 0. Fibonacci( 2) = 1 0. Fibonacci( 3) = 2 0. Fibonacci( 4) = 3 0. Fibonacci( 5) = 5 0. Fibonacci( 6) = 8 0. Fibonacci( 7) = 13 0. Fibonacci( 8) = 21 0. Fibonacci( 9) = 34 ... 0. Fibonacci(93) = 12200160415121876738 0. Fibonacci(94) = 1293530146158671551 0. Fibonacci(95) = 13493690561280548289 0. Fibonacci(96) = 14787220707439219840 0. Fibonacci(97) = 9834167195010216513 0. Fibonacci(98) = 6174643828739884737 0. Fibonacci(99) = 16008811023750101250 Addresses : 256 Pages : 4 VPNShift : 6 VPNMask : 0xC0 OffsetMask: 0x3F Hits : 194 Misses : 100
While the exact organization of the project code is up to you, keep in mind that you will be graded in part on coding style, cleaniness, and organization. This means your code should be consistently formatted, not contain any dead code, have reasonable comments, and appropriate naming among other things:
Break long functions into smaller functions.
Make sure each function does one thing and does it well.
Abstract, but don't over do it.
Please refer to these Coding Style slides for some tips and guidelines on coding style expectations.
As you develop your library, you should create separate unit tests in the
tests
directory that ensure your HTCache
library works correctly.
These tests should cover the following:
You should have tests that verify the behavior of the constructor, put
,
and get
methods of the HTCache
class.
You should have tests that verify the behavior of the constructor, put
,
and get
methods of the HTPage
class.
You should have tests that verify the behavior of the eviction policies
of the HTPage
class.
You do not need to provide additional functional tests as those are provided to you already.
Once you have a working implementation of your HTCache
library, you
should use the provided applications along with your fibonacci program to
explore the following questions:
How does the size of the cache (ie. AddressLength
and PageSize
)
impact the performance?
Which eviction policy is provides the best performance?
How does having multiple threads impact the cache behavior?
You must create scripts or additional programs that perform these experiments and collect the data.
As part of your grade, you will need to present your library to a
TA where you will demonstrate the correctness of your HTCache
library and fibonacci application along with an analysis of the
experiments above.
As part of your demonstration, you must provide a presentation (between 5
- 10
slides) with the following content:
Design: An overview of your library implementation.
Testing: A discussion on how you tested your library.
Experiments: A discussion on what experiments you ran to explore the questions above and what the results were.
Analysis:: A discussion of the questions above. You should use your experimental results as evidence for your choices.
Summary: A summary of what you learned about caching and virtual memory.
Note, you must incorporate images, graphs, diagrams and other visual elements as part of your presentation where reasonable.
Please upload these slides to Google Drive and place a link in your
README.md
.
Be prepared to be asked about different aspects of your project, as the TA may ask you questions to probe your understanding of the material and your work.
As noted above, the Project 05 repository includes a README.md
file
with the following sections:
Members: This should be a list of the project members.
Design: This is a list of design questions that you should answer before you do any coding as they will guide you towards the resources you need.
Demonstration: This is where you should provide a Google Drive link to your demonstration slides.
Errata: This is a section where you can describe any deficiencies or known problems with your implementation.
You must complete this document report as part of your project.
You should look at the design questions in the README.md
file of the
Project 05 repository. The questions there will help you design and plan
your process queue implementation.
Feel free to ask the TAs or the instructor for some guidance on these questions before attempting to implement this project.
Your project will be graded on the following metrics:
Metric | Points |
---|---|
Source Code
|
16.0
|
Demonstration
|
6.0
|
Documentation
|
2.0
|