The following are rough project ideas for your consideration.
Part of your job will be to crystallize the purpose,
methods, and scope of your specific project into a project proposal.
Note that not all projects involve writing a lot of code,
but all require a thorough quantitative evaluation of a system.
Students may undertake projects not listed here, but
should consult with the instructor before submitting a proposal.
You may certainly take any of these ideas and add a "twist"
to make it more interesting or challenging.
Many of these project ideas make use of software and systems
that are already deployed at Notre Dame, such as the
Chirp distributed filesystem,
the Condor distributed batch system,
and the Hadoop data processing system.
I encourage you to use these systems, so that you will have a ready-made testbed to work with,
and will become familiar with tools useful later in your career.
The Chirp filesystem
gives easy access to lots of different storage devices. Chirp allows the
user to read and write files on a single remote disk, but does not provide
any kind of replication or error checking. Remedy this by creating a library
for creating and accessing large files that are striped across
multiple Chirp servers. Your library should support several different RAID configurations,
selected when the file is created. The library should have a simple interface
like chirp_raid_open, chirp_raid_pread, chirp_raid_pwrite,
chirp_raid_stat, and so forth. Get started by using the
Chirp API. Explore the performance of this library on a variety of workloads, varying
the RAID configuration and the number of servers in use. How does the performance
compare to using a single local disk or a single remote Chirp server?
Filesystem or Database? Filesystems are good at storing large amounts
of binary data, but not very good at searching for the needle in the haystack.
Perhaps a combination of the two would provide robust searching with good performance.
Read about the Inversion filesystem, and then build something similar
using current open source technologies such as FUSE and MySQL. Populate the system
with a large amount of data, and evaluate the performance on a variety of workloads,
being careful to demonstrate what the limits of the system are.
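The core idea can be sketched in a few lines: keep file contents on disk, but mirror per-file attributes into a relational table so they can be queried. This sketch uses Python's built-in sqlite3 in place of MySQL purely for illustration; the table schema and sample paths are assumptions.

```python
import sqlite3

# Hypothetical metadata index: file contents stay in the filesystem,
# but per-file attributes go into a relational table so they can be
# searched, which a plain filesystem cannot do efficiently.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE files (
    path  TEXT PRIMARY KEY,
    size  INTEGER,
    mtime REAL,
    owner TEXT)""")

rows = [("/data/a.bin", 4096,   1.0, "alice"),
        ("/data/b.bin", 123456, 2.0, "bob"),
        ("/data/c.bin", 42,     3.0, "alice")]
db.executemany("INSERT INTO files VALUES (?,?,?,?)", rows)

# The "needle in the haystack" query: all of alice's files over 1 KB.
found = db.execute(
    "SELECT path FROM files WHERE owner=? AND size>?",
    ("alice", 1024)).fetchall()
assert found == [("/data/a.bin",)]
```

In the full project, a FUSE layer would intercept filesystem operations and keep this table consistent with the underlying files; the interesting evaluation question is what that bookkeeping costs under write-heavy workloads.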
Borrowing Remote Memory. As large as memories get, there always seems
to be a program that needs more memory than you actually have. Virtual memory that
swaps to disk is almost unusable now, because disks are so many orders of magnitude
slower. However, accessing the memory of a nearby machine on the network might be
done with a modest penalty. To address this problem, build a system that allows
an ordinary program running on one machine to access one large virtual memory assembled
from multiple contributors. Start with the user-level page table from the undergrad OS class, and build a user-level library that works with
a remote memory donor. Implement multiple page replacement algorithms, and
evaluate them on several memory intensive applications. Compare the performance
to disk-based virtual memory and real memory.
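Before wiring up remote donors, it helps to have the replacement policies working against a reference string. A minimal sketch of two classic policies, FIFO and LRU, counting page faults for a given number of frames (the reference string below is made up for illustration):

```python
from collections import OrderedDict, deque

# Count page faults for a page reference string under two policies.

def faults_fifo(refs, frames):
    mem, order, faults = set(), deque(), 0
    for p in refs:
        if p not in mem:
            faults += 1
            if len(mem) == frames:
                mem.remove(order.popleft())  # evict oldest arrival
            mem.add(p)
            order.append(p)
    return faults

def faults_lru(refs, frames):
    mem, faults = OrderedDict(), 0
    for p in refs:
        if p in mem:
            mem.move_to_end(p)               # refresh recency on a hit
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)      # evict least recently used
            mem[p] = True
    return faults

refs = [1, 2, 3, 1, 4, 1, 2, 5]
assert faults_fifo(refs, frames=3) == 7
assert faults_lru(refs, frames=3) == 6
```

In the real system each fault becomes a network round trip to a donor, so the difference between policies translates directly into wall-clock time for memory-intensive applications.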
Distributed Mach. Read ahead to learn about the Mach microkernel operating system.
Create a user-level library with the same basic concepts as Mach -- messages, ports, and tasks --
that allows for easy communication between processes, whether they run on the same machine,
or different machines. Use your library to build up some simple operating system services
or parallel applications. Explore the performance of this system, and compare it to using
multiple processes on the same machine.
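As a starting point, the three Mach concepts can be mocked up within one process using threads and queues; the class and function names below are assumptions, and the real project would extend `Port` over sockets so that tasks on different machines can exchange messages transparently.

```python
import queue
import threading

# Toy analogue of Mach concepts: a "port" is a message queue, a
# "task" is a thread that receives on one port and may send to others.

class Port:
    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def receive(self):
        return self._q.get()

# A simple echo service task: receive (reply_port, payload), respond.
service_port = Port()

def echo_task():
    reply_port, payload = service_port.receive()
    reply_port.send(("echo", payload))

threading.Thread(target=echo_task, daemon=True).start()

reply = Port()
service_port.send((reply, "hello"))
assert reply.receive() == ("echo", "hello")
```

Because clients only ever name ports, not threads or hosts, moving a service to another machine should require no client changes, which is exactly the property worth measuring against same-machine IPC.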
The Effect of SSDs on Modern Filesystems. Set up a modern SSD and a magnetic disk
side by side on a single system. (Prof. Thain can provide you with the disks.)
Compare the performance of basic reads and writes on the raw disks,
to ensure that they behave as you expect. Then, install various filesystems on
the two disks, and evaluate the performance of multiple workloads.
Do some filesystems do a better job of exploiting SSDs than others?
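The basic measurement structure might look like the sketch below, which times sequential versus random 4 KB reads on an ordinary file. This is only the skeleton: for the real experiment you would run against the raw devices and bypass the page cache (e.g. with O_DIRECT), since a cached toy file will not show the disks' true behavior.

```python
import os
import random
import tempfile
import time

# Micro-benchmark sketch: sequential vs. random 4 KB reads.
BLOCK = 4096
NBLOCKS = 256

fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(BLOCK * NBLOCKS))
os.close(fd)

def timed_reads(offsets):
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            data = f.read(BLOCK)
            assert len(data) == BLOCK
    return time.perf_counter() - start

sequential = [i * BLOCK for i in range(NBLOCKS)]
shuffled = sequential[:]
random.shuffle(shuffled)

t_seq = timed_reads(sequential)
t_rand = timed_reads(shuffled)
os.unlink(path)
# Expectation: on a magnetic disk random reads are far slower than
# sequential; on an SSD the gap should largely disappear.
```

Repeating the same pair of access patterns through different filesystems on each disk is what lets you separate filesystem effects from device effects.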
User Filesystem Study.
Many assumptions about user behavior in operating systems are based
on studies that are decades old. (example one, example two.) Produce a new study of how users
behave in the ND CSE network. Examine tools such as strace,
tcpdump, and fstrace for the purpose of recording logs of
filesystem activity. Demonstrate that you can record and analyze
a few hours of activity. Then, get permission from the CSE staff and a few
of your friends to trace activity on a few workstations for several weeks.
Write a comprehensive report on the file access behavior of those
people over the semester.
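Most of the analysis work is log parsing. As a hedged sketch, here is how one might count file opens in strace-style output; the sample lines are fabricated, but they follow the format strace actually prints for openat(2).

```python
import re
from collections import Counter

# Sketch of trace analysis: count how often each file is opened.
# The sample below stands in for hours of captured strace output.
sample = """\
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
openat(AT_FDCWD, "/home/u/notes.txt", O_RDONLY) = 4
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0"..., 4096) = 4096
"""

opens = Counter(re.findall(r'openat\([^,]+, "([^"]+)"', sample))
assert opens["/etc/passwd"] == 2
assert opens["/home/u/notes.txt"] == 1
```

Scaled up to weeks of traces, the same counting approach yields the distributions (file sizes, open frequencies, read/write ratios) that the classic studies reported, and lets you check whether their decades-old assumptions still hold.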
Cloud Performance Evaluation. Cloud computing systems such as
Amazon's EC2 provide additional computing capacity with the swipe of a
credit card; however, there is a performance penalty to be paid for traversing
the network and employing virtualization. Explore the performance
of the cloud by carefully measuring the performance of multiple applications
ranging from CPU-bound to IO-bound on several categories of virtual machines.
What is the cloud most effective at? (Amazon has generously supplied us with
some credits on EC2 for this purpose -- see Dr. Thain to get a code for up
to $100 of computing time on Amazon.)
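A measurement harness for this project can be quite small: define one CPU-bound kernel and one I/O-bound kernel, then run the identical pair on each instance type and on a local baseline. The kernels below are placeholders; real measurements would use substantial applications and many repetitions.

```python
import os
import tempfile
import time

# Sketch of a harness for comparing instance types: time the same
# CPU-bound and I/O-bound kernels everywhere, then compare ratios.

def cpu_bound(n=200_000):
    s = 0
    for i in range(n):
        s += i * i
    return s

def io_bound(nbytes=1 << 20):
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x" * nbytes)
    os.fsync(fd)          # force the data to stable storage
    os.close(fd)
    os.unlink(path)

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

t_cpu = timed(cpu_bound)
t_io = timed(io_bound)
```

Because virtualization overhead hits I/O and network paths harder than pure computation, the ratio t_io / t_cpu across instance types is often more revealing than either number alone.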
From Sequential to Scalable. Software engineering has long been the domain
of sequential thinking. As we reach the end of Moore's law, it will become necessary
to think in parallel for more and more applications.
Take an existing application -- perhaps something relevant to your research --
and parallelize it using the Work Queue Framework,
developed at Notre Dame. Using resources from the Condor Pool, CRC, or Amazon,
try to scale the application up to hundreds or thousands of nodes, and explore the performance.
Measure the behavior of the system with various workloads and under various failure conditions.
Compare your solution to traditional methods of parallelization, such as multi-threading and MPI.
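The pattern Work Queue provides is master/worker: a master splits the computation into independent tasks, workers execute them, and the master gathers results. This generic sketch uses a local thread pool in place of Work Queue's distributed workers (the function names and chunking scheme are assumptions, not Work Queue's API):

```python
from concurrent.futures import ThreadPoolExecutor

# Master/worker sketch: split data into chunks, farm them out,
# combine the partial results. With Work Queue, the workers would
# run on Condor, CRC, or EC2 nodes instead of local threads.

def task(chunk):
    # Stand-in for the real sequential computation applied to one chunk.
    return sum(x * x for x in chunk)

def run_master(data, nworkers=4, chunksize=5):
    chunks = [data[i:i + chunksize] for i in range(0, len(data), chunksize)]
    with ThreadPoolExecutor(nworkers) as pool:
        return sum(pool.map(task, chunks))

data = list(range(100))
assert run_master(data) == sum(x * x for x in data)
```

The key design question when scaling to hundreds of nodes is chunk granularity: chunks too small are dominated by dispatch and network overhead, while chunks too large limit parallelism and magnify the cost of a failed worker.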