The following are rough project ideas for your consideration.
Part of your job will be to crystallize the purpose,
methods, and scope of your specific project into a project proposal.
Note that not all projects involve writing a lot of code,
but all require a thorough quantitative evaluation of a system.
Students may undertake projects not listed here, but
should consult with the instructor before submitting a proposal.
You may certainly take any of these ideas and add a "twist"
to make it more interesting or challenging.
Many of these project ideas make use of software and systems
that are already deployed at Notre Dame, such as the
Chirp distributed filesystem,
the Condor distributed batch system,
and the Hadoop data processing system.
I encourage you to use these systems, so that you will have a ready-made testbed to work with,
and will become familiar with tools useful later in your career.
The Chirp filesystem
gives easy access to lots of different storage devices. Chirp allows the
user to read and write files on a single remote disk, but does not provide
any kind of replication or error checking. Remedy this by creating a library
for creating and accessing large files that are striped across
multiple Chirp servers. Your library should support several different RAID configurations,
selected when the file is created. The library should have a simple interface
like chirp_raid_open, chirp_raid_pread, chirp_raid_pwrite,
chirp_raid_stat, and so forth. Get started by using the
Chirp API. Explore the performance of this library on a variety of workloads, varying
the RAID configuration and the number of servers in use. How does the performance
compare to using a single local disk or a single remote Chirp server?
Filesystem or Database? Filesystems are good at storing large amounts
of binary data, but not very good at searching for the needle in the haystack.
Perhaps a combination of the two would provide robust searching with good performance.
Read about the Inversion filesystem, and then build something similar
using current open source technologies such as FUSE and MySQL. Populate the system
with a large amount of data, and evaluate the performance on a variety of workloads,
being careful to demonstrate what the limits of the system are.
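The core idea can be sketched in a few lines: keep file contents on disk, but mirror per-file attributes into a relational table so they can be queried. This sketch uses Python's built-in sqlite3 in place of MySQL purely for illustration; the table schema and sample paths are assumptions.

```python
import sqlite3

# Hypothetical metadata index: file contents stay in the filesystem,
# but per-file attributes go into a relational table so they can be
# searched, which a plain filesystem cannot do efficiently.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE files (
    path  TEXT PRIMARY KEY,
    size  INTEGER,
    mtime REAL,
    owner TEXT)""")

rows = [("/data/a.bin", 4096,   1.0, "alice"),
        ("/data/b.bin", 123456, 2.0, "bob"),
        ("/data/c.bin", 42,     3.0, "alice")]
db.executemany("INSERT INTO files VALUES (?,?,?,?)", rows)

# The "needle in the haystack" query: all of alice's files over 1 KB.
found = db.execute(
    "SELECT path FROM files WHERE owner=? AND size>?",
    ("alice", 1024)).fetchall()
assert found == [("/data/a.bin",)]
```

In the full project, a FUSE layer would intercept filesystem operations and keep this table consistent with the underlying files; the interesting evaluation question is what that bookkeeping costs under write-heavy workloads.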
Borrowing Remote Memory. As large as memories get, there always seems
to be a program that needs more memory than you actually have. Virtual memory that
swaps to disk is almost unusable now, because disks are so many orders of magnitude
slower. However, accessing the memory of a nearby machine on the network might be
done with a modest penalty. To address this problem, build a system that allows
an ordinary program running on one machine to access one large virtual memory assembled
from multiple contributors. Start with the user-level page table from the undergrad OS class, and build a user-level library that works with
a remote memory donor. Implement multiple page replacement algorithms, and
evaluate them on several memory intensive applications. Compare the performance
to disk-based virtual memory and real memory.
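Before wiring up remote donors, it helps to have the replacement policies working against a reference string. A minimal sketch of two classic policies, FIFO and LRU, counting page faults for a given number of frames (the reference string below is made up for illustration):

```python
from collections import OrderedDict, deque

# Count page faults for a page reference string under two policies.

def faults_fifo(refs, frames):
    mem, order, faults = set(), deque(), 0
    for p in refs:
        if p not in mem:
            faults += 1
            if len(mem) == frames:
                mem.remove(order.popleft())  # evict oldest arrival
            mem.add(p)
            order.append(p)
    return faults

def faults_lru(refs, frames):
    mem, faults = OrderedDict(), 0
    for p in refs:
        if p in mem:
            mem.move_to_end(p)               # refresh recency on a hit
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)      # evict least recently used
            mem[p] = True
    return faults

refs = [1, 2, 3, 1, 4, 1, 2, 5]
assert faults_fifo(refs, frames=3) == 7
assert faults_lru(refs, frames=3) == 6
```

In the real system each fault becomes a network round trip to a donor, so the difference between policies translates directly into wall-clock time for memory-intensive applications.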
Distributed Mach. Read ahead to learn about the Mach microkernel operating system.
Create a user-level library with the same basic concepts as Mach -- messages, ports, and tasks --
that allows for easy communication between processes, whether they run on the same machine,
or different machines. Use your library to build up some simple operating system services
or parallel applications. Explore the performance of this system, and compare it to using
multiple processes on the same machine.
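As a starting point, the three Mach concepts can be mocked up within one process using threads and queues; the class and function names below are assumptions, and the real project would extend `Port` over sockets so that tasks on different machines can exchange messages transparently.

```python
import queue
import threading

# Toy analogue of Mach concepts: a "port" is a message queue, a
# "task" is a thread that receives on one port and may send to others.

class Port:
    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def receive(self):
        return self._q.get()

# A simple echo service task: receive (reply_port, payload), respond.
service_port = Port()

def echo_task():
    reply_port, payload = service_port.receive()
    reply_port.send(("echo", payload))

threading.Thread(target=echo_task, daemon=True).start()

reply = Port()
service_port.send((reply, "hello"))
assert reply.receive() == ("echo", "hello")
```

Because clients only ever name ports, not threads or hosts, moving a service to another machine should require no client changes, which is exactly the property worth measuring against same-machine IPC.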
The Effect of SSDs on Modern Filesystems. Set up a modern SSD and a magnetic disk
side by side on a single system. (Prof. Thain can provide you with the disks.)
Compare the performance of basic reads and writes on the raw disks,
to ensure that they behave as you expect. Then, install various filesystems on
the two disks, and evaluate the performance of multiple workloads.
Do some filesystems do a better job of exploiting SSDs than others?
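The basic measurement structure might look like the sketch below, which times sequential versus random 4 KB reads on an ordinary file. This is only the skeleton: for the real experiment you would run against the raw devices and bypass the page cache (e.g. with O_DIRECT), since a cached toy file will not show the disks' true behavior.

```python
import os
import random
import tempfile
import time

# Micro-benchmark sketch: sequential vs. random 4 KB reads.
BLOCK = 4096
NBLOCKS = 256

fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(BLOCK * NBLOCKS))
os.close(fd)

def timed_reads(offsets):
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            data = f.read(BLOCK)
            assert len(data) == BLOCK
    return time.perf_counter() - start

sequential = [i * BLOCK for i in range(NBLOCKS)]
shuffled = sequential[:]
random.shuffle(shuffled)

t_seq = timed_reads(sequential)
t_rand = timed_reads(shuffled)
os.unlink(path)
# Expectation: on a magnetic disk random reads are far slower than
# sequential; on an SSD the gap should largely disappear.
```

Repeating the same pair of access patterns through different filesystems on each disk is what lets you separate filesystem effects from device effects.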
User Filesystem Study.
Many assumptions about user behavior in operating systems are based
on studies that are decades old. (example one, example two.) Produce a new study of how users
behave in the ND CSE network. Examine tools such as strace,
tcpdump, and fstrace for the purpose of recording logs of
filesystem activity. Demonstrate that you can record and analyze
a few hours of activity. Then, get permission from the CSE staff and a few
of your friends to trace activity on a few workstations for several weeks.
Write a comprehensive report on the file access behavior of those
people over the semester.
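Most of the analysis work is log parsing. As a hedged sketch, here is how one might count file opens in strace-style output; the sample lines are fabricated, but they follow the format strace actually prints for openat(2).

```python
import re
from collections import Counter

# Sketch of trace analysis: count how often each file is opened.
# The sample below stands in for hours of captured strace output.
sample = """\
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
openat(AT_FDCWD, "/home/u/notes.txt", O_RDONLY) = 4
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0"..., 4096) = 4096
"""

opens = Counter(re.findall(r'openat\([^,]+, "([^"]+)"', sample))
assert opens["/etc/passwd"] == 2
assert opens["/home/u/notes.txt"] == 1
```

Scaled up to weeks of traces, the same counting approach yields the distributions (file sizes, open frequencies, read/write ratios) that the classic studies reported, and lets you check whether their decades-old assumptions still hold.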
Cloud Performance Evaluation. Cloud computing systems such as
Amazon's EC2 provide additional computing capacity with the swipe of a
credit card; however, there is a performance penalty to be paid for traversing
the network and employing virtualization. Explore the performance
of the cloud by carefully measuring the performance of multiple applications
ranging from CPU-bound to IO-bound on several categories of virtual machines.
What is the cloud most effective at? (Amazon has generously supplied us with
some credits on EC2 for this purpose -- see Dr. Thain to get a code for up
to $100 of computing time on Amazon.)
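A measurement harness for this project can be quite small: define one CPU-bound kernel and one I/O-bound kernel, then run the identical pair on each instance type and on a local baseline. The kernels below are placeholders; real measurements would use substantial applications and many repetitions.

```python
import os
import tempfile
import time

# Sketch of a harness for comparing instance types: time the same
# CPU-bound and I/O-bound kernels everywhere, then compare ratios.

def cpu_bound(n=200_000):
    s = 0
    for i in range(n):
        s += i * i
    return s

def io_bound(nbytes=1 << 20):
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x" * nbytes)
    os.fsync(fd)          # force the data to stable storage
    os.close(fd)
    os.unlink(path)

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

t_cpu = timed(cpu_bound)
t_io = timed(io_bound)
```

Because virtualization overhead hits I/O and network paths harder than pure computation, the ratio t_io / t_cpu across instance types is often more revealing than either number alone.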
From Sequential to Scalable. Software engineering has long been the domain
of sequential thinking. As we reach the end of Moore's law, it will become necessary
to think in parallel for more and more applications.
Take an existing application -- perhaps something relevant to your research --
and parallelize it using the Work Queue Framework,
developed at Notre Dame. Using resources from the Condor Pool, CRC, or Amazon,
try to scale the application up to hundreds or thousands of nodes, and explore the performance.
Measure the behavior of the system with various workloads and under various failure conditions.
Compare your solution to traditional methods of parallelization, such as multi-threading and MPI.
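The pattern Work Queue provides is master/worker: a master splits the computation into independent tasks, workers execute them, and the master gathers results. This generic sketch uses a local thread pool in place of Work Queue's distributed workers (the function names and chunking scheme are assumptions, not Work Queue's API):

```python
from concurrent.futures import ThreadPoolExecutor

# Master/worker sketch: split data into chunks, farm them out,
# combine the partial results. With Work Queue, the workers would
# run on Condor, CRC, or EC2 nodes instead of local threads.

def task(chunk):
    # Stand-in for the real sequential computation applied to one chunk.
    return sum(x * x for x in chunk)

def run_master(data, nworkers=4, chunksize=5):
    chunks = [data[i:i + chunksize] for i in range(0, len(data), chunksize)]
    with ThreadPoolExecutor(nworkers) as pool:
        return sum(pool.map(task, chunks))

data = list(range(100))
assert run_master(data) == sum(x * x for x in data)
```

The key design question when scaling to hundreds of nodes is chunk granularity: chunks too small are dominated by dispatch and network overhead, while chunks too large limit parallelism and magnify the cost of a failed worker.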