Project Ideas

The following are rough project ideas for your consideration. Each project is accompanied by a suggested topic area for your annotated bibliography. A significant portion of your job will be to crystallize the purpose, methods, and scope of your specific project. Students may undertake projects not listed here, but should consult with the instructor before submitting a proposal.

Virtual Machines

  • Virtual Machine Performance Comparison. Although virtual machines offer great flexibility and power, virtualization comes at a price that varies greatly with the technology. Conduct a detailed study of the performance of various virtual machine technologies, including UML, Bochs, VMWare, and Xen. Explain the properties of each virtualization technology that lead to the performance effects. Recommend what technology should be used for various purposes, such as operating system development, internet service sandboxing, utility computing, and so forth.
    Bibliography: Virtual Machine Architectures
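
    To make the comparison concrete, one possible microbenchmark is null system call latency, since trapping and forwarding system calls is where these technologies differ most. A rough sketch in C (the iteration count and the use of syscall(SYS_getpid) to defeat any library-level caching are illustrative choices, not requirements):

        /* Null system call latency: run the same binary inside each virtual
           machine and compare the per-call cost. */
        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>
        #include <sys/syscall.h>

        int main(void) {
            const long iters = 1000000;
            struct timespec start, end;

            clock_gettime(CLOCK_MONOTONIC, &start);
            for (long i = 0; i < iters; i++)
                syscall(SYS_getpid);        /* force a real kernel entry */
            clock_gettime(CLOCK_MONOTONIC, &end);

            double ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
            printf("null syscall: %.1f ns per call\n", ns / iters);
            return 0;
        }

    A complete study would pair microbenchmarks like this with macrobenchmarks, such as a kernel build, on each platform.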

  • Virtual Machine Cluster for Utility Computing. Many users are interested in the utility computing concept, which proposes that users with large computing needs should simply pay to use remote CPUs only when they are needed. The trick is that everyone requires a different computing environment: one user wants RedHat 7.2, another wants Debian 6.3, and another wants OpenBSD with his favorite libraries installed. No service provider wants to spend the day installing and re-installing machines for each customer. Instead, one may use virtual machines on an existing cluster to create the needed computing environment on the fly. Build a simple utility computing cluster that allows a user to request a configuration type by name. The system would pick an unused machine in the cluster, establish a virtual machine, install the software, and then inform the user of the machine and port to login to. When the user is done, a simple message to the system should release the virtual machine.
    Bibliography: Virtual Machines for Grid and Utility Computing
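
    As a starting point, the front end's bookkeeping might look something like the sketch below; request_node and release_node are hypothetical names, and creating the virtual machine and installing the software are left as stubs.

        #include <stdio.h>
        #include <string.h>

        #define NNODES 16

        struct node {
            char host[64];        /* cluster machine running the VM */
            int  port;            /* port the user will log in to */
            int  busy;
            char config[64];      /* e.g. "redhat-7.2" or "openbsd" */
        };

        static struct node nodes[NNODES];

        /* allocate an idle node for the named configuration;
           returns its index, or -1 if the cluster is full */
        int request_node(const char *config) {
            for (int i = 0; i < NNODES; i++) {
                if (!nodes[i].busy) {
                    nodes[i].busy = 1;
                    strncpy(nodes[i].config, config, sizeof(nodes[i].config) - 1);
                    /* here: create the virtual machine and install the software */
                    printf("login to %s port %d\n", nodes[i].host, nodes[i].port);
                    return i;
                }
            }
            return -1;
        }

        /* called when the user sends the release message */
        void release_node(int i) {
            /* here: tear down the virtual machine and scrub its disk image */
            nodes[i].busy = 0;
            nodes[i].config[0] = '\0';
        }

    The interesting work is in the stubs: creating the virtual machine quickly, and deciding whether installed images can be cached and reused.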

File Systems

    Each of these projects can be built using the existing cooperative computing tools, allowing you to dig right into the interesting questions while relying on the existing code for storage and network access.
  • Really Really Really Big NFS Server. Create a very large filesystem by placing multiple storage devices behind a single NFS server. At first blush, this project is easy: just divide up the files across multiple disks. But, there are several difficult questions that must be addressed: How do you ensure maximum performance when disks may vary in performance, load, and capacity? Can the filesystem be built in such a way that disks can be added and removed while it runs? How should NFS semantics be mapped to Chirp semantics? What caching discipline should be employed at the NFS server?
    Bibliography: Large Distributed File Systems
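
    The simplest placement policy, hashing each path onto a fixed set of disks, is sketched below; it also shows why the add-and-remove question is hard, since changing the number of disks reshuffles nearly every file. The names and the choice of hash are illustrative only.

        #include <stdio.h>

        static const char *disks[] = { "/disk0", "/disk1", "/disk2", "/disk3" };
        static const int ndisks = 4;

        /* djb2 string hash */
        static unsigned long hash_path(const char *path) {
            unsigned long h = 5381;
            for (; *path; path++)
                h = h * 33 + (unsigned char) *path;
            return h;
        }

        /* map a file path to the disk that stores it */
        const char *disk_for(const char *path) {
            return disks[hash_path(path) % ndisks];
        }

        int main(void) {
            printf("%s\n", disk_for("/home/alice/thesis.tex"));
            return 0;
        }

    A directory-based mapping avoids the reshuffling problem at the cost of keeping (and caching) placement metadata at the server.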

  • Freeze Frame File System. As we have studied in class, distributed file systems must strike a balance between consistency and availability. This is because a distributed file system always assumes that the user wants the most recent view of the data. But, what if a user were willing to "freeze" a view of the file system, i.e. "Show me the state of the filesystem at 1pm yesterday"? How would that change the tradeoff between consistency, availability, and performance? Build a distributed filesystem with freeze-frame semantics. Be careful to define exactly the semantics and structure of the system before beginning. Compare to a filesystem that must maintain "most recent" semantics. Hard question: How do you handle newly-written files?
    Bibliography: Consistency Management in File Systems
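
    One way to think about the semantics: every write produces a new immutable version stamped with its time, and a frozen view at time T resolves each file to the newest version written at or before T. A minimal sketch of that resolution step (the structures are hypothetical):

        #include <stddef.h>
        #include <time.h>

        struct version {
            time_t written;          /* when this version was committed */
            const char *data;        /* contents, or a pointer to a block list */
            struct version *prev;    /* next-older version */
        };

        struct file {
            const char *name;
            struct version *latest;  /* version chain, newest first */
        };

        /* resolve a file within a view frozen at time `freeze`;
           returns NULL if the file did not yet exist at that time */
        const struct version *resolve(const struct file *f, time_t freeze) {
            for (const struct version *v = f->latest; v; v = v->prev)
                if (v->written <= freeze)
                    return v;
            return NULL;
        }

    Since a frozen view never changes, it can be cached and replicated freely; the hard question above is what the version timestamp should mean for a file that is still being written when the freeze point passes.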

  • Distributed (Peer-to-Peer?) Backup. Everyone knows that important data ought to be backed up. But, many organizations do not perform backups because nobody is willing to be responsible for establishing a backup server, shuffling tapes, and so forth. Build a distributed (peer-to-peer?) backup system for ordinary workstations, assuming that no one machine is willing to accept all backups. Each night, each workstation should search for available storage on other workstations, negotiate for permission, and transmit a backup image. That's the easy part. Hard part one: ensure that restoring from a backup image can be done reliably. Hard part two: ensure that everyone's disks are not filled with old backups after a few days. Hard part three: make it easy for the user to identify and recover the right files.
    Bibliography: Distributed Backup Systems
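
    For hard part two, each machine that accepts backups needs a retention policy; the sketch below keeps only the newest few images per source workstation. The image table and the KEEP limit are hypothetical, and a real system must also reclaim the disk space it marks expired.

        #include <string.h>
        #include <time.h>

        #define KEEP      3       /* images retained per source workstation */
        #define MAXIMAGES 256

        struct image {
            char   source[64];    /* workstation the image came from */
            time_t taken;         /* when the backup was made */
            int    expired;
        };

        static struct image images[MAXIMAGES];
        static int nimages = 0;

        /* expire all but the newest KEEP images from the given source */
        void expire_old(const char *source) {
            int count;
            do {
                count = 0;
                int oldest = -1;
                for (int i = 0; i < nimages; i++) {
                    if (images[i].expired || strcmp(images[i].source, source))
                        continue;
                    count++;
                    if (oldest < 0 || images[i].taken < images[oldest].taken)
                        oldest = i;
                }
                if (count > KEEP)
                    images[oldest].expired = 1;   /* and delete it from disk */
            } while (count > KEEP);
        }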

Memory Management

  • User-Level Distributed Shared Memory. Create a distributed shared memory system entirely at user level. Page faults can be created and caught at user level by using mprotect to set permission bits, and then catching the resulting SIGSEGVs that occur when such memory is touched. Be careful to define your memory semantics and the consistency protocol necessary to ensure those semantics. Build and test several simple applications that make use of the DSM, and evaluate the tradeoffs between consistency and performance.
    Bibliography: Consistency in Distributed Shared Memory
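
    The faulting mechanism itself fits in a few lines; the sketch below maps one page with no permissions and "fetches" it in the SIGSEGV handler by restoring access. In a real DSM the handler would request the page contents from its current owner before resuming, and that protocol is the heart of the project.

        #define _GNU_SOURCE
        #include <signal.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        static char *region;
        static size_t pagesize;

        static void fault_handler(int sig, siginfo_t *info, void *ctx) {
            /* round the faulting address down to its page boundary */
            char *page = (char *)((uintptr_t)info->si_addr & ~(pagesize - 1));
            /* a real DSM would fetch the page from the remote owner here */
            mprotect(page, pagesize, PROT_READ | PROT_WRITE);
        }

        int main(void) {
            pagesize = sysconf(_SC_PAGESIZE);

            struct sigaction sa;
            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = fault_handler;
            sa.sa_flags = SA_SIGINFO;
            sigaction(SIGSEGV, &sa, NULL);

            region = mmap(NULL, pagesize, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            region[0] = 'x';   /* faults; handler grants access; write retries */
            printf("wrote to protected page: %c\n", region[0]);
            return 0;
        }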

  • Transparent Garbage Collection. You already know that garbage collection is built into languages such as Java. However, it can also be implemented in systems languages such as C, if we take a conservative approach. Build a transparent garbage collection library for C, using the existing malloc interface. Test it with existing applications, and evaluate the tradeoffs between performance and memory efficiency. How effective is conservative garbage collection at identifying unused memory?
    Bibliography: Garbage Collection
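
    A conservative collector treats anything that looks like a pointer as if it were one. The sketch below keeps a table of allocations and marks any block that some word in the root range points into; a real collector would scan the stack, globals, and registers rather than take the roots as arguments, and gc_malloc/gc_collect are hypothetical names.

        #include <stdint.h>
        #include <stdlib.h>

        #define MAX_ALLOCS 1024

        struct alloc { void *ptr; size_t size; int marked; };
        static struct alloc allocs[MAX_ALLOCS];
        static int nallocs = 0;

        /* drop-in replacement for malloc that records the allocation */
        void *gc_malloc(size_t size) {
            void *p = malloc(size);
            if (p && nallocs < MAX_ALLOCS) {
                allocs[nallocs].ptr = p;
                allocs[nallocs].size = size;
                allocs[nallocs].marked = 0;
                nallocs++;
            }
            return p;
        }

        /* conservatively treat every word in [start,end) that falls inside
           a recorded allocation as a pointer to it */
        static void mark_range(void *start, void *end) {
            for (uintptr_t *w = start; (void *)w < end; w++) {
                for (int i = 0; i < nallocs; i++) {
                    uintptr_t base = (uintptr_t)allocs[i].ptr;
                    if (!allocs[i].marked && *w >= base && *w < base + allocs[i].size) {
                        allocs[i].marked = 1;
                        mark_range(allocs[i].ptr, (char *)allocs[i].ptr + allocs[i].size);
                    }
                }
            }
        }

        /* mark everything reachable from the roots, then free the rest */
        void gc_collect(void *root_start, void *root_end) {
            for (int i = 0; i < nallocs; i++) allocs[i].marked = 0;
            mark_range(root_start, root_end);
            int kept = 0;
            for (int i = 0; i < nallocs; i++) {
                if (allocs[i].marked) allocs[kept++] = allocs[i];
                else free(allocs[i].ptr);
            }
            nallocs = kept;
        }

    Measuring how often an integer happens to look like a pointer, and therefore pins memory, is one way to answer the question about effectiveness.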

Processes and Synchronization

  • Adaptive Load Control. High-level languages allow a user to trivially harness many independent processors; they also allow a user to accidentally create more load than a system can handle. Consider a language such as the fault tolerant shell. With a simple script, one may retrieve a file from one hundred machines in parallel (with a timeout and retry for good measure):

         forall h in 1 .to. 100
              try for 10 minutes
                   scp node$h:bigfile bigfile.$h
              end
         end
    

    However, one hundred simultaneous copies may be more than the system can handle. Perhaps the network switch cannot keep up with all one hundred machines blasting at once. Perhaps the collecting machine has a limit on the number of open sockets or file handles. Such limits are likely to differ from place to place. Modify the fault tolerant shell to adapt at run-time to the parallelism available in the current system. Caution: Make sure that your solution can adapt to a wide variety of jobs and conditions.
    Bibliography: Load Control in Operating Systems and Networks
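
    One candidate policy is the additive-increase/multiplicative-decrease rule used for network congestion control: raise the parallelism limit gently while transfers succeed, and cut it sharply when they start failing or timing out. The sketch below shows only the control loop; run_one is a stand-in for forking one transfer, and a real version would run each batch in parallel and let the surrounding try block handle retries.

        #include <stdio.h>
        #include <stdlib.h>

        /* stand-in for launching one transfer and reporting success;
           here it simulates an occasional failure */
        static int run_one(int i) {
            printf("copying bigfile.%d\n", i);
            return rand() % 8 != 0;
        }

        static void run_all(int njobs) {
            int limit = 1;                       /* current parallelism limit */
            int next = 0;
            while (next < njobs) {
                int ok = 1;
                for (int i = 0; i < limit && next < njobs; i++, next++)
                    if (!run_one(next))
                        ok = 0;
                if (ok)
                    limit = limit + 1;           /* system kept up: probe for more */
                else if (limit > 1)
                    limit = limit / 2;           /* overload symptoms: back off */
                printf("parallelism limit now %d\n", limit);
            }
        }

        int main(void) {
            run_all(100);
            return 0;
        }

    The hard part is deciding what counts as an overload symptom: a timeout, a refused connection, and a slow-but-successful copy probably deserve different responses.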

Operating Systems and Databases

  • Database for Email Management. Email is both a flood of new information as well as a long history of our interactions with other people. However, our existing tools for storing and tracking email -- manually assigned folders with explicit search -- are poorly suited for managing this goldmine. Build a better system for storing and searching email: something like Google Mail, except for your local email. Let users perform free-form text searches. Keep standing search views that act like dynamic folders. For example, if the user has created a view for "operating systems", then newly arriving messages that match the view should be automatically added. The trick, of course, is to do this with good performance!
    Bibliography: Unstructured Text Databases
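
    The key mechanism is that a standing view is just a saved query evaluated at delivery time, so the dynamic folder is always current. A toy sketch follows; a real system would consult an inverted index rather than scan the message text, and the view definitions shown are made up.

        #include <stdio.h>
        #include <string.h>

        #define NVIEWS   2
        #define MAXTERMS 4

        struct view {
            const char *name;
            const char *terms[MAXTERMS];   /* all terms must appear */
        };

        static struct view views[NVIEWS] = {
            { "operating systems", { "operating", "system" } },
            { "travel",            { "flight", "hotel" } },
        };

        /* called once per arriving message */
        void deliver(const char *msgid, const char *text) {
            for (int v = 0; v < NVIEWS; v++) {
                int match = 1;
                for (int t = 0; t < MAXTERMS && views[v].terms[t]; t++)
                    if (!strstr(text, views[v].terms[t]))
                        match = 0;
                if (match)
                    printf("message %s added to view '%s'\n", msgid, views[v].name);
            }
        }

        int main(void) {
            deliver("1234", "your flight and hotel are confirmed");
            return 0;
        }

    Making this fast over years of mail, and keeping the index consistent as messages arrive, is where the project lives.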

Security

  • Sandboxing by File System Logging. The sandboxing technologies discussed in class prevent unwanted access by simply denying actions that are not permitted. However, sometimes it is not clear whether an action is desirable without looking at all of a program's actions together. Modify a sandbox such as Parrot to record all of its filesystem modifications to a single log file. This log file can then be examined to see the overall effect of a program. If acceptable, the log can be played to modify the filesystem. This system could even be used to generate patches: a system change can be run and logged on one machine, then carried to another to play the same log. Challenge: Make sure that the sandboxed process is able to perceive the changes shown in the log at runtime. What is the performance overhead of logging?
    Bibliography: Sandboxing
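
    The essential data structure is the modification log: the sandbox appends one record per write-side operation instead of performing it, and a separate replay step applies the log if the user approves. A rough sketch with a hypothetical record format (a real log must also capture file data, not just pathnames):

        #include <stdio.h>
        #include <string.h>
        #include <sys/stat.h>
        #include <unistd.h>

        enum op { OP_MKDIR, OP_UNLINK, OP_WRITE };

        struct record {
            enum op op;
            char path[256];
        };

        #define MAXLOG 1024
        static struct record logbuf[MAXLOG];
        static int nlog = 0;

        /* called by the sandbox in place of modifying the filesystem */
        void log_op(enum op op, const char *path) {
            if (nlog < MAXLOG) {
                logbuf[nlog].op = op;
                strncpy(logbuf[nlog].path, path, sizeof(logbuf[nlog].path) - 1);
                nlog++;
            }
        }

        /* apply an accepted log to the real filesystem */
        void replay(void) {
            for (int i = 0; i < nlog; i++) {
                switch (logbuf[i].op) {
                case OP_MKDIR:  mkdir(logbuf[i].path, 0755); break;
                case OP_UNLINK: unlink(logbuf[i].path);      break;
                case OP_WRITE:  /* copy the saved data into place */ break;
                }
            }
        }

    The challenge noted above is what makes this interesting: reads issued by the sandboxed process must be answered from the log, not from the unmodified filesystem, or the program will not behave correctly.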

  • Distributed Access Control Lists. Filesystems such as AFS allow users to create groups and access control lists, detailing who is allowed to access what files. But, suppose that the authority for various group lists is distributed. For example, you may wish to allow a particular file to be read by all graduate students and all members of the FBI authorized by Agent Riley. The list of graduate students is maintained by a server in the departmental office, and the list of FBI agents is maintained by a server in Riley's office. Both lists may change at any time, with various consequences regarding security. Build a distributed access control system, assuming no shared file system between the participants. Be sure to carefully consider the design possibilities and discuss the tradeoffs between consistency, availability, and performance. What happens when a given user's access must be revoked?
    Bibliography: Filesystems and Access Control
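
    The evaluation path for a single ACL entry of the form "group@server" might look like the sketch below: membership is decided by the remote authority, and the answer is cached with a time-to-live. The cache is exactly where the revocation question bites, since a revoked user retains access until cached entries expire. query_authority is a stub standing in for the RPC to the group's server.

        #include <string.h>
        #include <time.h>

        #define TTL       300     /* seconds a cached answer stays valid */
        #define CACHESIZE 128

        struct cache_entry {
            char   user[64], group[64], server[64];
            int    member;
            time_t fetched;
        };

        static struct cache_entry cache[CACHESIZE];
        static int ncache = 0;

        /* stub for the RPC to the server that owns the group list */
        static int query_authority(const char *server, const char *group,
                                   const char *user) {
            return 1;   /* pretend everyone is a member */
        }

        int is_member(const char *user, const char *group, const char *server) {
            time_t now = time(NULL);
            for (int i = 0; i < ncache; i++) {
                struct cache_entry *e = &cache[i];
                if (now - e->fetched < TTL &&
                    !strcmp(e->user, user) && !strcmp(e->group, group) &&
                    !strcmp(e->server, server))
                    return e->member;              /* fresh cached answer */
            }
            int member = query_authority(server, group, user);
            if (ncache < CACHESIZE) {
                struct cache_entry *e = &cache[ncache++];
                strncpy(e->user, user, sizeof(e->user) - 1);
                strncpy(e->group, group, sizeof(e->group) - 1);
                strncpy(e->server, server, sizeof(e->server) - 1);
                e->member = member;
                e->fetched = now;
            }
            return member;
        }

    Whether to fail open or fail closed when an authority is unreachable is the availability half of the tradeoff.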

  • Security Audit of a Distributed System. Perform a security audit of the CCL storage pool. Begin by understanding and explaining the security aspects of the system from top to bottom, including the high-level design, the implementation details, and common deployment configurations. The ideal paper will discover several different kinds of security flaws in the system and explain their origin, repair, and prevention. The ideal final talk would demonstrate a live compromise of a running system.
    Bibliography: Security Audit, Secure Programming Techniques