Project Ideas
The following are rough project ideas for your consideration.
A significant portion of your job will be to crystallize the purpose,
methods, and scope of your specific project.
Students may undertake projects not listed here, but
should consult with the instructor before submitting a proposal.
Storage
Large Scale File Distribution.
Scientific users running codes on large distributed systems
often need to distribute a large dataset to every single
node on which they wish to run a job. To address this problem,
develop and evaluate an algorithm for rapidly and reliably distributing
large files to all nodes of the CCL storage pool.
The mechanism for transferring files -- and directing transfers
between other machines -- is already in place.
The challenge is to decide how to schedule the transfers,
and how to deal with failures and performance variations.
Evaluate the algorithm on the 200-node CCL system.
(Careful: Make sure you also have a plan to clean up whatever
you transfer, so as not to wedge the entire system!)
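As a rough illustration of the kind of scheduling policy you might start from, here is a minimal sketch of a greedy spread strategy in C: every node that has received the file becomes a source for the nodes still waiting, and a failed transfer simply leaves the target pending for a later retry. The transfer_file() routine below is a stub standing in for the transfer mechanism that already exists; all names are illustrative.

    /* Greedy spread scheduler (sketch). transfer_file() is a stub standing in
       for the existing CCL transfer mechanism. */

    #include <stdio.h>
    #include <string.h>

    static int transfer_file(const char *source, const char *target)
    {
        printf("transfer %s -> %s\n", source, target);
        return 0;   /* 0 = success; a real transfer can fail or be slow */
    }

    static void distribute(const char *nodes[], int n, const char *origin)
    {
        int has_file[200] = {0};        /* enough slots for the 200-node pool */
        for (int i = 0; i < n; i++)
            if (strcmp(nodes[i], origin) == 0) has_file[i] = 1;

        int remaining = n - 1;
        while (remaining > 0) {
            int src = 0;
            for (int dst = 0; dst < n; dst++) {
                if (has_file[dst]) continue;
                while (!has_file[src]) src = (src + 1) % n;   /* next available source */
                if (transfer_file(nodes[src], nodes[dst]) == 0) {
                    has_file[dst] = 1;   /* dst can now serve others in this same pass */
                    remaining--;
                }
                src = (src + 1) % n;     /* spread the load over all current sources */
                /* on failure, dst stays pending; a real scheduler should bound retries */
            }
        }
    }

    int main(void)
    {
        const char *nodes[] = { "ccl00", "ccl01", "ccl02", "ccl03", "ccl04" };
        distribute(nodes, 5, "ccl00");
        return 0;
    }

A real scheduler would run the transfers concurrently, bound retries, and account for slow or failed nodes; deciding how is the heart of the project.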
A File System for Lots of Little Files.
Many scientific users employ filesystems in an unusual way:
they create directories containing thousands or millions of little files,
each containing perhaps 10-20 bytes of data. File systems are unusually
bad at storing such data -- each file occupies a minimum of one 4KB disk block.
However, users continue to work this way, because the data is easy to
manipulate with standard commands.
Begin by demonstrating a filesystem workload that could be dramatically
improved. For example, compare the performance of creating one million
ten-byte files versus one million ten-byte records in a single file.
Then, design, implement, and evaluate a filesystem tailored to datasets of many small files.
Use the FUSE Toolkit to easily
build and deploy the system in user space. Compare the performance
of this filesystem to a conventional filesystem on a wide variety of workloads.
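To make the FUSE suggestion concrete, here is a minimal read-only sketch, assuming the FUSE 2.x C API, that presents many tiny files each backed by a fixed-size record. For brevity the records live in memory; the real project would pack them into one large container file on disk. All names and sizes here are illustrative.

    /* tinyfs.c: many tiny read-only files, each backed by a fixed-size record.
       Build (FUSE 2.x assumed), roughly:
       gcc -D_FILE_OFFSET_BITS=64 tinyfs.c -lfuse -o tinyfs */

    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/stat.h>

    #define NRECORDS 1000
    #define RECSIZE  20

    static char records[NRECORDS][RECSIZE];      /* record i backs the file "/i" */

    static int recno(const char *path)           /* "/17" -> 17, or -1 if invalid */
    {
        char *end;
        long n = strtol(path + 1, &end, 10);
        if (path[1] == '\0' || *end != '\0' || n < 0 || n >= NRECORDS) return -1;
        return (int)n;
    }

    static int tiny_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) { st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; return 0; }
        if (recno(path) < 0) return -ENOENT;
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = RECSIZE;
        return 0;
    }

    static int tiny_read(const char *path, char *buf, size_t size, off_t off,
                         struct fuse_file_info *fi)
    {
        (void) fi;
        int r = recno(path);
        if (r < 0) return -ENOENT;
        if (off >= RECSIZE) return 0;
        if (off + size > RECSIZE) size = RECSIZE - off;
        memcpy(buf, records[r] + off, size);
        return size;
    }

    static struct fuse_operations tiny_ops = {
        .getattr = tiny_getattr,
        .read    = tiny_read,
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &tiny_ops, NULL);
    }

Once mounted (./tinyfs mountpoint), a command like cat mountpoint/42 reads record 42 directly. A readdir handler, write support, and a real on-disk container format are the actual work of the project.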
Porting Dependencies Automatically.
Modern applications are quite difficult to move from computer to computer.
Few programs are built as a single, self-contained executable.
Instead, they depend on all sorts of files, libraries, and other components
that must be "installed" wherever the program runs, perhaps requiring
special privileges to install. This makes distributing programs to new machines a real headache.
Construct a system for solving this problem automatically.
Using the logging facilities of Parrot, record absolutely everything needed by an application as it runs.
Use this log to construct a self-contained package of files
that can be moved to another machine.
Again using Parrot, redirect file accesses to the package,
thus making it appear as if everything needed by the application is available locally.
Compare this approach to other data access schemes, such
as demand-paging using a distributed filesystem.
Note: This project doesn't require much code to be written,
but will require a fair amount of careful analysis and performance measurement.
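As one illustration of the packaging step, the sketch below assumes a hypothetical log format of one absolute pathname per line and copies each named file under a package root, preserving directory structure. Parrot's actual log format, and details such as symbolic links and permissions, are left to the project.

    /* mkpackage.c: turn an access log into a self-contained package (sketch).
       The log format is assumed: one absolute pathname per line. */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void copy_into_package(const char *path, const char *pkgroot)
    {
        char cmd[4096];
        /* cp --parents keeps the directory structure under pkgroot (which must
           exist); a real tool would copy the bytes itself and also record
           symlinks, permissions, and missing files */
        snprintf(cmd, sizeof(cmd), "cp --parents '%s' '%s'", path, pkgroot);
        system(cmd);
    }

    int main(int argc, char *argv[])
    {
        if (argc != 3) { fprintf(stderr, "use: %s logfile pkgroot\n", argv[0]); return 1; }
        FILE *log = fopen(argv[1], "r");
        if (!log) { perror(argv[1]); return 1; }
        char line[4096];
        while (fgets(line, sizeof(line), log)) {
            line[strcspn(line, "\n")] = '\0';          /* strip trailing newline */
            if (line[0] == '/') copy_into_package(line, argv[2]);
        }
        fclose(log);
        return 0;
    }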
Memory
Gigantic Distributed Data Structures.
Many scientific users would like to create programs that employ gigantic data structures.
For example, biometric researchers at Notre Dame would like to manipulate arrays
and matrices of tens of thousands of entries, with each element potentially on the order
of a megabyte. If implemented as an ordinary C data structure in virtual memory,
the structure might not fit on the available disk, or even in the available address space.
However, we can create such data structures by tying together the disk and the memory
available in multiple machines. Construct a library for gigantic arrays and matrices
that uses the CCL storage pool as the backing storage. Make sure that it is possible
to resize, reconfigure, and migrate data structures from place to place. Evaluate
the performance of your data structures on a wide variety of synthetic workloads --
row-major access, column-major access, random access -- and compare to the performance
on a large memory machine. Describe the strengths and the limitations of this approach
to data structures.
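A minimal sketch of what such a library's interface might look like is below. It backs each element with its own file in a directory, which stands in here for the CCL storage pool; the names and layout are hypothetical, and a serious implementation would block elements together, cache recently used ones, and stripe them across machines.

    /* bigarray.c: out-of-core array sketch; one file per element. */

    #include <stdio.h>
    #include <stdlib.h>

    struct big_array {
        const char *dir;      /* backing directory (stand-in for the storage pool) */
        size_t elem_size;     /* bytes per element */
        size_t n;             /* number of elements */
    };

    static void elem_path(struct big_array *a, size_t i, char *buf, size_t len)
    {
        snprintf(buf, len, "%s/%zu", a->dir, i);
    }

    int big_array_put(struct big_array *a, size_t i, const void *elem)
    {
        char path[1024];
        elem_path(a, i, path, sizeof(path));
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        int ok = fwrite(elem, a->elem_size, 1, f) == 1;
        fclose(f);
        return ok ? 0 : -1;
    }

    int big_array_get(struct big_array *a, size_t i, void *elem)
    {
        char path[1024];
        elem_path(a, i, path, sizeof(path));
        FILE *f = fopen(path, "rb");
        if (!f) return -1;
        int ok = fread(elem, a->elem_size, 1, f) == 1;
        fclose(f);
        return ok ? 0 : -1;
    }

    int main(void)
    {
        struct big_array a = { "/tmp/bigarray", sizeof(double), 1000000 };
        double x = 3.14, y = 0;            /* requires /tmp/bigarray to exist */
        big_array_put(&a, 12345, &x);
        big_array_get(&a, 12345, &y);
        printf("element 12345 = %f\n", y);
        return 0;
    }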
Multiscale Memory Modeling.
Many advanced computing systems (such as the Cascade project at Notre Dame)
have observed that access to memory is the new system bottleneck, and propose
clever new techniques for organizing CPUs and memories together.
However, the model of how programs access memory has not been updated
significantly since the work on WS that we will read in class.
A good model of memory access is needed in order to design advanced systems.
Construct tools to observe the memory behavior of modern complex programs at runtime.
Measure at two levels: (1) memory allocations using malloc/free and (2) virtual memory
accesses by raw address. To measure (1), build a malloc from source that logs
all mallocs and frees to a file. To measure (2), modify the Bochs virtual machine
to log all memory accesses to a file. Then, run a selection of modern applications
to capture their behavior. Characterize the behavior of each application.
What is the relationship between the layers?
How can we take advantage of these observed patterns?
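One low-effort way to capture the level-(1) data, as an alternative to rebuilding malloc itself from source, is an LD_PRELOAD interposer; a minimal sketch follows, assuming a glibc-style dynamic linker. It forwards to the real allocator found via dlsym(RTLD_NEXT, ...) and logs with write() so the logger never re-enters malloc.

    /* mlog.c: log every malloc and free (sketch).
       Build: gcc -shared -fPIC -o mlog.so mlog.c -ldl
       Run:   LD_PRELOAD=./mlog.so ./application 2>malloc.log */

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <stddef.h>

    static void *(*real_malloc)(size_t) = NULL;
    static void (*real_free)(void *)   = NULL;

    static void log_line(const char *op, void *p, size_t size)
    {
        char buf[128];
        int n = snprintf(buf, sizeof(buf), "%s %p %zu\n", op, p, size);
        write(2, buf, n);       /* stderr; redirect to a file when running */
    }

    void *malloc(size_t size)
    {
        /* a production tool would guard against re-entry during dlsym */
        if (!real_malloc) real_malloc = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
        void *p = real_malloc(size);
        log_line("malloc", p, size);
        return p;
    }

    void free(void *p)
    {
        if (!real_free) real_free = (void (*)(void *)) dlsym(RTLD_NEXT, "free");
        log_line("free", p, 0);
        real_free(p);
    }

Interposing calloc, realloc, and the C++ operators, and correlating these records with the raw address trace from Bochs, is where the project begins.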
Are You Linking What I'm Linking?
Dynamic linking allows multiple running programs to share a single copy
of common routines in memory, thus reducing the overall memory usage of a system.
However, dynamic linking also has some drawbacks, including complexity of
installation and management, loss of locality in the filesystem and memory,
loss of runtime performance, and perhaps others that you can think of.
Perform a comparison of static and dynamic linking in the real
world, and discuss the tradeoffs in detail. This comparison can be
performed at several levels:
Construct small synthetic programs that are linked both ways, and compare microbenchmarks such as startup time, function call overhead, and CPU performance. (Dynamic linking consumes an extra register, which might affect performance.) A starting point for this level is sketched after this list.
Build larger standard programs in both versions and compare them. How many system calls, I/O operations, and other kernel activities are required by each version?
Use tools like nm and ar to examine the contents of executables and libraries on a standard Linux system. What is the degree of sharing across the filesystem?
By rebuilding from source packages, construct an entire system distribution both statically and dynamically. How does this affect performance and resource consumption?
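For the microbenchmark level mentioned in the first item above, one possible test program is sketched below; build it twice, once dynamically and once with -static, and compare the results. The -fno-builtin flag keeps gcc from inlining sqrt(), so the calls really go through the library (and through the PLT in the dynamic case).

    /* bench.c: per-call cost of a library routine.
       Dynamic: gcc -fno-builtin bench.c -lm -o bench-dyn
       Static:  gcc -static -fno-builtin bench.c -lm -o bench-static */

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <sys/time.h>

    int main(int argc, char *argv[])
    {
        long iters = argc > 1 ? atol(argv[1]) : 10000000L;
        struct timeval t0, t1;
        double sum = 0.0;

        gettimeofday(&t0, NULL);
        for (long i = 1; i <= iters; i++)
            sum += sqrt((double)i);               /* call into libm */
        gettimeofday(&t1, NULL);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%ld calls in %.3f s (sum=%g)\n", iters, secs, sum);
        return 0;
    }

Running each binary with zero iterations (time ./bench-dyn 0) isolates startup and loader cost; running with many iterations isolates per-call cost.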
Virtual Machines
Virtual Machine Cluster for Utility Computing.
Many administrators are interested in the utility computing concept,
which proposes that users with large computing needs should simply
pay to use remote CPUs only when they are needed.
The trick is that everyone requires a different computing environment:
one user wants RedHat 7.2, another wants Debian 6.3, and another wants
OpenBSD with his favorite libraries installed. No service provider
wants to spend the day installing and re-installing machines for
each customer. Instead, one may use virtual machines on an existing
cluster to create the needed computing environment on the fly.
Build a simple utility computing cluster that allows a user to request
a configuration type by name. The system would pick an unused machine
in the cluster, establish a virtual machine, install the software,
and then inform the user of the machine and port to log in to.
When the user is done, a simple message to the system should release the virtual machine.
The problem is, installing software for each virtual machine is very expensive!
How can we maximize the performance of such a system?
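The sketch below shows only the allocation bookkeeping such a manager needs: find an idle host, boot the requested configuration, report where to log in, and release the host afterward. The host names, image naming scheme, and the vm_boot helper are all hypothetical; how to avoid reinstalling software on every request is the real question.

    /* vmpool.c: allocation bookkeeping for a small VM cluster (sketch). */

    #include <stdio.h>
    #include <string.h>

    #define NHOSTS 4

    struct host { const char *name; int busy; };

    static struct host hosts[NHOSTS] = {
        {"cclvm01", 0}, {"cclvm02", 0}, {"cclvm03", 0}, {"cclvm04", 0},
    };

    /* returns the host running the requested configuration, or NULL if none free */
    const char *vm_request(const char *config)
    {
        for (int i = 0; i < NHOSTS; i++) {
            if (hosts[i].busy) continue;
            char cmd[256];
            /* hypothetical helper that boots a prebuilt image for this config */
            snprintf(cmd, sizeof(cmd), "ssh %s vm_boot %s.img", hosts[i].name, config);
            if (system(cmd) == 0) { hosts[i].busy = 1; return hosts[i].name; }
        }
        return NULL;
    }

    void vm_release(const char *name)
    {
        for (int i = 0; i < NHOSTS; i++)
            if (strcmp(hosts[i].name, name) == 0) hosts[i].busy = 0;
    }

    int main(void)
    {
        const char *h = vm_request("rh72");
        if (h) { printf("log in to %s\n", h); vm_release(h); }
        else   printf("no host available\n");
        return 0;
    }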
Security
Detailed Kernel Logging for Auditing.
Despite the best efforts of administrators to prevent
unwanted access, wily hackers have always found ways to
defeat security mechanisms.
An audit log can be used to reconstruct what
happened to a system after a security incident.
However, standard Unix auditing (man acct)
does not provide much information.
Augment a Linux kernel to log a wide variety of information at run time:
programs run, files read or written, network connections initiated or accepted.
Evaluate the performance overhead of your auditing system, and perhaps adjust
the detail up or down as needed. Using a virtual machine, deploy the audited
system, and establish some interesting network services. Recruit your classmates
to log in and perform some benevolent or malicious acts. Develop a tool to
analyze the auditing log and reconstruct what happened.
Report on the performance, effectiveness, and limitations of your design.
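The analysis tool might begin as simply as the sketch below, which assumes a hypothetical one-line-per-event record format of "<uid> <action> <target>" (for example, "1007 exec /bin/sh") and prints, in order, the events attributable to one user.

    /* auditgrep.c: reconstruct one user's activity from an audit log (sketch).
       The record format is assumed; the real format is yours to design. */

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        if (argc != 3) { fprintf(stderr, "use: %s logfile uid\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "r");
        if (!f) { perror(argv[1]); return 1; }

        char line[1024];
        while (fgets(line, sizeof(line), f)) {
            char uid[64], action[64], target[512];
            if (sscanf(line, "%63s %63s %511s", uid, action, target) != 3) continue;
            if (strcmp(uid, argv[2]) == 0)
                printf("%s %s\n", action, target);   /* this user's events, in order */
        }
        fclose(f);
        return 0;
    }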
Sandboxing by File System Logging.
The sandboxing technologies discussed in class prevent unwanted access
by simply denying actions that are not permitted. However, sometimes
it is not clear if an action is desirable without looking at all of
a program's actions together. So, construct a file system that writes
all changes to a log instead of writing to the target file system.
Use the FUSE Toolkit
as a starting point for your filesystem.
This log file can then be examined to see the overall effect of a program.
If acceptable, the log can be played to modify the filesystem.
This system could even be used to generate patches: a system change
can be run and logged on one machine, then carried to another to play
the same log. Evaluate the performance overhead of such logging,
and describe how it affects programs with varying I/O patterns.
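A sketch of the playback half is below, assuming a hypothetical log format in which each intercepted write is recorded as a header line "<offset> <length> <path>" followed by <length> raw bytes of data. Playing the log applies each write to the real filesystem; the logging filesystem itself, file creation and deletion records, and conflict handling are the project's real work.

    /* playlog.c: apply a write log to the real filesystem (sketch). */

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) { fprintf(stderr, "use: %s writelog\n", argv[0]); return 1; }
        FILE *log = fopen(argv[1], "rb");
        if (!log) { perror(argv[1]); return 1; }

        long offset, length;
        char path[1024];
        while (fscanf(log, "%ld %ld %1023s", &offset, &length, path) == 3) {
            fgetc(log);                              /* skip newline before the data */
            if (length <= 0) break;
            char *data = malloc(length);
            if (fread(data, 1, length, log) != (size_t)length) { free(data); break; }

            FILE *out = fopen(path, "r+b");          /* apply the logged write for real */
            if (!out) out = fopen(path, "w+b");
            if (out) {
                fseek(out, offset, SEEK_SET);
                fwrite(data, 1, length, out);
                fclose(out);
            }
            free(data);
        }
        fclose(log);
        return 0;
    }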
Processes and Synchronization
Adaptive Parallelism.
High-level languages and systems
allow a user to trivially harness many independent processors and run
many independent tasks in parallel. Unfortunately, such systems also
make it easy for a user to overload the system.
For example, with a simple script, one may easily send or retrieve
a file from one hundred machines in parallel. Transferring one
file at a time will likely not achieve maximum utilization of the network.
At the other extreme, transferring all one hundred at once will likely
result in collisions on the network and also achieve poor performance.
The optimal parallelism lies somewhere in between --
perhaps five or ten transfers at once -- and depends on the exact
resources in use and the load on the system.
Design and implement an algorithm for finding the optimal amount of
parallelism in a system with unknown resource constraints.
Evaluate this algorithm in a variety of settings with different
kinds and amounts of resources.
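One simple starting point is a hill-climbing controller that widens the transfer window while measured throughput keeps improving and stops at the knee of the curve; a sketch follows. The run_batch() helper is a stand-in here (a simulated throughput curve) for actually performing that many transfers at once and measuring the aggregate result.

    /* adapt.c: hill-climbing search for the transfer window (sketch). */

    #include <stdio.h>

    static double run_batch(int width)
    {
        /* simulated MB/s: improves up to about 8 concurrent transfers, then degrades;
           the real version would perform the transfers and measure throughput */
        return width <= 8 ? width * 12.0 : 96.0 - (width - 8) * 5.0;
    }

    static int find_parallelism(int max_width)
    {
        int width = 1;
        double best = run_batch(width);

        while (width < max_width) {
            double t = run_batch(width + 1);
            if (t <= best * 1.05) break;   /* no meaningful gain: stop at the knee */
            width++;
            best = t;
        }
        return width;
    }

    int main(void)
    {
        printf("selected parallelism: %d\n", find_parallelism(100));
        return 0;
    }

A real controller must also react when conditions change mid-run, for example by periodically probing a wider or narrower window.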