CSE 40822 / Project
Course Project for Cloud Computing
The final project in this course will be open ended. You will propose,
carry out, and report upon a project in groups of two or three students.
(Groups of three will be expected to do more.)
Overall, the project should demonstrate how to use a cloud system
to scale up an application to at least
100X the size or speed achievable on a single machine.
While a truly novel project idea would be nice, it's ok to recreate something that already exists, so that you can learn from the process. If something similar already exists, you should be aware of it and read about how it works, but then write your own code.
The project has three main components:
Build it on the cloud. Your project must involve harnessing a cloud computing system of some kind. The system may be one of those that we have used or studied in class, or it may be some other commercial cloud computing or storage system.
Evaluate it critically. You must evaluate what you have built for correctness, performance, and scalability. For correctness, you must develop a procedure to evaluate that the system actually accomplishes what it intends to do. (e.g. If you store some data in the cloud, you must verify that it is the same when it is read back.) For performance and scalability, you must select an appropriate measure -- simulations/day, GB/s, transactions/hour -- and then
evaluate the performance as the system scales up to 100X or higher.
Communicate it clearly. You must present your work cogently by writing a paper and making an oral presentation. The paper should describe the motivation, architecture, technical details, evaluation methods, and quantitative results of your project. The oral presentation should summarize the most important aspects of the paper and give a demo of how it works, during the last week of class.
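As one illustration of the correctness check described above (verifying that stored data is the same when read back), here is a minimal Python sketch that compares SHA-256 digests before and after a round trip. The `InMemoryStore` class and its `put`/`get` methods are hypothetical stand-ins for whatever cloud storage client your project uses; they are not part of any real service's API.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_round_trip(store, key: str, data: bytes) -> bool:
    """Write data under key, read it back, and compare digests."""
    expected = sha256_of(data)
    store.put(key, data)
    return sha256_of(store.get(key)) == expected


class InMemoryStore:
    """Hypothetical stand-in for a cloud storage client (assumption)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._blobs[key]


if __name__ == "__main__":
    ok = verify_round_trip(InMemoryStore(), "sample-object", b"hello cloud")
    print("round trip verified:", ok)
```

In a real evaluation you would run a check like this over many objects of varying sizes and report how many, if any, failed.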
If your project uses Amazon Web Services, you can draw on an academic grant we have received from Amazon.
Each member of the class can receive a $100 credit in Amazon web services, to be used to run virtual machines
and other services for the class. You will have to register with Amazon, enter your own credit card, and
then enter a special credit code. The TA will be distributing the credit codes shortly.
Project Ideas
The following are rough ideas for possible projects:
Convert a Real Application to the Cloud.
Take a real application that you use in your research, classes, or for fun,
convert it to parallel form using a cloud programming model
such as Work Queue, Hadoop, or Spark, and get it to run on as many processors as possible.
Be sure to carefully measure the performance at a varying number of processors
to produce a good speedup graph. What is the scalability limit of the system, and why?
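A speedup graph is usually built from wall-clock runtimes measured at several processor counts. The sketch below, in Python, computes speedup (single-processor time divided by time on p processors) and parallel efficiency (speedup divided by p). The function name and the sample runtimes are made up for illustration; substitute your own measurements.

```python
def speedup_table(runtimes):
    """Compute (processors, speedup, efficiency) rows.

    runtimes maps processor count -> wall-clock seconds and must
    include an entry for 1 processor, which serves as the baseline.
    """
    base = runtimes[1]
    rows = []
    for p in sorted(runtimes):
        speedup = base / runtimes[p]
        rows.append((p, speedup, speedup / p))
    return rows


if __name__ == "__main__":
    # Illustrative placeholder numbers, not real measurements.
    measured = {1: 100.0, 2: 50.0, 4: 40.0}
    for p, s, e in speedup_table(measured):
        print(f"{p:4d} processors  speedup {s:6.2f}  efficiency {e:5.2f}")
```

Efficiency that falls well below 1.0 as p grows is a hint about where the scalability limit lies, and explaining that drop is the "why" part of the question.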
Evaluate a NoSQL System. Select a NoSQL database like HBase, MongoDB, or Cassandra, and deploy it on the cloud using Amazon EC2. Learn how to upload data and issue queries, then evaluate the performance of the system as
you increase the number of clients and storage nodes. Note that these
systems are largely designed to handle multiple clients at once, so part
of the challenge is to run many clients via Condor or Amazon.
On Demand Allocation in Work Queue. As written, the Work Queue
simply accepts whatever workers are started by external means.
Modify the system to start and stop workers in the cloud, based on the
number of jobs waiting in the queue. (You may need to add some
hysteresis to prevent oscillations.) For a given application,
carefully quantify the tradeoff in performance vs resources consumed
using pre-allocation vs on-demand allocation.
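To make the hysteresis idea concrete, here is a minimal Python sketch of a scaling decision with two thresholds. It does not use the Work Queue API; `decide_workers`, its parameters, and the threshold values are all assumptions chosen for illustration. Workers are added only when the backlog per worker exceeds `high`, and removed only when it falls below `low`; the gap between the two keeps the pool from oscillating.

```python
def decide_workers(current: int, waiting: int,
                   high: float = 4.0, low: float = 1.0,
                   min_workers: int = 1, max_workers: int = 100) -> int:
    """Return the desired worker count given the number of waiting jobs.

    high and low are waiting-jobs-per-worker thresholds (assumed values).
    Between them the pool size is left unchanged.
    """
    per_worker = waiting / max(current, 1)
    if per_worker > high:
        desired = current + 1      # backlog is large: grow by one
    elif per_worker < low:
        desired = current - 1      # backlog is small: shrink by one
    else:
        desired = current          # inside the hysteresis band: hold
    return max(min_workers, min(max_workers, desired))


if __name__ == "__main__":
    print(decide_workers(current=2, waiting=20))  # backlog of 10 per worker
    print(decide_workers(current=5, waiting=10))  # backlog of 2 per worker
    print(decide_workers(current=5, waiting=2))   # backlog of 0.4 per worker
```

A real controller would call a function like this periodically and then start or stop cloud instances to match the returned count.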
Elastic Condor Pool. When multiple users are active, the Condor pool
can fill up, resulting in significant delays. Build a system that monitors how
busy the Condor pool is. When load is sufficiently high, allocate new machines
from EC2, and start Condor running to augment the pool. Measure the performance, cost, and responsiveness of the system under load.
Cloud Head-to-Head Evaluation. Everyone wants to know
which cloud provider is the "best" in terms of performance, scalability,
and cost. To answer this question, compare two cloud providers
(pick from Amazon EC2, Google App Engine, Windows Azure, or IBM Cloud)
head-to-head in a variety of basic operations: time to start a virtual
machine, CPU performance, upload/download speed, storage operations per second,
and so forth.
Build a Cloud Filesystem. Amazon S3 stores blobs in a flat
namespace, but doesn't provide a directory tree like a conventional filesystem.
Solve this problem by building a cloud filesystem library that stores
the directory tree in SimpleDB and the file blobs in S3. Evaluate
the performance and scalability compared to a local filesystem.
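The split between a directory table and a flat blob store can be sketched in a few lines of Python. In this toy version, two in-memory dictionaries stand in for SimpleDB and S3; the class name `ToyCloudFS` and all of its methods are hypothetical, and a real implementation would replace the dictionaries with calls to the actual services.

```python
import uuid


class ToyCloudFS:
    """Directory tree in one table, file contents in a flat blob store."""

    def __init__(self):
        # Stand-in for the directory table: dir path -> {name: blob key}.
        # Subdirectories are recorded with a value of None.
        self.dirs = {"/": {}}
        # Stand-in for the flat blob namespace: blob key -> bytes.
        self.blobs = {}

    @staticmethod
    def _split(path):
        parent, _, name = path.rstrip("/").rpartition("/")
        return (parent or "/", name)

    def mkdir(self, path):
        parent, name = self._split(path)
        self.dirs[parent][name] = None
        self.dirs[path] = {}

    def write(self, path, data):
        parent, name = self._split(path)
        key = uuid.uuid4().hex          # flat, opaque blob key
        self.blobs[key] = data
        self.dirs[parent][name] = key   # overwriting leaves the old blob behind

    def read(self, path):
        parent, name = self._split(path)
        return self.blobs[self.dirs[parent][name]]

    def listdir(self, path):
        return sorted(self.dirs[path])


if __name__ == "__main__":
    fs = ToyCloudFS()
    fs.mkdir("/data")
    fs.write("/data/run1.txt", b"results")
    print(fs.listdir("/data"), fs.read("/data/run1.txt"))
```

Note that every operation touches the directory table, so its latency is likely to dominate small-file performance; that is one thing worth measuring against a local filesystem.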
Build a Scalable Website. If you have already built some
sort of interactive website for a previous class, modify it to run
on the cloud, using a scalable storage engine. That is, if you have
a website that runs on a single node with Apache and SQL, modify it
to store everything in HBase or SimpleDB instead, and then measure
how it scales up. This is your chance to start your own social network!
Scalable Software Engineering. Many software engineering tasks
that were previously infeasible become much easier with access to cloud resources. For example, suppose that you want to evaluate how well your software works
on twenty different flavors of Linux. Build a system that can build and test a piece of software on a large number of virtual machines simultaneously. Or, build a machine that evaluates a test procedure on a range of commits to see where a bug is present. (Be careful: The tricky part is to track your machines and work accurately so that you don't lose anything or run up a big bill.)
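One way to locate where a bug first appears in a range of commits is a binary search, sketched below in Python. This is only one possible approach (testing every commit in parallel on separate machines is another). The `is_bad` callback is hypothetical: in a real system it would build and test a commit on a virtual machine. The sketch assumes the commits are ordered oldest to newest, the last commit is known to be bad, and the bug persists once introduced.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is true.

    Assumes commits[-1] is bad and that every commit after the first
    bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid               # bug already present at mid
        else:
            lo = mid + 1           # bug introduced after mid
    return commits[lo]


if __name__ == "__main__":
    # Hypothetical history: ten commits, bug introduced at "c6".
    history = [f"c{i}" for i in range(10)]
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))
```

Each call to `is_bad` may cost a full build on a paid machine, so the roughly log2(n) tests of a binary search can matter for the bill as well as the running time.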
Come Up With Your Own. These are only ideas to get you thinking! Come up with your own idea, or modify one of those above.
Milestones
Friday, October 17th. - Turn in a printed one-page project proposal
that describes the project members, the cloud system that you intend
to use, what resources will be necessary to carry it out, and how you
intend to evaluate the performance and/or scalability of the system.
The instructor will follow up with you to make sure that the project
is of appropriate size and difficulty. If multiple groups propose
substantially similar projects, we may ask you to adjust your work slightly.
Week of November 10th. - Meet with the instructors to give
a demo of what you have working so far. At this point, you should
have installed (or have access to) the appropriate software and systems,
be able to show them working in some way, and have made an initial
measurement of performance or scalability. We will discuss the plan
for finishing in a timely way, and make any necessary course corrections.
Week of December 1st. - Give a 15-20 minute in-class
presentation on your project. The talk should include an overview of
the goal or problem, an overview of the cloud system that you employed,
how your application makes use of the system, and some initial
results on performance and scalability. The work need not be complete
at this point, but it should be well along the way.
Each project partner should speak for a portion of the time.
Your talk should be accompanied by 5-10 carefully designed and edited slides.
December 10th, Noon - Turn in your final paper and your code.
The paper should give an overview of the goal or the problem,
a detailed description of the structure of your system and the application,
and an evaluation of the correctness and performance of your system.
The paper should include at least one diagram indicating the architecture
of the system and at least one graph which summarizes your performance
evaluation. There is no specific length requirement; the paper should be long enough to explain all of the necessary details. That said, anything less than ten pages is probably too short; anything longer than twenty pages is probably too long. All elements of the paper should be prepared with care
and attention to proper English. I am interested in your writing, not
your formatting, so please stick to standard 12-point Times font,
double-spaced, with one-inch margins. Turn in your paper in PDF format
to your dropbox directory.
All relevant code should also be turned in to your dropbox directory,
including source code, configuration files, scripts, etc.
The code should be complete enough that the grader can build and
run your work in the appropriate environment. If there are
important elements that cannot be turned in as code for whatever
reason (e.g. too big or expensive to download from the cloud) then turn
in links, screenshots, or other similar evidence of the completed work.