Prof. Douglas Thain
Email: dthain@nd.edu
Office: 382 Fitzpatrick Hall
Office Hours: 3-5PM Tuesdays and Thursdays

TA: Neil Butcher
Email: nbutcher@nd.edu
Office: "Fishbowl" on 3rd floor Fitzpatrick
Office Hours: 3-5PM Mondays and Wednesdays
Getting Help
Piazza Discussion Page - The best place for technical questions like "What does this error message mean?"
Office Hours - One of the instructors is available each afternoon Mon-Thu.
Email - Contact Prof. Thain for questions about grades, course policies, etc.
Grades and Lecture Recordings - Available on the course ND Sakai page.
Course Overview
This class is an introduction to the theory and practice of
building large-scale computer systems that harness hundreds or thousands
of machines to attack problems of enormous scale. Such distributed
systems are necessary to solve problems so large that they
cannot complete in any reasonable time on a single machine.
In recent years, these systems have been known as clouds,
but they have a much longer history as distributed systems.
Cloud computing encompasses a variety of modes of computing,
including infrastructure and data center management, high-throughput
computing, distributed programming models, NoSQL storage,
and more. We will tour many of these topics, alternating between
high-level discussions of the principles and case studies of
current technologies.
Each assignment will involve designing a program or system that scales
up to a large number of machines, using a variety of technologies.
This will be a highly practical class, and should be enjoyable for any student
who likes to write lots of code and make real systems work. Many students
who take this class end up using these tools in their daily work.
The class is open to juniors, seniors, and graduate students.
Course Documents
Syllabus
A0 - Warm Up Assignment
A1 - High Throughput Ray-Tracing with Condor
A2 - Parallel DNA Analysis with Work Queue
A3 - Web Data Analysis With Hadoop
A4 - Data Processing with AWS and Pig
Final Project
Final Presentation
Tentative Schedule
Week | Lecture Topic | Readings and Materials
13 January
| The Cloud Landscape
Lecture Outline
|
A View of Cloud Computing
Exascale Computing and Big Data
2015 Hype Cycle
|
18 January
| Principles of Distributed Computing
Lecture Outline
|
A Note on Distributed Computing
A0: Due Friday
|
25 January
| Case Study: Condor
Lecture Outline
|
Condor Overview Paper
Condor at Notre Dame
Condor 8.0 Manual
|
1 February
| Case Study: Makeflow and Work Queue
Makeflow Slides
Work Queue Slides
(Guest Lecture by Dr. Tovar)
|
Makeflow Web Page
Work Queue Web Page
Makeflow and Work Queue Tutorial
A1: Due Friday
|
8 February
| Cloud Programming Models
Lecture Outline
|
Example Log Analyzer
|
15 February
| Case Study: Hadoop
Lecture Outline
|
Google Map-Reduce Paper
Hadoop Project Web Page
Hadoop at Notre Dame
A2 Due Friday
|
22 February
| The Hadoop Stack
PigLatin Notes
HBase Notes
|
Pig Latin Paper (Apache Pig)
BigTable Paper (Apache HBase)
Project Proposals due Friday
|
29 February
| The Hadoop Stack Continued
Spark Notes
|
Spark Paper (Apache Spark)
A3 Due Friday
|
7 March
| Spring Break
|
14 March
| Case Study: Amazon AWS
Amazon Notes
|
Amazon AWS Docs
Midterm Exam on Friday
|
21 March
| Scaling Up Web Applications
Scaling Up Notes
|
Amazon Architecture Center
Memcached Project
|
28 March
| The CAP Theorem
Consistency Notes
|
Project Updates Due (Signup Here)
CAP 12 Years Later
Eventually Consistent
|
4 April
| Coordination and Configuration
Puppet Notes
Zookeeper Notes
|
A4 Due Monday
Puppet Docs
Zookeeper Paper (Apache Zookeeper)
|
11 April
| Coordination Continued
(No class Friday)
|
Mesos Paper (Apache Mesos - Docker)
|
18 April
| Project Presentations
|
Monday:
Scaling Up with AWS, Alan Vuong and Katie Quinn
Work Queue and Google App Engine (Movie), Dylan Zaragoza
FuS3FS -- A Cloud Filesystem, Kaijun Feng and Chao Luo
Wednesday:
Short Read Alignment, Xinyi Wang and Xuanyi Li
Parallelizing BWA, Christopher Ray
Movie Rendering Service, Samantha Rack
Friday:
Building a Scalable Dynamic Website, Lucas Barbosa-Parzianello
The Future of Microblogging, Jack Magiera and Jon Richelsen
Machine Learning with MLLib and scikit-learn, Christopher Homa
|
25 April
| Project Presentations
|
Monday:
Matrix Multiplication in Hadoop, Siddharth Saraph
Scaling Kamona, Bruno Braga and Fernando Beletti
Wednesday:
Jewel: File Syncing with AWS, Kevin Riehm
Amazon DynamoDB: Scaling and Benchmarking, Celeste Castillo and Ben Kennel
Final Projects Due Noon Wednesday
|
Wednesday, May 4th 8:00-10:00AM
| Final Exam
|