Prof. Douglas Thain
 Email: dthain@nd.edu
 Office: 382 Fitzpatrick Hall
 Office Hours: 3-5PM Tuesdays and Thursdays
TA: Neil Butcher
 Email: nbutcher@nd.edu
 Office: "Fishbowl" on 3rd floor Fitzpatrick
 Hours: 3-5PM Mondays and Wednesdays
Getting Help
 Piazza Discussion Page - The best place for technical questions like "What does this error message mean?"
 Office Hours - One of the instructors is available each afternoon Mon-Thu.
 Email - Contact Prof. Thain for questions about grades, course policies, etc.
 Grades and Lecture Recordings - Available on the course ND Sakai page.
Course Overview
This class is an introduction to the theory and practice of
building large scale computer systems that harness hundreds or thousands
of machines to attack problems of enormous scale.  Such distributed
systems are necessary to solve problems so large that they
cannot be completed in any reasonable time on a single machine.
In recent years, these systems have been known as clouds,
but they have a much longer history as distributed systems.
Cloud computing encompasses a variety of modes of computing,
including infrastructure and data center management, high
throughput computing, distributed programming models, NoSQL storage,
and more.  We will tour many of these topics, alternating
between high-level discussions of the principles and case
studies of a current technology.
Each assignment will involve designing a program or system that scales
up to a large number of machines, using a variety of technologies.
This will be a highly practical class, and should be enjoyable to any student
who likes to write lots of code and make real systems work.  Many students
who take this class end up using these tools in their daily work.  The class is open to juniors, seniors, and graduate students.
Course Documents
 Syllabus
 A0 - Warm Up Assignment
 A1 - High Throughput Ray-Tracing with Condor
 A2 - Parallel DNA Analysis with Work Queue
 A3 - Web Data Analysis With Hadoop
 A4 - Data Processing with AWS and Pig
 Final Project
 Final Presentation
Tentative Schedule
| Week | Lecture Topic | Readings and Materials |
| 13 January | The Cloud Landscape; Lecture Outline | A View of Cloud Computing; Exascale Computing and Big Data; 2015 Hype Cycle |
| 18 January | Principles of Distributed Computing; Lecture Outline | A Note on Distributed Computing; A0: Due Friday |
| 25 January | Case Study: Condor; Lecture Outline | Condor Overview Paper; Condor at Notre Dame; Condor 8.0 Manual |
| 1 February | Case Study: Makeflow and Work Queue; Makeflow Slides; Work Queue Slides; (Guest Lecture by Dr. Tovar) | Makeflow Web Page; Work Queue Web Page; Makeflow and Work Queue Tutorial; A1: Due Friday |
| 8 February | Cloud Programming Models; Lecture Outline | Example Log Analyzer |
| 15 February | Case Study: Hadoop; Lecture Outline | Google Map-Reduce Paper; Hadoop Project Web Page; Hadoop at Notre Dame; A2 Due Friday |
| 22 February | The Hadoop Stack; PigLatin Notes; HBase Notes | Pig Latin Paper (Apache Pig); BigTable Paper (Apache HBase); Project Proposals due Friday |
| 29 February | The Hadoop Stack Continued; Spark Notes | Spark Paper (Apache Spark); A3 Due Friday |
| 7 March | Spring Break | |
| 14 March | Case Study: Amazon AWS; Amazon Notes | Amazon AWS Docs; Midterm Exam on Friday |
| 21 March | Scaling Up Web Applications; Scaling Up Notes | Amazon Architecture Center; Memcached Project |
| 28 March | The CAP Theorem; Consistency Notes | Project Updates Due (Signup Here); CAP 12 Years Later; Eventually Consistent |
| 4 April | Coordination and Configuration; Puppet Notes; Zookeeper Notes | A4 Due Monday; Puppet Docs; Zookeeper Paper (Apache Zookeeper) |
| 11 April | Coordination Continued (No class Friday) | Mesos Paper (Apache Mesos - Docker) |
| 18 April | Project Presentations | Monday: Scaling Up with AWS, Alan Vuong and Katie Quinn; Work Queue and Google App Engine (Movie), Dylan Zaragoza; FuS3FS -- A Cloud Filesystem, Kaijun Feng and Chao Luo. Wednesday: Short Read Alignment, Xinyi Wang and Xuanyi Li; Parallelizing BWA, Christopher Ray; Movie Rendering Service, Samantha Rack. Friday: Building a Scalable Dynamic Website, Lucas Barbosa-Parzianello; The Future of Microblogging, Jack Magiera and Jon Richelsen; Machine Learning with MLLib and scikit-learn, Christopher Homa |
| 25 April | Project Presentations | Monday: Matrix Multiplication in Hadoop, Siddharth Saraph; Scaling Kamona, Bruno Braga and Fernando Beletti. Wednesday: Jewel: File Syncing with AWS, Kevin Riehm; Amazon DynamoDB: Scaling and Benchmarking, Celeste Castillo and Ben Kennel. Final Projects Due Noon Wednesday |
| Wednesday, May 4th 8:00-10:00AM | Final Exam | |