CSE 40822/60822
Cloud Computing - Spring 2016

MWF 9:25-10:15, DeBartolo 125
Prof. Douglas Thain
Email: dthain@nd.edu
Office: 382 Fitzpatrick Hall
Office Hours: 3-5PM Tuesdays and Thursdays
TA: Neil Butcher
Email: nbutcher@nd.edu
Office: "Fishbowl" on 3rd floor Fitzpatrick
Hours: 3-5PM Mondays and Wednesdays

Getting Help

  • Piazza Discussion Page - The best place for technical questions like "What does this error message mean?"
  • Office Hours - One of the instructors is available each afternoon Mon-Thu.
  • Email - Contact Prof. Thain for questions about grades, course policies, etc.
  • Grades and Lecture Recordings - Available on the course's ND Sakai page.

Course Overview

    This class is an introduction to the theory and practice of building large-scale computer systems that harness hundreds or thousands of machines to attack problems of enormous scale. Such distributed systems are necessary for problems so large that they cannot be completed in any reasonable time on a single machine. In recent years, these systems have become known as clouds, but they have a much longer history under the name distributed systems.

    Cloud computing encompasses a variety of modes of computing, including infrastructure and data center management, high-throughput computing, distributed programming models, NoSQL storage, and more. We will tour many of these topics by alternating between high-level discussions of the principles and case studies of current technologies.

    Each assignment will involve designing a program or system that scales up to a large number of machines, using a variety of technologies. This will be a highly practical class, and should be enjoyable to any student who likes to write lots of code and make real systems work. Many students who take this class end up using these tools in their daily work. The class is open to juniors, seniors, and graduate students.

Course Documents

  • Syllabus
  • A0 - Warm Up Assignment
  • A1 - High Throughput Ray-Tracing with Condor
  • A2 - Parallel DNA Analysis with Work Queue
  • A3 - Web Data Analysis With Hadoop
  • A4 - Data Processing with AWS and Pig
  • Final Project
  • Final Presentation

Tentative Schedule

    Week         Lecture Topic / Readings and Materials

    13 January   The Cloud Landscape
                 Lecture Outline
                 A View of Cloud Computing
                 Exascale Computing and Big Data
                 2015 Hype Cycle

    18 January   Principles of Distributed Computing
                 Lecture Outline
                 A Note on Distributed Computing
                 A0: Due Friday

    25 January   Case Study: Condor
                 Lecture Outline
                 Condor Overview Paper
                 Condor at Notre Dame
                 Condor 8.0 Manual

    1 February   Case Study: Makeflow and Work Queue
                 Makeflow Slides
                 Work Queue Slides
                 (Guest Lecture by Dr. Tovar)
                 Makeflow Web Page
                 Work Queue Web Page
                 Makeflow and Work Queue Tutorial
                 A1: Due Friday

    8 February   Cloud Programming Models
                 Lecture Outline
                 Example Log Analyzer

    15 February  Case Study: Hadoop
                 Lecture Outline
                 Google Map-Reduce Paper
                 Hadoop Project Web Page
                 Hadoop at Notre Dame
                 A2: Due Friday

    22 February  The Hadoop Stack
                 Pig Latin Notes
                 HBase Notes
                 Pig Latin Paper (Apache Pig)
                 BigTable Paper (Apache HBase)
                 Project Proposals: Due Friday

    29 February  The Hadoop Stack Continued
                 Spark Notes
                 Spark Paper (Apache Spark)
                 A3: Due Friday

    7 March      Spring Break

    14 March     Case Study: Amazon AWS
                 Amazon Notes
                 Amazon AWS Docs
                 Midterm Exam on Friday

    21 March     Scaling Up Web Applications
                 Scaling Up Notes
                 Amazon Architecture Center
                 Memcached Project

    28 March     The CAP Theorem
                 Consistency Notes
                 Project Updates Due (Sign Up Here)
                 CAP 12 Years Later
                 Eventually Consistent

    4 April      Coordination and Configuration
                 Puppet Notes
                 Zookeeper Notes
                 A4: Due Monday
                 Puppet Docs
                 Zookeeper Paper (Apache Zookeeper)

    11 April     Coordination Continued
                 (No class Friday)
                 Mesos Paper (Apache Mesos - Docker)

    18 April     Project Presentations
                 Monday:
                   • Scaling Up with AWS, Alan Vuong and Katie Quinn
                   • Work Queue and Google App Engine (Movie), Dylan Zaragoza
                   • FuS3FS -- A Cloud Filesystem, Kaijun Feng and Chao Luo
                 Wednesday:
                   • Short Read Alignment, Xinyi Wang and Xuanyi Li
                   • Parallelizing BWA, Christopher Ray
                   • Movie Rendering Service, Samantha Rack
                 Friday:
                   • Building a Scalable Dynamic Website, Lucas Barbosa-Parzianello
                   • The Future of Microblogging, Jack Magiera and Jon Richelsen
                   • Machine Learning with MLLib and scikit-learn, Christopher Homa

    25 April     Project Presentations
                 Monday:
                   • Matrix Multiplication in Hadoop, Siddharth Saraph
                   • Scaling Kamona, Bruno Braga and Fernando Beletti
                 Wednesday:
                   • Jewel: File Syncing with AWS, Kevin Riehm
                   • Amazon DynamoDB: Scaling and Benchmarking, Celeste Castillo and Ben Kennel
                 Final Projects Due Noon Wednesday

    Final Exam   Wednesday, May 4th, 8:00-10:00AM