A distributed system is any computer system consisting of multiple machines that work together on a common problem. Distributed systems appear in many areas of computing, including cloud computing, mobile computing, edge computing, the internet of things, aerospace systems, and more. Distributed systems are both interesting and difficult to build because their components may be autonomous and highly failure-prone. Students will learn the fundamental principles of distributed systems, study examples of current distributed systems, and build their own distributed systems from scratch. Topics include concurrency, fault tolerance, replication, consistency, and agreement.
Students will undertake a final project that involves building and evaluating a custom distributed system. Grading will be based on assignments, exams, and a final project.
This will be a fun and challenging class for students who like to
build working software systems. Distributed systems connects the
very practical aspects of software engineering (e.g., how to handle
a network disconnection) and the fundamental principles of computers
(e.g., whether a partitioned system can reach agreement). The skills
that you learn here will apply directly to advanced systems used in industry.
Five programming assignments are required, due approximately two weeks apart for the first ten weeks of the semester. The assignments together build towards an implementation of a scalable key-value store that could run in a cloud service or as a peer-to-peer system.
In the final project, students will propose, build, and measure a distributed system of their own design, which must make use of multiple techniques discussed in class to achieve a system that is robust and performant. Examples might include a distributed filesystem, a parallel programming model, or a peer-to-peer data routing system. The final submission will include a project report describing the design of the system.
Graduate students taking CSE 60771 will have the following additional work. A selection of paper readings will be assigned that address the course topics in greater detail, balanced between "classic" results in distributed systems and specific case studies in distributed systems design. In addition, the final project report must be written as an academic paper suitable for submission to an academic conference, including a problem statement, survey of related work, system design, and a complete performance evaluation.