CSE 66771 - Foundations of Distributed Systems
Prof. Douglas Thain
Email: dthain at nd dot edu
Office: 382 Fitzpatrick
Summer 2014 Session
MWF 9:30AM DeBartolo 334
Overview
This course explores the foundations of distributed systems through a series of classic papers selected from the research literature. Topics include time, synchronization, consensus, consistency, fault tolerance, and security. This course serves as a foundation for advanced graduate study in fields such as mobile computing, cloud computing, networking, and large-scale system design.
Time and Location
Because the course is held during the compressed summer session
(Monday, June 16th - Friday, July 25th), we will meet 9-10AM MTWRF
in a conference room, location TBA. Each class session will involve
discussion of the assigned readings for the day. Active participation
by all students is required. Students will be expected to read a
Grading
25% Discussion
25% Paper Summaries
25% Midterm Exam
25% Final Exam
Topics and Readings
Time
- Leslie Lamport,
Time, Clocks, and the Ordering of Events in a Distributed System,
Communications of the ACM 21:7, 1978.
cite
- K. Mani Chandy and Leslie Lamport, Distributed Snapshots: Determining Global States of Distributed Systems,
ACM Transactions on Computer Systems, 3:1, 1985.
cite
- D. Jefferson, B. Beckman, F. Wieland, L. Blume, M. Diloreto,
Distributed Simulation and the Time Warp Operating System,
ACM Symposium on Operating Systems Principles, 1987.
cite
- D. L. Mills,
Internet time synchronization: The network time protocol,
IEEE Transactions on Communications 39:10, 1991.
cite
- Barbara Liskov,
Practical Uses of Synchronized Clocks,
Distributed Computing 6:4, 1993.
cite
Consensus
- Ricart and Agrawala, An Optimal Algorithm for Mutual Exclusion in Computer Networks, Operating Systems Review, 1981
pdf
- H. Garcia-Molina,
Elections in a Distributed Computing System,
IEEE Transactions on Computers C-31:1, 1982.
cite
- K. Mani Chandy, Jayadev Misra, Laura Haas,
Distributed Deadlock Detection,
ACM Transactions on Computer Systems 1:2, 1983.
cite
- Leslie Lamport, Robert Shostak, Marshall Pease,
The Byzantine Generals Problem,
ACM Transactions on Programming Languages and Systems 4:3, 1982.
cite
- Miguel Castro and Barbara Liskov,
Practical Byzantine Fault Tolerance and Proactive Recovery,
ACM Transactions on Computer Systems 20:4, 2002.
cite
- Kenneth Birman, Andre Schiper, and Pat Stephenson,
Lightweight Causal and Atomic Group Multicast,
ACM Transactions on Computer Systems 9:3, 1991.
cite
Robustness and Correctness
- Butler Lampson and Howard Sturgis,
Crash Recovery in a Distributed Data Storage System,
Technical Report, Xerox PARC, 1979.
pdf
- Richard Schlichting and Fred Schneider,
Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems,
ACM Transactions on Computer Systems 1:3, 1983.
cite
- P. J. Leu,
Concurrent Robust Checkpointing and Recovery in Distributed Systems,
International Conference on Data Engineering, 1988.
cite
- Gerard J. Holzmann,
The Model Checker SPIN,
IEEE Transactions on Software Engineering 23:5, 1997.
cite
pdf
Consistency
- Davidson, Hector Garcia-Molina, Dale Skeen,
Consistency in a Partitioned Network: A survey,
ACM Computing Surveys, 17:3, 1985.
cite
- C. Gray and D. Cheriton,
Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency,
ACM Symposium on Operating Systems Principles, 1989.
cite
- Ladin, Liskov, Shrira, and Ghemawat,
Providing High Availability with Lazy Replication,
ACM TOCS 10:4, 1992.
cite
- Petersen, Spreitzer, Terry, Theimer, and Demers,
Flexible Update Propagation for Weakly Consistent Replication,
ACM Symposium on Operating Systems Principles, 1997.
cite
- Yasushi Saito and Marc Shapiro,
Optimistic Replication,
ACM Computing Surveys, 37:1, 2005.
cite
- van Renesse and Schneider,
Chain Replication for Supporting High Throughput and Availability,
USENIX Symposium on Operating Systems Design and Implementation, 2004.
cite
- Werner Vogels,
Eventually Consistent,
Communications of the ACM 52:1, 2009.
cite
Trust
- Saltzer and Schroeder,
The Protection of Information in Computer Systems,
Communications of the ACM 17:7, July 1974.
cite
- Needham and Schroeder,
Using Encryption for Authentication in Large Networks of Computers,
Communications of the ACM, 1978.
cite
- Steiner, Neuman, and Schiller,
Kerberos: An authentication service for open network systems,
USENIX Winter Conference, 1988.
pdf
- Satoshi Nakamoto,
Bitcoin: A peer-to-peer electronic cash system, 2008.
pdf
- Petros Maniatis, Mema Roussopoulos, T. J. Giuli, David S. H. Rosenthal, and Mary Baker,
The LOCKSS peer-to-peer digital preservation system,
ACM Transactions on Computer Systems, 23:1, 2005
cite
Advice
- Waldo, Wyant, Wollrath, and Kendall,
A Note on Distributed Computing,
Sun Microsystems Technical Report, November 1994.
pdf
- Saltzer, Reed, and Clark,
End-to-End Arguments in System Design,
ACM Transactions on Computer Systems 2:4, 1984.
cite
- Eric A. Brewer,
Lessons from Giant-Scale Services,
IEEE Internet Computing. Vol. 5, No. 4. pp. 46-55. July/August 2001.
cite
pdf