Caution: These are high level notes that I use to organize my lectures. You may find them useful for reviewing main points, but they aren't a substitute for the readings or for participating in class.
Week 1: The Cloud Landscape
The term “cloud” is very broad and encompasses a wide variety of computing techniques. Some of them have been around for a long time (e.g. distributed computing) while others are relatively new (pay-as-you-go).
A rough working definition: A cloud is a distributed system composed of multiple machines that work together to serve multiple users with high reliability, large capacity, and rapid scalability.
Some key aspects of cloud computing (but not everything called “cloud” has all of these):
- Centralized Data Center(s)
- Multi-Tenancy
- Resource Virtualization
- Pay-as-You-Go
- Service Oriented
- Highly Parallel
- Infinite Capacity?
A brief history of computing, leading up to clouds:
- 1960s - Mainframes (centralized): MULTICS, utility computing
- 1970s - Minicomputers (in between): VAX/VMS with terminals
- 1980s - Personal Computers (distributed): IBM PCs
- 1990s - Networks of Workstations (in between): Sun + NFS, PCs + Novell
- 2000s - Internet and Peer-to-Peer (distributed)
- 2000s - Grid Computing (distributed data centers)
- 2010s - Cloud Computing (centralized)
- Today - Edge Computing (decentralized)
(Many aspects of computing writ large can be seen as pendulums that swing from one extreme to the other as both technology and society change. Centralization vs. distribution is one of these pendulums.)
Cloud Architecture Layers:
End User
|
Scalable Web Interface
|
Applications
|
Middleware (HTCondor, Hadoop, ...)
|
Virtualized Resources (VMWare, Docker, ...)
|
Physical Resources (CPU, RAM, Disk, GPU)
Layers of Service Delivery:
- IaaS – e.g. Intel X86 Machines
- PaaS – e.g. Google App Engine
- SaaS – e.g. Hadoop Installation
- FaaS – e.g. Amazon Lambda
How does this change things for IT and business as a whole?
- Provision Business Functions, not Machines
- Replicate Configurations Accurately
- High Throughput Computing
- Match Resources to Load (Friendster vs Facebook)
- Data Analytics - Compute Close to Data
- Backup, Reliability, Availability
Distinguishing related terms:
- Cloud - Clients access big remote services.
- Grid - Multiple large sites interoperating.
- Cluster - Everything in one room.
- Multithreaded - Everything on one chip.
- Exascale – High performance computing at 1 exaFLOPS (10^18 floating point operations per second) or more.
- Big Data – Volume, Variety, Velocity.
- Edge - Services located closer to producers and consumers
Cloud on the Hype Cycle
References:
- Michael Armbrust et al, “A View of Cloud Computing”, Communications of the ACM, Volume 53, Number 4, DOI: 10.1145/1721654.1721672, April 2010.
- Daniel Reed and Jack Dongarra, “Exascale Computing and Big Data”, Communications of the ACM, Volume 58, Number 7, DOI: 10.1145/2699414, July 2015.
- Neil Savage, “Going Serverless”, Communications of the ACM, volume 61, number 2, February 2018. DOI: 10.1145/3171583
- Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, Lanyu Xu, “Edge Computing: Vision and Challenges”, IEEE Internet of Things Journal, volume 3, number 5, Oct 2016. DOI: 10.1109/JIOT.2016.2579198
Week 2: Principles of Distributed Computing
To understand clouds, we must first have a handle on distributed systems in general, so this week is a crash course in operating systems, networks, and then distributed systems, which combine the two.
Definitions:
- Serious: A distributed system is a set of processes communicating over a network. (Prof. Thain)
- Mostly serious: You know you have a distributed system when the crash of a computer you've never heard of stops you from getting any work done. – Leslie Lamport
Quick Overview of Operating Systems
The earliest machines had no OS, which made sharing and portability hard. A modern OS exists to share resources between competing users, and to allow programs to move portably between different machines.
Layers of a conventional operating system.
Applications | firefox, emacs, gcc
|
System Calls | open/read/write/fork/exec
|
Abstractions | Filesystem, Virtual Memory
|
Drivers | Disk, Network, Video
|
Hardware | IDE, Ethernet, VGA
A process is a running program that has its own private address space, and is protected from interference by other programs. It is both a unit of concurrency and a unit of independent failure. (i.e. A process can be safely killed.)
A thread is an additional unit of concurrency that can run inside a process. But it is not a unit of independent failure: threads cannot be killed in any reliable way.
Multiprocess Server Example:
- ssh process on client, sshd on the server.
- sshd forks on connect, then forks user’s shell
- What happens on failures?
- What happens for multiple users?
HTTPD Example:
- Browser on client, httpd on the server.
- Single process: httpd serves files directly.
(What happens on large file downloads?)
- Multi process: httpd forks on every request.
(What happens when you have too many clients?)
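A minimal sketch of the fork-per-request pattern just described, in Python (assuming a Unix-like system; the port and response are arbitrary, and error handling is omitted):

import os, signal, socket

signal.signal(signal.SIGCHLD, signal.SIG_IGN)    # let the kernel reap exited children

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))                   # arbitrary port for illustration
server.listen(16)

while True:
    conn, addr = server.accept()
    if os.fork() == 0:                           # child: serve exactly one connection
        server.close()
        conn.recv(4096)                          # read (and here, ignore) the request
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello\r\n")
        conn.close()
        os._exit(0)
    conn.close()                                 # parent: keep accepting new clients

A single-process server would instead handle requests one at a time inside the accept loop, which is exactly why one large download can stall everyone else.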
Quick Overview of Networking
Architecture of the Internet:
- LAN/switch connects machines over short distances.
(Ethernet, Token Ring, Wireless, DSL, etc…)
Machines communicate by sending short packets with a header.
- WAN/routers connect LANs over wide areas.
The Internet Protocol provides a common messaging format across network technologies, carried as a payload in LAN packets.
Networking Layers:
Application: | HTTP, FTP, DNS …
|
Transport: | TCP / UDP
|
Network: | Internet Protocol
|
Data Link: | Ethernet, Token Ring, 802.11
|
Physical: | Cat5, Optical, RF
Most Commonly Used Protocols:
- UDP - Short messages, unreliable delivery
- TCP - Long streams, ordered, reliable delivery.
(But in what sense is TCP “reliable” ?)
Idealized Vision of the Internet
- Anyone can send data to anyone else!
- Core of the network is dumb and unreliable.
- End points have all the reliability and policy.
Reality of the Internet
- Manually configured firewalls block all sorts of traffic.
(Even the good guys have to ask for permission to communicate.)
- Shortage of addresses in IPv4 -> proxies and network address translation
Abstract view of the Internet from applications:
- Send packets to remote hosts.
- They may arrive, or they may not.
- It is up to the other side to acknowledge in some way!
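A small sketch of that view in Python (the address, port, and timeout are arbitrary): the application can only retransmit and wait for an application-level acknowledgment, because the network itself promises nothing.

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)                          # how long to wait for a reply

def send_with_retry(message, addr, attempts=3):
    for i in range(attempts):
        sock.sendto(message, addr)            # the datagram may be lost in transit
        try:
            reply, _ = sock.recvfrom(1500)    # acknowledgment must come from the peer
            return reply
        except socket.timeout:
            continue                          # no reply: try again
    return None                               # give up; the caller must cope

print(send_with_retry(b"hello", ("127.0.0.1", 9000)))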
Principles of Distributed Systems
A distributed system consists of a set of processes that work together to accomplish some task by communicating over a network.
As described above, processes are independent, self-contained programs, and the network allows them to exchange (unreliable) packets of limited size.
We would like to build distributed systems that work as simply and reliably as non-distributed systems, but that simply isn’t possible. Distributed systems are fundamentally different from standalone machines in (at least) the four ways outlined by “A Note on Distributed Computing”:
- Latency
- Memory Access
- Partial Failure
- Concurrency
- and Autonomy (says Prof Thain)
“A Note” discusses this common fallacy: “Let’s take an existing program, break it into pieces (functions, objects, modules, etc) and then connect the pieces over the network. Now we have a usable distributed system that works just like the original system.”
(This is the key idea in RPC, CORBA, DCOM, RMI, and many other similar systems.)
It does not work because distributed systems are fundamentally different.
Easy to show with a thought experiment:
Suppose you have a regular program that makes use of a library implementing a stack data structure with the operations push(x) and x=pop().
We want to share the stack among multiple distributed users, so we put the stack in a separate server process, and have it accept and return messages. If the client sends “push(x)”, the server responds with “ok”. If the client sends “pop()”, the server responds with “x”, which is the value at the top of the stack. Messages can be lost, so if the client doesn’t get a response in a reasonable amount of time, it simply sends the request message again.
Questions to consider:
- What happens if the client’s messages are lost?
- What happens if the server’s responses are lost?
- What happens if multiple clients do this simultaneously?
Small group discussion: Design a solution to this problem. Change the messages exchanged so that no data is lost, and the stack still works as desired.
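One way to see the problem concretely is to simulate the naive protocol. This is a hypothetical sketch in Python, not a solution to the discussion exercise: requests always arrive, but replies can be lost, and the client's only recourse is to re-send.

class StackServer:
    """Naive server: every request has a visible effect, even a retransmitted one."""
    def __init__(self):
        self.stack = []

    def handle(self, msg):
        if msg[0] == "push":
            self.stack.append(msg[1])
            return ("ok",)
        if msg[0] == "pop":
            return ("value", self.stack.pop())

def request_with_retry(server, msg, replies_to_lose=0):
    """Client side: re-send the same message until a reply gets through."""
    while True:
        reply = server.handle(msg)          # the request always reaches the server here
        if replies_to_lose == 0:
            return reply                    # reply delivered
        replies_to_lose -= 1                # reply lost: client times out and re-sends

server = StackServer()
request_with_retry(server, ("push", "a"))
request_with_retry(server, ("push", "b"))

# One lost reply: the server pops "b", the client never sees it, re-sends,
# and receives "a" instead. "b" has silently vanished.
print(request_with_retry(server, ("pop",), replies_to_lose=1))   # ('value', 'a')
print(server.stack)                                              # [] -- both items gone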
Design principles for distributed protocols:
- Idempotency
- Fate Sharing
- Garbage Collection
- Transactions
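The idempotency principle above suggests one fix. A hedged sketch (assuming per-client request IDs and a cached last reply, which are design choices not given in the notes): a retransmitted request gets the cached reply back instead of being executed a second time.

class IdempotentStackServer:
    def __init__(self):
        self.stack = []
        self.last_reply = {}        # client_id -> (request_id, reply)

    def handle(self, client_id, request_id, msg):
        # A retransmission of the most recent request returns the cached reply.
        cached = self.last_reply.get(client_id)
        if cached and cached[0] == request_id:
            return cached[1]

        if msg[0] == "push":
            self.stack.append(msg[1])
            reply = ("ok",)
        elif msg[0] == "pop":
            reply = ("value", self.stack.pop()) if self.stack else ("empty",)

        self.last_reply[client_id] = (request_id, reply)
        return reply

This assumes each client has only one request outstanding at a time; fate sharing and garbage collection then determine when the cached state may safely be discarded.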
Moral of the story:
Interfaces to distributed systems must be designed from scratch to accommodate failure and concurrency!
References:
Martin van Steen and Andrew S. Tanenbaum, Distributed Systems, CreateSpace Independent Publishing Platform, 2017. ISBN: 978-1543057386
Case Study: HTCondor
Purpose:
- High Throughput Computing
- Cycle Scavenging
- Sharing of Resources
Basic Structure:
- Matchmaker
- Resource (startd)
- Starter
- Agent (schedd)
- Shadow
Matchmaking:
- ClassAd Attributes (examples)
- Requirements Expression
- Rank Expression
- Fair Share Scheduling
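A toy sketch of how the Requirements and Rank expressions above drive matchmaking (plain Python, not real ClassAd syntax; the attributes and values are invented): a match requires both sides' Requirements to evaluate true against the other's attributes, and the job's Rank orders the surviving candidates.

# Each side publishes an "ad" of attributes plus requirements/rank expressions.
job = {
    "Cmd": "simulate.exe",
    "RequestMemory": 2048,                                    # MB
    "Requirements": lambda m: m["Memory"] >= 2048 and m["OpSys"] == "LINUX",
    "Rank": lambda m: m["KFlops"],                            # prefer faster machines
}

machines = [
    {"Name": "slot1@node01", "OpSys": "LINUX", "Memory": 4096, "KFlops": 900,
     "Requirements": lambda j: j["RequestMemory"] <= 4096},
    {"Name": "slot1@node02", "OpSys": "LINUX", "Memory": 1024, "KFlops": 1200,
     "Requirements": lambda j: True},
]

# The matchmaker requires BOTH sides' Requirements to hold, then ranks the matches.
candidates = [m for m in machines
              if job["Requirements"](m) and m["Requirements"](job)]
best = max(candidates, key=job["Rank"], default=None)
print(best["Name"] if best else "no match")                   # slot1@node01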
Job Universes:
- Standard Universe (single executable with checkpointing)
- Vanilla Universe (unix executable, but no checkpointing)
- Java Universe (JVM provided by execution site)
- (Others, see manual)
Building Computing Communities
- Basic Condor Pool
- Gateway Flocking
- Direct Flocking
- Glide-Ins
Example Applications
- High Throughput Image Rendering (C.O.R.E. Digital Pictures)
- Circuit Simulation (Micron)
- Optimization Research (NUG30)
- Physics Data Analysis (LHC)
- Gravitational Wave Analysis (LIGO)
References
Workflows and Makeflow
What is a workflow?
- A graph of tasks and data.
- A "campaign" of work for a batch system.
- Can be defined statically or dynamically.
A workflow is a form of parallel programming.
- Concurrent elements.
- Synchronization issues.
- Resource limits.
- Fault tolerance.
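A toy sketch of the idea in Python (the file names and commands are invented): each rule runs once all of its input files exist, so the graph of tasks and data drives the order of execution, and independent rules could run concurrently.

import os, subprocess

# Three rules: two independent producers and one consumer that depends on both.
rules = [
    {"outputs": ["a.txt"], "inputs": [],                 "command": "echo A > a.txt"},
    {"outputs": ["b.txt"], "inputs": [],                 "command": "echo B > b.txt"},
    {"outputs": ["c.txt"], "inputs": ["a.txt", "b.txt"], "command": "cat a.txt b.txt > c.txt"},
]

done = set()
while len(done) < len(rules):
    for i, rule in enumerate(rules):
        if i not in done and all(os.path.exists(f) for f in rule["inputs"]):
            subprocess.run(rule["command"], shell=True, check=True)
            done.add(i)
# A real workflow engine would also run ready rules in parallel, detect cycles
# and missing files, retry failures, and keep a log for recovery.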
Examples of Workflow Systems:
- DAGMan
- Kepler
- Pegasus
- Swift
- Apache Taverna
- Apache Storm
Case Study: Makeflow
- Open source workflow engine.
- Designed around Unix abstractions.
- Interacts with existing batch systems: HTCondor, Torque, SLURM, Mesos, Kubernetes, Amazon, Lambda, ...
Architecture
- DAG Input
- Workflow Core and Transaction Log
- Batch Drivers
- Wrappers
Makeflow Language
- Classic Make Language
- Must tell the truth about each job!
- New JX Representation
- Debug Using JX2JSON
Example Used in Class:
{
    "define" : {
        "ntemps" : 100,
        "detail" : "high",
        "grandinputs" : [ "output."+x+".txt" for x in range(1,ntemps,1) ]
    },
    "rules" : [
        {
            "command" : "echo --temp "+x+" --detail "+detail+" >output."+x+".txt",
            "inputs" : [ "input."+n+".txt" for n in range(1,11,2) ],
            "outputs" : [ "output."+x+".txt" ]
        } for x in range(1,ntemps,1),
        {
            "command" : "cat "+join(grandinputs," ")+" >grandoutput.txt",
            "inputs" : grandinputs,
            "outputs" : [ "grandoutput.txt" ]
        }
    ]
}
References
- "Workflows for e-Science", Ian Taylor, Ewa Deelman, Dennis Gannon, Matthew Shields (eds), Springer 2007. ISBN 978-1-84628-757-2 DOI: https://doi.org/10.1007/978-1-84628-757-2_2
- Michael Albrecht, Patrick Donnelly, Peter Bui, and Douglas Thain, "Makeflow: A portable abstraction for data intensive computing on clusters, clouds, and grids", SWEET Workshop on Scalable Workflow Technologies, 2012. DOI: https://doi.org/10.1145/2443416.2443417
- Makeflow Web Page
Map-Reduce and Hadoop
Background and Context
- Early days of the Google web search engine. (2004)
- Complex programs mixed up logic with fault tolerance.
- Simplified computing model: Map-Reduce
- Result: Much greater productivity at scale.
The Map-Reduce Programming Model
The user provides two functions, Map and Reduce, and asks for them to be invoked on a given data set. They must have the following form:
Map( key, value ) -> list( key, value )
Reduce( key, list(values) ) -> output
The framework is responsible for locating the data, applying the functions, and then storing the outputs. The user is not concerned with locality, fault tolerance, optimization, and so forth.
The Map function is applied to each of the files comprising the data set, and emits a series of (key,value) pairs. Then, for each key, a bucket is created for all of the values with that key. The Reduce function is then applied to all values in that bucket.
(Blackboard diagram of how this works.)
WordCount is the “hello world” of Map-Reduce. This program reads in a large number of files and computes the frequency of each unique word in the input.
Map( key, value ) {
    // key is the file name
    // value is the file contents
    For each word in value {
        Emit( word, 1 )
    }
}

Reduce( key, list(values) ) {
    count = 0;
    For each v in list(values) {
        count++;
    }
    Emit( key, count );
}
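The same WordCount as a minimal local simulation in Python (no Hadoop involved), with the framework's shuffle step written out explicitly:

from collections import defaultdict

def map_fn(filename, contents):
    # key is the file name, value is the file contents
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    yield (word, sum(counts))

def mapreduce(inputs, map_fn, reduce_fn):
    buckets = defaultdict(list)               # "shuffle": group emitted values by key
    for key, value in inputs.items():
        for k, v in map_fn(key, value):
            buckets[k].append(v)
    output = []
    for k, vs in buckets.items():             # apply Reduce to each bucket
        output.extend(reduce_fn(k, vs))
    return output

files = {"a.txt": "the cat sat on the mat", "b.txt": "the dog sat"}
print(sorted(mapreduce(files, map_fn, reduce_fn)))
# [('cat', 1), ('dog', 1), ('mat', 1), ('on', 1), ('sat', 2), ('the', 3)]

Everything this little driver does, the real framework must do across many machines and many failures; that division of labor is the point of the model.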
Sometimes you need to run multiple rounds of Map-Reduce in order to get the desired effect. For example, suppose you now want to generate the ten most frequently used words in this set of documents. Run Map-Reduce on the output of the previous round, but with this program:
Map( key, value ) {
    // key is a word, value is its count from the previous round
    word = key
    count = value
    Emit( 1, “count word” );
}

Reduce( key, list(values) ) {
    // Every record shares the key 1, so a single reducer sees all (count, word) pairs.
    Sort list(values) by count, descending;
    For first ten items in list(values) {
        Emit( value )
    }
}
Example Problems to Work in Class
Suppose you have the following weather data. A set of (unsorted) tuples, each consisting of a year, month, day, and the maximum observed temp that day:
(2007,12,10,35)
(2008,3,22,75)
(2015,2,15,12) ...
- Write a Map-Reduce program to compute the maximum temperature observed each month for which data is present.
- Write a Map-Reduce program to compute the average temperature for each day of the year (over all years).
Now suppose that you have data representing a graph of friends:
A -> B,C,D
B -> A,C,D
C -> A,B
D -> A,B
- Write a Map-Reduce program that will identify common friends:
(A,B) -> C,D
(A,C) -> B
. . .
- Write a Map-Reduce program that will identify the people with the greatest number of friends (incoming links, not outgoing links).
The Hadoop Distributed System
Hadoop began as an open-source implementation very similar in spirit to the Google File System (GFS) and the Map-Reduce programming model. It has grown into a complex ecosystem of interacting pieces of software.
HDFS - Hadoop Distributed Filesystem Architecture:
- One Name Node + Many Data Nodes
- Files are divided into large chunks (64 MB by default).
- Files, once written, are immutable.
- Chunks are replicated three times, spread across two different racks.
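A toy sketch of the name node's bookkeeping (plain Python with invented node and rack names; not the actual HDFS placement policy): each file maps to a list of chunks, and each chunk maps to replicas spread across at least two racks.

import random

CHUNK_SIZE = 64 * 1024 * 1024                 # 64 MB
REPLICAS = 3

datanodes = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackB"}

def place_replicas():
    # Pick REPLICAS data nodes that cover at least two racks (simplified policy).
    while True:
        nodes = random.sample(list(datanodes), REPLICAS)
        if len({datanodes[n] for n in nodes}) >= 2:
            return nodes

def add_file(namespace, name, size_bytes):
    nchunks = -(-size_bytes // CHUNK_SIZE)    # ceiling division
    namespace[name] = [place_replicas() for _ in range(nchunks)]

namespace = {}                                # the name node's in-memory metadata
add_file(namespace, "/data/logs.txt", 200 * 1024 * 1024)
print(namespace["/data/logs.txt"])            # 4 chunks, 3 replica locations each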
Interface:
- Java library.
- Hadoop command-line tool.
- Status web page.
Considerations:
- Fault tolerance.
- High access latency.
- Uploading can be slow, due to replication.
- Very high throughput on parallel reads.
- Multiple disks per data node
- Secondary name node periodically merges the edit log into the namespace checkpoint.
Hadoop Map-Reduce Architecture:
- One JobTracker per cluster coordinates the entire M-R computation.
- TaskTrackers on each node dispatch and monitor each M-R task.
- HDFS -> Maps -> Temporary Space -> Shuffle -> Reducers -> HDFS
Interface:
- Native M-R code in Java.
- Other languages use the streaming interface.
- Hadoop command-line tool.
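A sketch of what the streaming interface expects from other languages (Python here; the exact hadoop jar invocation varies by installation, so it is omitted): the mapper turns input lines into tab-separated key/value pairs, the framework sorts by key, and the reducer aggregates runs of equal keys.

# mapper.py -- read raw lines on stdin, emit one "word<TAB>1" line per word
import sys
for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")

# reducer.py -- input arrives sorted by key, so counts accumulate per word
import sys
current, count = None, 0
for line in sys.stdin:
    word, n = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(current + "\t" + str(count))
        current, count = word, 0
    count += int(n)
if current is not None:
    print(current + "\t" + str(count))

These can be tested locally with a pipeline such as cat input.txt | python mapper.py | sort | python reducer.py, since sorting by key is essentially what the shuffle provides.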
Considerations:
- Fault tolerance.
- Stragglers.
- Data balance.
- Number of “reducers”.
Question: Which part of a Map-Reduce program is naturally scalable, and which part is likely to be a bottleneck? Does that affect how you would design an M-R program?
References:
- Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Communications of the ACM, Volume 51, Number 1, pages 107-113, 2008.
- K. Shvachko, H. Kuang, S. Radia, R. Chansler, “The Hadoop Distributed File System”, IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2010.
- Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, Morgan & Claypool Publishers, 2010.