Caution: These are high level notes that I use to organize my lectures. You may find them useful for reviewing main points, but they aren't a substitute for the readings or for participating in class.
Week 1: The Cloud Landscape
The term “cloud” is very broad and encompasses a wide variety of computing techniques. Some of them have been around for a long time (e.g. distributed computing) while others are relatively new (pay-as-you-go).
A rough working definition: A cloud is a distributed system composed of multiple machines that work together to serve multiple users with high reliability, large capacity, and rapid scalability.
Some key aspects of cloud computing (but not everything called “cloud” has all of these):
- Centralized Data Center(s)
- Multi-Tenancy
- Resource Virtualization
- Pay-as-You-Go
- Service Oriented
- Highly Parallel
- Infinite Capacity?
A brief history of computing, leading up to clouds:
- 1960s - Mainframes (centralized): MULTICS, utility computing
- 1970s - Minicomputers (in between): VAX/VMS with terminals
- 1980s - Personal Computers (distributed): IBM PCs
- 1990s - Networks of Workstations (in between): Sun + NFS, PCs + Novell
- 2000s - Internet and Peer-to-Peer (distributed)
- 2000s - Grid Computing (distributed data centers)
- 2010s - Cloud Computing (centralized)
- Today - Edge Computing (decentralized)
(Many aspects of computing writ large can be seen as pendulums that swing from one extreme to the other as both technology and society change. Centralization vs. distribution is one of these pendulums.)
Cloud Architecture Layers:
End User
|
Scalable Web Interface
|
Applications
|
Middleware (HTCondor, Hadoop, ...)
|
Virtualized Resources (VMWare, Docker, ...)
|
Physical Resources (CPU, RAM, Disk, GPU)
Layers of Service Delivery:
- IaaS – e.g. Intel X86 Machines
- PaaS – e.g. Google App Engine
- SaaS – e.g. Hadoop Installation
- FaaS – e.g. Amazon Lambda
How does this change things for IT and business as a whole?
- Provision Business Functions, not Machines
- Replicate Configurations Accurately
- High Throughput Computing
- Match Resources to Load (Friendster vs Facebook)
- Data Analytics - Compute Close to Data
- Backup, Reliability, Availability
Distinguishing related terms:
- Cloud - Clients access big remote services.
- Grid - Multiple large sites interoperating.
- Cluster - Everything in one room.
- Multithreaded - Everything on one chip.
- Exascale – High performance computing at 1 exaFLOPS (10^18 floating point operations per second) or more.
- Big Data – Volume, Variety, Velocity.
- Edge - Services located closer to producers and consumers
Cloud on the Hype Cycle
References:
- Michael Armbrust et al, “A View of Cloud Computing”, Communications of the ACM, Volume 53, Number 4, DOI: 10.1145/1721654.1721672, April 2010.
- Daniel Reed and Jack Dongarra, “Exascale Computing and Big Data”, Communications of the ACM, Volume 58, Number 7, DOI: 10.1145/2699414, July 2015.
- Neil Savage, “Going Serverless”, Communications of the ACM, volume 61, number 2, February 2018. DOI: 10.1145/3171583
- Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, Lanyu Xu, “Edge Computing: Vision and Challenges”, IEEE Internet of Things Journal, volume 3, number 5, Oct 2016. DOI: 10.1109/JIOT.2016.2579198
Week 2: Principles of Distributed Computing
To understand clouds, we must first have a handle on distributed systems in general, so this week is a crash course in operating systems, networks, and then distributed systems, which combine the two.
Definitions:
- Serious: A distributed system is a set of processes communicating over a network. (Prof. Thain)
- Mostly serious: You know you have a distributed system when the crash of a computer you've never heard of stops you from getting any work done. – Leslie Lamport
Quick Overview of Operating Systems
The earliest machines had no OS, which made sharing and portability hard. A modern OS exists to share resources between competing users, and to allow programs to move portably between different machines.
Layers of a conventional operating system.
Applications | firefox, emacs, gcc
|
System Calls | open/read/write/fork/exec
|
Abstractions | Filesystem, Virtual Memory
|
Drivers | Disk, Network, Video
|
Hardware | IDE, Ethernet, VGA
A process is a running program that has its own private address space, and is protected from interference by other programs. It is both a unit of concurrency and a unit of independent failure. (i.e. A process can be safely killed.)
A thread is an additional unit of concurrency that can run inside a process. But it is not a unit of independent failure: threads cannot be killed in any reliable way.
Multiprocess Server Example:
- ssh process on client, sshd on the server.
- sshd forks on connect, then forks user’s shell
- What happens on failures?
- What happens for multiple users?
HTTPD Example:
- Browser on client, httpd on the server.
- Single process: httpd serves files directly.
(What happens on large file downloads?)
- Multi process: httpd forks on every request.
(What happens when you have too many clients?)
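A minimal sketch of the fork-per-request pattern just described, in Python (assuming a Unix-like system; the port and response are arbitrary, and error handling is omitted):

import os, signal, socket

signal.signal(signal.SIGCHLD, signal.SIG_IGN)    # let the kernel reap exited children

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))                   # arbitrary port for illustration
server.listen(16)

while True:
    conn, addr = server.accept()
    if os.fork() == 0:                           # child: serve exactly one connection
        server.close()
        conn.recv(4096)                          # read (and here, ignore) the request
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello\r\n")
        conn.close()
        os._exit(0)
    conn.close()                                 # parent: keep accepting new clients

A single-process server would instead handle requests one at a time inside the accept loop, which is exactly why one large download can stall everyone else.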
Quick Overview of Networking
Architecture of the Internet:
- LAN/switch connects machines over short distances.
(Ethernet, Token Ring, Wireless, DSL, etc…)
Machines communicate by sending short packets with a header.
- WAN/routers connect LANs over wide areas.
The Internet Protocol provides a common messaging format across network technologies, carried as a payload in LAN packets.
Networking Layers:
Application: | HTTP, FTP, DNS …
|
Transport: | TCP / UDP
|
Network: | Internet Protocol
|
Data Link: | Ethernet, Token Ring, 802.11
|
Physical: | Cat5, Optical, RF
Most Commonly Used Protocols:
- UDP - Short messages, unreliable delivery
- TCP - Long streams, ordered, reliable delivery.
(But in what sense is TCP “reliable” ?)
Idealized Vision of the Internet
- Anyone can send data to anyone else!
- Core of the network is dumb and unreliable.
- End points have all the reliability and policy.
Reality of the Internet
- Manually configured firewalls block all sorts of traffic.
(Even the good guys have to ask for permission to communicate.)
- Shortage of addresses in IPv4 -> proxies and network address translation
Abstract view of the Internet from applications:
- Send packets to remote hosts.
- They may arrive, or they may not.
- It is up to the other side to acknowledge in some way!
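A small sketch of that view in Python (the address, port, and timeout are arbitrary): the application can only retransmit and wait for an application-level acknowledgment, because the network itself promises nothing.

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)                          # how long to wait for a reply

def send_with_retry(message, addr, attempts=3):
    for i in range(attempts):
        sock.sendto(message, addr)            # the datagram may be lost in transit
        try:
            reply, _ = sock.recvfrom(1500)    # acknowledgment must come from the peer
            return reply
        except socket.timeout:
            continue                          # no reply: try again
    return None                               # give up; the caller must cope

print(send_with_retry(b"hello", ("127.0.0.1", 9000)))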
Principles of Distributed Systems
A distributed system consists of a set of processes that work together to accomplish some task by communicating over a network.
As described above, processes are independent, self-contained programs, and the network allows them to exchange (unreliable) packets of limited size.
We would like to build distributed systems that work as simply and reliably as non-distributed systems, but that simply isn’t possible. Distributed systems are fundamentally different from standalone machines in (at least) the four ways outlined by “A Note on Distributed Computing”:
- Latency
- Memory Access
- Partial Failure
- Concurrency
- and Autonomy (says Prof Thain)
“A Note” discusses this common fallacy: “Let’s take an existing program, break it into pieces (functions, objects, modules, etc) and then connect the pieces over the network. Now we have a usable distributed system that works just like the original system.”
(This is the key idea in RPC, CORBA, DCOM, RMI, and many other similar systems.)
It does not work because distributed systems are fundamentally different.
Easy to show with a thought experiment:
Suppose you have a regular program that makes use of a library implementing a stack data structure with the operations push(x) and x=pop().
We want to share the stack among multiple distributed users, so we put the stack in a separate server process, and have it accept and return messages. If the client sends “push(x)”, the server responds with “ok”. If the client sends “pop()”, the server responds with “x”, which is the value at the top of the stack. Messages can be lost, so if the client doesn’t get a response in a reasonable amount of time, it simply sends the request message again.
Questions to consider:
- What happens if the client’s messages are lost?
- What happens if the server’s responses are lost?
- What happens if multiple clients do this simultaneously?
Small group discussion: Design a solution to this problem. Change the messages exchanged so that no data is lost, and the stack still works as desired.
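One way to see the problem concretely is to simulate the naive protocol. This is a hypothetical sketch in Python, not a solution to the discussion exercise: requests always arrive, but replies can be lost, and the client's only recourse is to re-send.

class StackServer:
    """Naive server: every request has a visible effect, even a retransmitted one."""
    def __init__(self):
        self.stack = []

    def handle(self, msg):
        if msg[0] == "push":
            self.stack.append(msg[1])
            return ("ok",)
        if msg[0] == "pop":
            return ("value", self.stack.pop())

def request_with_retry(server, msg, replies_to_lose=0):
    """Client side: re-send the same message until a reply gets through."""
    while True:
        reply = server.handle(msg)          # the request always reaches the server here
        if replies_to_lose == 0:
            return reply                    # reply delivered
        replies_to_lose -= 1                # reply lost: client times out and re-sends

server = StackServer()
request_with_retry(server, ("push", "a"))
request_with_retry(server, ("push", "b"))

# One lost reply: the server pops "b", the client never sees it, re-sends,
# and receives "a" instead. "b" has silently vanished.
print(request_with_retry(server, ("pop",), replies_to_lose=1))   # ('value', 'a')
print(server.stack)                                              # [] -- both items gone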
Design principles for distributed protocols:
- Idempotency
- Fate Sharing
- Garbage Collection
- Transactions
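The idempotency principle above suggests one fix. A hedged sketch (assuming per-client request IDs and a cached last reply, which are design choices not given in the notes): a retransmitted request gets the cached reply back instead of being executed a second time.

class IdempotentStackServer:
    def __init__(self):
        self.stack = []
        self.last_reply = {}        # client_id -> (request_id, reply)

    def handle(self, client_id, request_id, msg):
        # A retransmission of the most recent request returns the cached reply.
        cached = self.last_reply.get(client_id)
        if cached and cached[0] == request_id:
            return cached[1]

        if msg[0] == "push":
            self.stack.append(msg[1])
            reply = ("ok",)
        elif msg[0] == "pop":
            reply = ("value", self.stack.pop()) if self.stack else ("empty",)

        self.last_reply[client_id] = (request_id, reply)
        return reply

This assumes each client has only one request outstanding at a time; fate sharing and garbage collection then determine when the cached state may safely be discarded.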
Moral of the story:
Interfaces to distributed systems must be designed from scratch to accommodate failure and concurrency!
References:
Martin van Steen and Andrew S. Tanenbaum, Distributed Systems, CreateSpace Independent Publishing Platform, 2017. ISBN: 978-1543057386
Case Study: HTCondor
Purpose:
- High Throughput Computing
- Cycle Scavenging
- Sharing of Resources
Basic Structure:
- Matchmaker
- Resource (startd)
- Starter
- Agent (schedd)
- Shadow
Matchmaking:
- ClassAd Attributes (examples)
- Requirements Expression
- Rank Expression
- Fair Share Scheduling
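A toy sketch of how the Requirements and Rank expressions above drive matchmaking (plain Python, not real ClassAd syntax; the attributes and values are invented): a match requires both sides' Requirements to evaluate true against the other's attributes, and the job's Rank orders the surviving candidates.

# Each side publishes an "ad" of attributes plus requirements/rank expressions.
job = {
    "Cmd": "simulate.exe",
    "RequestMemory": 2048,                                    # MB
    "Requirements": lambda m: m["Memory"] >= 2048 and m["OpSys"] == "LINUX",
    "Rank": lambda m: m["KFlops"],                            # prefer faster machines
}

machines = [
    {"Name": "slot1@node01", "OpSys": "LINUX", "Memory": 4096, "KFlops": 900,
     "Requirements": lambda j: j["RequestMemory"] <= 4096},
    {"Name": "slot1@node02", "OpSys": "LINUX", "Memory": 1024, "KFlops": 1200,
     "Requirements": lambda j: True},
]

# The matchmaker requires BOTH sides' Requirements to hold, then ranks the matches.
candidates = [m for m in machines
              if job["Requirements"](m) and m["Requirements"](job)]
best = max(candidates, key=job["Rank"], default=None)
print(best["Name"] if best else "no match")                   # slot1@node01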
Job Universes:
- Standard Universe (single executable with checkpointing)
- Vanilla Universe (unix executable, but no checkpointing)
- Java Universe (JVM provided by execution site)
- (Others, see manual)
Building Computing Communities
- Basic Condor Pool
- Gateway Flocking
- Direct Flocking
- Glide-Ins
Example Applications
- High Throughput Image Rendering (C.O.R.E. Digital Pictures)
- Circuit Simulation (Micron)
- Optimization Research (NUG30)
- Physics Data Analysis (LHC)
- Gravitational Wave Analysis (LIGO)
References
Workflows and Makeflow
What is a workflow?
- A graph of tasks and data.
- A "campaign" of work for a batch system.
- Can be defined statically or dynamically.
A workflow is a form of parallel programming.
- Concurrent elements.
- Synchronization issues.
- Resource limits.
- Fault tolerance.
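A toy sketch of the idea in Python (the file names and commands are invented): each rule runs once all of its input files exist, so the graph of tasks and data drives the order of execution, and independent rules could run concurrently.

import os, subprocess

# Three rules: two independent producers and one consumer that depends on both.
rules = [
    {"outputs": ["a.txt"], "inputs": [],                 "command": "echo A > a.txt"},
    {"outputs": ["b.txt"], "inputs": [],                 "command": "echo B > b.txt"},
    {"outputs": ["c.txt"], "inputs": ["a.txt", "b.txt"], "command": "cat a.txt b.txt > c.txt"},
]

done = set()
while len(done) < len(rules):
    for i, rule in enumerate(rules):
        if i not in done and all(os.path.exists(f) for f in rule["inputs"]):
            subprocess.run(rule["command"], shell=True, check=True)
            done.add(i)
# A real workflow engine would also run ready rules in parallel, detect cycles
# and missing files, retry failures, and keep a log for recovery.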
Examples of Workflow Systems:
- DAGMan
- Kepler
- Pegasus
- Swift
- Apache Taverna
- Apache Storm
Case Study: Makeflow
- Open source workflow engine.
- Designed around Unix abstractions.
- Interacts with existing batch systems: HTCondor, Torque, SLURM, Mesos, Kubernetes, Amazon, Lambda, ...
Architecture
- DAG Input
- Workflow Core and Transaction Log
- Batch Drivers
- Wrappers
Makeflow Language
- Classic Make Language
- Must tell the truth about each job!
- New JX Representation
- Debug Using JX2JSON
Example Used in Class:
{
    "define" : {
        "ntemps" : 100,
        "detail" : "high",
        "grandinputs" : [ "output."+x+".txt" for x in range(1,ntemps,1) ]
    },
    "rules" : [
        {
            "command" : "echo --temp "+x+" --detail "+detail+" >output."+x+".txt",
            "inputs" : [ "input."+n+".txt" for n in range(1,11,2) ],
            "outputs" : [ "output."+x+".txt" ]
        } for x in range(1,ntemps,1),
        {
            "command" : "cat "+join(grandinputs," ")+" >grandoutput.txt",
            "inputs" : grandinputs,
            "outputs" : [ "grandoutput.txt" ]
        }
    ]
}
References
- "Workflows for e-Science", Ian Taylor, Ewa Deelman, Dennis Gannon, Matthew Shields (eds), Springer 2007. ISBN 978-1-84628-757-2 DOI: https://doi.org/10.1007/978-1-84628-757-2_2
- Michael Albrecht, Patrick Donnelly, Peter Bui, and Douglas Thain, "Makeflow: A portable abstraction for data intensive computing on clusters, clouds, and grids", SWEET Workshop on Scalable Workflow Technologies, 2012. DOI: https://doi.org/10.1145/2443416.2443417
- Makeflow Web Page
Map-Reduce and Hadoop
Background and Context
- Early days of the Google web search engine. (2004)
- Complex programs mixed up logic with fault tolerance.
- Simplified computing model: Map-Reduce
- Result: Much greater productivity at scale.
The Map-Reduce Programming Model
The user provides two functions, Map and Reduce, and asks for them to be invoked on a given data set. They must have the following form:
Map( key, value ) -> list( key, value )
Reduce( key, list(values) ) -> output
The framework is responsible for locating the data, applying the functions, and then storing the outputs. The user is not concerned with locality, fault tolerance, optimization, and so forth.
The Map function is applied to each of the files comprising the data set, and emits a series of (key,value) pairs. Then, for each key, a bucket is created for all of the values with that key. The Reduce function is then applied to all values in that bucket.
(Blackboard diagram of how this works.)
WordCount is the “hello world” of Map-Reduce. This program reads in a large number of files and computes the frequency of each unique word in the input.
Map( key, value ) {
    // key is the file name
    // value is the file contents
    For each word in value {
        Emit( word, 1 )
    }
}

Reduce( key, list(values) ) {
    count = 0;
    For each v in list(values) {
        count++;
    }
    Emit( key, count );
}
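The same WordCount as a minimal local simulation in Python (no Hadoop involved), with the framework's shuffle step written out explicitly:

from collections import defaultdict

def map_fn(filename, contents):
    # key is the file name, value is the file contents
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    yield (word, sum(counts))

def mapreduce(inputs, map_fn, reduce_fn):
    buckets = defaultdict(list)               # "shuffle": group emitted values by key
    for key, value in inputs.items():
        for k, v in map_fn(key, value):
            buckets[k].append(v)
    output = []
    for k, vs in buckets.items():             # apply Reduce to each bucket
        output.extend(reduce_fn(k, vs))
    return output

files = {"a.txt": "the cat sat on the mat", "b.txt": "the dog sat"}
print(sorted(mapreduce(files, map_fn, reduce_fn)))
# [('cat', 1), ('dog', 1), ('mat', 1), ('on', 1), ('sat', 2), ('the', 3)]

Everything this little driver does, the real framework must do across many machines and many failures; that division of labor is the point of the model.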
Sometimes you need to run multiple rounds of Map-Reduce in order to get the desired effect. For example, suppose you now want to generate the ten most frequently used words in this set of documents. Run Map-Reduce on the output of the previous round, but with this program:
Map( key, value ) {
    // key is a word, value is its count from the previous round
    word = key
    count = value
    Emit( 1, “count word” );
}

Reduce( key, list(values) ) {
    // Every record shares the key 1, so a single reducer sees all (count, word) pairs.
    Sort list(values) by count, descending;
    For first ten items in list(values) {
        Emit( value )
    }
}
Example Problems to Work in Class
Suppose you have the following weather data. A set of (unsorted) tuples, each consisting of a year, month, day, and the maximum observed temp that day:
(2007,12,10,35)
(2008,3,22,75)
(2015,2,15,12) ...
- Write a Map-Reduce program to compute the maximum temperature observed each month for which data is present.
- Write a Map-Reduce program to compute the average temperature for each day of the year (over all years).
Now suppose that you have data representing a graph of friends:
A -> B,C,D
B -> A,C,D
C -> A,B
D -> A,B
- Write a Map-Reduce program that will identify common friends:
(A,B) -> C,D
(A,C) -> B
. . .
- Write a Map-Reduce program that will identify the people with the greatest number of friends (incoming links, not outgoing links).
The Hadoop Distributed System
Hadoop began as an open-source implementation very similar in spirit to the Google File System (GFS) and the Map-Reduce programming model. It has grown into a complex ecosystem of interacting pieces of software.
HDFS - Hadoop Distributed Filesystem Architecture:
- One Name Node + Many Data Nodes
- Files are divided into large chunks (64 MB by default).
- Files, once written, are immutable.
- Chunks are replicated three times, spread across two different racks.
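A toy sketch of the name node's bookkeeping (plain Python with invented node and rack names; not the actual HDFS placement policy): each file maps to a list of chunks, and each chunk maps to replicas spread across at least two racks.

import random

CHUNK_SIZE = 64 * 1024 * 1024                 # 64 MB
REPLICAS = 3

datanodes = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackB"}

def place_replicas():
    # Pick REPLICAS data nodes that cover at least two racks (simplified policy).
    while True:
        nodes = random.sample(list(datanodes), REPLICAS)
        if len({datanodes[n] for n in nodes}) >= 2:
            return nodes

def add_file(namespace, name, size_bytes):
    nchunks = -(-size_bytes // CHUNK_SIZE)    # ceiling division
    namespace[name] = [place_replicas() for _ in range(nchunks)]

namespace = {}                                # the name node's in-memory metadata
add_file(namespace, "/data/logs.txt", 200 * 1024 * 1024)
print(namespace["/data/logs.txt"])            # 4 chunks, 3 replica locations each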
Interface:
- Java library.
- Hadoop command-line tool.
- Status web page.
Considerations:
- Fault tolerance.
- High access latency.
- Uploading can be slow, due to replication.
- Very high throughput on parallel reads.
- Multiple disks per data node
- Secondary name node periodically merges the edit log into the namespace checkpoint.
Hadoop Map-Reduce Architecture:
- One JobTracker per cluster coordinates the entire M-R computation.
- TaskTrackers on each node dispatch and monitor each M-R task.
- HDFS -> Maps -> Temporary Space -> Shuffle -> Reducers -> HDFS
Interface:
- Native M-R code in Java.
- Other languages use the streaming interface.
- Hadoop command-line tool.
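A sketch of what the streaming interface expects from other languages (Python here; the exact hadoop jar invocation varies by installation, so it is omitted): the mapper turns input lines into tab-separated key/value pairs, the framework sorts by key, and the reducer aggregates runs of equal keys.

# mapper.py -- read raw lines on stdin, emit one "word<TAB>1" line per word
import sys
for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")

# reducer.py -- input arrives sorted by key, so counts accumulate per word
import sys
current, count = None, 0
for line in sys.stdin:
    word, n = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(current + "\t" + str(count))
        current, count = word, 0
    count += int(n)
if current is not None:
    print(current + "\t" + str(count))

These can be tested locally with a pipeline such as cat input.txt | python mapper.py | sort | python reducer.py, since sorting by key is essentially what the shuffle provides.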
Considerations:
- Fault tolerance.
- Stragglers.
- Data balance.
- Number of “reducers”.
Question: Which part of a Map-Reduce program is naturally scalable, and which part is likely to be a bottleneck? Does that affect how you would design an M-R program?
References:
- Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Communications of the ACM, Volume 51, Number 1, pages 107-113, 2008.
- K. Shvachko, H. Kuang, S. Radia, R. Chansler, “The Hadoop Distributed File System”, IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2010.
- Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, Morgan & Claypool Publishers, 2010.