Goals
The goal of this course is to cover the tools and techniques necessary to manage large-scale scientific workflows, with an emphasis on the systems available for use at Notre Dame through the Cooperative Computing Lab and the Center for Research Computing. Students will be introduced to the difficulties involved in managing large datasets and complex workflows, as well as the methods frequently used to ameliorate them. This course is designed for graduate students from any college or discipline who deal with large and/or complex workflows (we currently work with fields ranging from computer vision to molecular biology to economics).
Topics
The course will cover a variety of topics related to scientific workflows, including workflow analysis, commonly used workflow description languages like Makeflow and Map-Reduce, job management systems like Condor, Work Queue, and Hadoop, and common pitfalls encountered when dealing with large workflows.
Format and Grading
The course will meet once a week for 75 minutes. Class meetings will consist of a combination of lectures, guest speakers, class discussion, and hands-on lab work. Students will be evaluated primarily on participation, along with a few homework assignments and a final presentation to be given late in the semester. There will not be a final exam.
Schedule
| Date | Topic | Notes |
|---|---|---|
| 1/17 | Overview | |
| 1/24 | Makeflow | |
| 1/31 | Weaver | |
| 2/07 | Condor and SGE | |
| 2/14 | Master-Worker: Work Queue, MPI Queue | |
| 2/21 | Hadoop+HDFS | |
| 2/28 | Data Management: Chirp, Parrot, AFS | |
| 3/06 | Troubleshooting Distributed Workflows | |
| 3/13 | Spring Break | |
| 3/20 | Workflows in the Cloud: Amazon EC2 / ND Cloud | |
| 3/27 | Composing Workflows* | |
| 4/03 | Case Studies* | |
| 4/10 | TBD/Makeup | |
| 4/17 | TBD/Makeup | |
| 4/24 | Presentations | |
| 5/01 | Presentations | |
*tentative