Applications Demonstration

During this demo, we will take you through some distributed computation examples using Makeflow and Work Queue.

  1. Introduction
    1. Goals
  2. Demo: BWA - Makeflow-based Bioinformatics Application
    1. Setup
    2. Submit Multi-slot Work Queue Workers using SGE
  3. Demo: Replica Exchange - A Work Queue-based Proteomics Application
    1. Submit Work Queue Workers to a Foreman

Introduction

To demonstrate scalable scientific applications written using Makeflow and Work Queue, we are going to use two examples: BWA, a bioinformatics application, and replica exchange, a proteomics application.

BWA, the Burrows-Wheeler Alignment tool, matches a set of query sequences against a reference database of sequences. The algorithm is implemented as a Makeflow workflow and can be found here.
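
For context, a Makeflow workflow is a file of Make-style rules, each listing its output files, its input files, and the command that produces the outputs. Below is a minimal sketch of one hypothetical alignment rule; the file names and command are illustrative, not taken from the actual BWA Makeflow:

# Hypothetical Makeflow rule (illustrative names): align one chunk of
# query sequences against the reference, producing a SAM file. Listing
# the bwa binary as an input lets Makeflow ship it to remote workers.
query.0.sam: bwa ref.fa query.0.fa
	./bwa mem ref.fa query.0.fa > query.0.sam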

The replica exchange algorithm samples the energy states of a protein molecule using simulations. The Python script for the replica exchange application is included in the CCTools source, which can be downloaded here.
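
If you want your own copy of CCTools, a typical download-and-build sequence looks like the following sketch; the version number and tarball URL are illustrative, so use the current release from the CCL download page:

# Illustrative: fetch, build, and install CCTools into $HOME/cctools.
wget http://ccl.cse.nd.edu/software/files/cctools-4.0.0-source.tar.gz
tar xzf cctools-4.0.0-source.tar.gz
cd cctools-4.0.0-source
./configure --prefix $HOME/cctools
make
make install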

Goals

In this demonstration of scalable scientific applications implemented using Makeflow and Work Queue, we intend to show the following:

  1. Scalability: We are going to achieve the required scale for these demos by aggregating all of the workers that you submit.
  2. Elasticity: We show how these applications dynamically adapt to worker failures and terminations, including harnessing new workers as they come alive.
  3. Features of CCTools: We will demonstrate a variety of additional features and tools of the CCTools software which are useful for building and running scalable scientific applications:
    • project names (see the sketch after this list)
    • multi-slot workers
    • submit worker scripts (sge_submit_workers)
    • hierarchical workers
    • work_queue_status
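
For instance, a worker can be pointed at a master either by an explicit host and port or by a project name that is resolved through the Catalog Server. The sketch below uses placeholder values for the host, port, and project name:

# Connect by explicit host and port (placeholders):
work_queue_worker master.example.edu 9123

# Connect by project name, resolved via the Catalog Server; --cores 0
# makes this a multi-slot worker that uses all available cores:
work_queue_worker --cores 0 -M my-project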

Demo: BWA - Makeflow-based Bioinformatics Application

We have a BWA Makeflow running using Work Queue under the project name bwa-xsede on a remote resource in the Lonestar cluster.

We are now going to submit workers for this BWA application running under the project name bwa-xsede.

First, make sure you are logged in to the head node of XSEDE's Lonestar cluster, where you will submit workers for the BWA application.

ssh USERNAME@login1.ls4.tacc.utexas.edu

Please replace USERNAME with the username on the sheet of paper handed out to you, and use the password given there as well.

NOTE: Please feel free to use your own XSEDE account if you have one.

Setup

Make sure that your CCTools directory is added to your $PATH:

export PATH=~/cctools/bin:${PATH}

Check that your setup is correctly configured:

$ work_queue_worker -v
work_queue_worker version 4.0.0rc1-TRUNK (released 07/16/2013)
	Built by dpandiar@login2.ls4.tacc.utexas.edu on Jul 16 2013 at 16:30:12
	System: Linux login2.ls4.tacc.utexas.edu 2.6.18-194.32.1.el5_TACC #2 SMP
	Configuration: --prefix /home1/02486/dpandiar/cctools

Submit Multi-slot Work Queue Workers using SGE

We are going to submit multi-slot workers to Lonestar's SGE batch system; each worker utilizes all of the cores on its allocated node.

To do this, we will use the sge_submit_workers script installed in the CCTools directory. sge_submit_workers creates the worker script to be executed and submits this script to SGE for the specified number of workers using qsub:

sge_submit_workers -p "-q development -l h_rt=01:00:00 -V -pe 12way 12" --cores 0 -M bwa-xsede 15

The above command submits 15 workers for execution on the Lonestar cluster.
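
Roughly speaking, sge_submit_workers does the equivalent of the following; this is a simplified sketch of its behavior, not the actual generated script:

# Simplified sketch of what sge_submit_workers does: write a small
# wrapper script around work_queue_worker, then qsub it once per worker.
cat > worker.sh << 'EOF'
#!/bin/sh
work_queue_worker --cores 0 -M bwa-xsede
EOF

for i in $(seq 1 15); do
    qsub -q development -l h_rt=01:00:00 -V -pe 12way 12 worker.sh
done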

When executed, each submitted worker will contact the Catalog Server to determine the location of the master with the project name bwa-xsede. The workers then attempt a connection to the master at the location resolved by the Catalog Server. On connection, the master starts dispatching tasks to the connected workers.

To monitor the status of your submitted worker jobs, you can watch their status using SGE's qstat interface:

watch "qstat -u $USER"

You can also monitor the run-time status of the BWA application in terms of the number of tasks completed, number of workers connected, etc., using the work_queue_status tool installed in your CCTools directory.

work_queue_status

You will see output similar to the one below:

$ work_queue_status
PROJECT             HOST                    PORT  WAITING  BUSY  COMPLETE  WORKERS
4206.blast.biocomp  cclweb03.cse.nd.edu     9001        0     1         0        4
bwa-xsede           d6copt007.crc.nd.edu    1024        0     1         0       15
one_two_many_lots   d6copt129.crc.nd.edu    9000      399   600      3028      600
nr_blast            helix.cse.nd.edu        9694      970    30        16       30
forcebalance        not0rious.Stanford.EDU  9238        0    54       216       54
numbtest            opteron.crc.nd.edu      9123        0     0         1        0
cclosdci            workspace.crc.nd.edu    9907        0     0        41        1

You can also monitor the status of just the BWA application as follows:

$ work_queue_status | grep bwa-xsede
bwa-xsede           d6copt007.crc.nd.edu    1024        0     1         0       15

Demo: Replica Exchange - A Work Queue-based Proteomics Application

We have a replica exchange application running using Work Queue with a foreman under the project name replica-foreman on a remote resource in the Lonestar cluster.

Submitting workers to this foreman is similar to the BWA demo above, except the project name is replica-foreman.

Submit Work Queue Workers to a Foreman

First, remove any prior jobs submitted to SGE so we start with a clean slate.

qdel -u $USER

Here, we are going to submit workers to the Lonestar SGE that will connect to a foreman we have set up.

Again, we will use the sge_submit_workers script installed in the CCTools directory:

sge_submit_workers -p "-q development -l h_rt=01:00:00 -V -pe 12way 12" --cores 0 -M replica-foreman 15

The above command submits 15 workers for execution on the Lonestar cluster.

Each of these workers will contact the foreman identified by the project name replica-foreman. The foreman, in turn, communicates with the project master. If we had workers in several clusters, we could run one foreman per cluster, reducing the network traffic that flows over the wide area network.
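
For reference, a foreman is simply a work_queue_worker running in foreman mode. Ours was started with something like the following sketch, where replica-master is a placeholder for the master's actual project name:

# Sketch: run a worker in foreman mode. It connects up to the master
# (placeholder project name "replica-master") and advertises itself to
# ordinary workers under the project name "replica-foreman".
work_queue_worker --foreman-name replica-foreman -M replica-master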

As before, you can monitor the status of the application by using the work_queue_status tool installed in your CCTools directory.

work_queue_status
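
As with the BWA demo, you can filter the output for just this project; the exact counts will vary as the run progresses:

work_queue_status | grep replica-foreman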