Work Queue Tutorial
This tutorial will have you install CCTools into your FutureGrid home directory and will take you through some distributed computation examples using Work Queue.
Log in to the FutureGrid Head Node
In this tutorial, we will again use the alamo login node:
The setup for this tutorial follows the setup described and installed in the last tutorial session.
NOTE: If you have not already built and installed CCTools on the FutureGrid login node, please follow the instructions listed here before continuing.
Set Environment Variables
For this tutorial, you will need to add your CCTools directory to your $PATH:
We will use both Python and Perl in this tutorial, and both need to be able to find our installed packages. Set this environment variable for running the Python examples:
And this environment variable for running the Perl examples:
Work Queue program running the Simulation Executable
It is simple to write a program in C/Perl/Python (or any language with appropriate Work Queue bindings) which can generate tasks to be queued for Work Queue. In this example, we will create 100 simulation tasks using simulation.py.
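As a rough illustration of what such a program can look like, here is a minimal sketch assuming the Python 3 work_queue binding from CCTools is importable. The input.N and file.N file names, the command line passed to simulation.py, and the helper names task_files and run are illustrative assumptions, not names taken from the tutorial's files.

```python
def task_files(i):
    """Return (infile, outfile, command) for simulation task i.

    The naming scheme and command line are assumptions for illustration.
    """
    infile = "input.%d" % i
    outfile = "file.%d" % i
    command = "python simulation.py %d < %s > %s" % (i, infile, outfile)
    return infile, outfile, command


def run(wq, count=100, timeout=5):
    """Submit `count` tasks via the work_queue module `wq`; return how many finished."""
    q = wq.WorkQueue(port=0)  # port 0: let Work Queue pick a free port
    print("Listening on port %d." % q.port)

    for i in range(count):
        infile, outfile, command = task_files(i)
        t = wq.Task(command)
        # Arguments: name on the master, name on the worker, direction, caching.
        t.specify_file("simulation.py", "simulation.py", wq.WORK_QUEUE_INPUT, cache=True)
        t.specify_file(infile, infile, wq.WORK_QUEUE_INPUT, cache=False)
        t.specify_file(outfile, outfile, wq.WORK_QUEUE_OUTPUT, cache=False)
        q.submit(t)

    finished = 0
    while not q.empty():
        t = q.wait(timeout)  # None if no task completed within `timeout` seconds
        if t:
            finished += 1
            print("task %d done: %s (return code %d)" % (t.id, t.command, t.return_status))
    return finished
```

With the binding on PYTHONPATH, this would be started with `import work_queue; run(work_queue)`. Passing the module in as an argument is only a convenience of this sketch so the helper can be exercised without a live master.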
①Here we load the work_queue Python binding. Python will look in PYTHONPATH, which is set up in your environment.
②This instantiates a Work Queue master to which you may submit work. Setting the port to 0 instructs Work Queue to pick an arbitrary port to bind on.
③We create a task which takes a shell command argument. The first task created in this workflow will have the command:
④Each task usually depends on a number of files to run. These include the executable and any input files. Here we specify the simulation.py executable and its input infile. Notice that we specify both simulation.py and infile twice when calling specify_file. The first argument is the name of the file on the master and the second argument is the name of the file we want created on the worker. Usually these filenames will be the same as in this example.
⑤We specify the output file, outfile, which we want transferred back to the master.
⑥At this point we have finished the description of our task and it is ready to be submitted for execution on the Work Queue workers. Q.submit submits this task.
⑦At this point we wish to wait for all submitted tasks to complete. So long as the queue is not empty, we continue to call Q.wait waiting for the result of a task we submitted.
⑧Here we call Q.wait(5) which takes a timeout argument. The call to wait will return a finished task which allows us to analyze the return_status or output. In this example, we set the timeout to 5 seconds which allows our application to do other things if a task is taking an inordinate amount of time to complete. We could have used the constant WORK_QUEUE_WAITFORTASK to wait indefinitely until a task completes.
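The timeout behaviour described above can be sketched as a small wait loop that does other work whenever Q.wait returns without a finished task, and that sorts finished tasks by return_status. The helper name drain and the on_idle callback are hypothetical; the queue object only needs the empty and wait methods used in the tutorial.

```python
def drain(q, timeout=5, on_idle=None):
    """Wait for every submitted task; return (ok_ids, failed_ids).

    `on_idle` is called each time `q.wait` times out with no finished task.
    """
    ok, failed = [], []
    while not q.empty():
        t = q.wait(timeout)
        if t is None:
            # Nothing finished within `timeout` seconds: do other work.
            if on_idle is not None:
                on_idle()
            continue
        # A zero return status conventionally means the command succeeded.
        (ok if t.return_status == 0 else failed).append(t.id)
    return ok, failed
```

If WORK_QUEUE_WAITFORTASK were passed as the timeout instead, the call would block until a task completes, so on_idle would not normally be invoked.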
The Perl code of the above program is in wq.pl and is shown here:
You can download this program using:
To run the above Work Queue Python program, run:
To run the above Work Queue Perl program, run:
When a Work Queue program is run, it prints the port on which it is listening for connections from the workers. For example:
Start 10 workers on the Torque compute nodes in FutureGrid for this master,
replacing XXXX with the port the Work Queue master program is listening on.
You can also start workers on the HPC/HTC cluster at the University of Arizona through the PBS batch submission system using the pbs_submit_workers script. You can download this script as follows:
If you have allocations to the higher priority queues (standard, quality, etc.) in the UA clusters, you can submit workers to these queues by specifying the appropriate options.
If you encounter an error, check that you did not forget to set up your environment.
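One quick way to check the environment is a short script that asks whether Python can locate the work_queue module and whether the worker executable is on $PATH. This is a sketch assuming Python 3; the function name check_setup and its messages are hypothetical.

```python
import importlib.util
import shutil


def check_setup(module="work_queue", tool="work_queue_worker"):
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems = []
    # find_spec returns None when a top-level module is not on the import path.
    if importlib.util.find_spec(module) is None:
        problems.append("cannot import %s; check PYTHONPATH" % module)
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which(tool) is None:
        problems.append("%s not found; check PATH" % tool)
    return problems
```

Calling `print(check_setup())` on the login node lists anything that still needs to be fixed before starting the master or the workers.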
The goal of this exercise is to change the workflow to chain the executions of simulation.py so that the output of one simulation is the input of another. For this exercise, the workflow should look like:
Because our simulation.py is sophisticated and runs for about 5 seconds on average, in this exercise we will run only 5 instances of the simulation (instead of 100), so the whole chain takes about 25 seconds.
For this exercise, remember that when you run Q.submit(T), it finalizes the task and allows it to be sent to a worker for execution. You will need to wait for the output from a worker to come back before sending out the next one. As before, you can wait for the completion of a task using Q.wait(5).
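One possible shape for the chained version is sketched below: each step is submitted only after the previous one has returned, and its input file is the previous step's output file. The file names, the command line, and the helper names chain_steps and run_chain are assumptions made for illustration; the sketch expects the work_queue module to be passed in as wq.

```python
def chain_steps(n, first_input="input.0"):
    """Yield (command, infile, outfile); each step reads the previous output."""
    infile = first_input
    for i in range(n):
        outfile = "file.%d" % i
        command = "python simulation.py %d < %s > %s" % (i, infile, outfile)
        yield command, infile, outfile
        infile = outfile  # the next step consumes this step's output


def run_chain(wq, n=5, timeout=5):
    """Run n simulations one after another; return the output files in order."""
    q = wq.WorkQueue(port=0)
    print("Listening on port %d." % q.port)
    outputs = []
    for command, infile, outfile in chain_steps(n):
        t = wq.Task(command)
        t.specify_file("simulation.py", "simulation.py", wq.WORK_QUEUE_INPUT, cache=True)
        t.specify_file(infile, infile, wq.WORK_QUEUE_INPUT, cache=False)
        t.specify_file(outfile, outfile, wq.WORK_QUEUE_OUTPUT, cache=False)
        q.submit(t)
        done = None
        while done is None:  # only one task is outstanding, so poll until it returns
            done = q.wait(timeout)
        if done.return_status != 0:
            raise RuntimeError("step failed: %s" % done.command)
        outputs.append(outfile)
    return outputs
```

Because only one task is in flight at a time, a single worker is enough for this workflow, and the total running time is roughly the sum of the individual simulation times.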