To complete Tutorial B, you will need access to a Linux cluster equipped with a batch system such as HTCondor, PBS, Torque, SGE, or another system supported by Makeflow. If you don't have a cluster, you can run the simple examples on a single Linux server or desktop.
ssh USERNAME@crcfe01.crc.nd.edu

If you are using a Windows machine, download and install PuTTY and use that to connect to the host condorfe.crc.nd.edu.
git clone https://github.com/cooperative-computing-lab/cctools cctools-src
cd cctools-src
./configure
make
make install

The software is now installed in $HOME/cctools. To use it directly, you will need to add it to your path using one of these two commands (if one fails, just try the other):
export PATH=$HOME/cctools/bin:$PATH
setenv PATH $HOME/cctools/bin:$PATH
makeflow -v
If your cluster runs SGE, make sure the SGE commands are in your path as well:

export PATH=/opt/sge/bin/lx-amd64:$PATH
setenv PATH /opt/sge/bin/lx-amd64:$PATH

And check that you can run qstat:
qstat
cd $HOME
mkdir tutorial
cd tutorial

Now, download this program, which performs a highly "sophisticated" simulation of black holes colliding together:
wget http://www.nd.edu/~dthain/courses/disc/tutorialB/simulation.py

Try running it once, just to see what it does:
chmod 755 simulation.py
./simulation.py 5

Now, let's use Makeflow to run several simulations. Create a file called example.makeflow and paste the following text into it:
input.txt:
	LOCAL /bin/echo "Simulate Black Holes" > input.txt

output.1: simulation.py input.txt
	./simulation.py 1 < input.txt > output.1

output.2: simulation.py input.txt
	./simulation.py 2 < input.txt > output.2

output.3: simulation.py input.txt
	./simulation.py 3 < input.txt > output.3

output.4: simulation.py input.txt
	./simulation.py 4 < input.txt > output.4

To run it on your local machine, one job at a time, do this:
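Each rule in this file follows Make-like syntax: the files a rule produces go before the colon, the files it needs go after, and the command to run goes on the next (indented) line. The LOCAL keyword on the first rule tells Makeflow to run that command on the local machine rather than dispatching it to a batch system. As a sketch, a hypothetical fifth simulation (not part of the tutorial workflow) would be added with one more rule of the same shape:

```make
# Hypothetical additional rule, following the same pattern:
# outputs : inputs, then the command indented below.
output.5: simulation.py input.txt
	./simulation.py 5 < input.txt > output.5
```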
makeflow -j 1 example.makeflow

Note that if you run it a second time, nothing will happen, because all of the files have already been built:
makeflow example.makeflow
makeflow: nothing left to do

Use the -c option to clean everything up before trying it again:
makeflow -c example.makeflowOf course, you are running on a machine with multiple cores. If you leave out the -j option, then makeflow will run as many jobs as you have cores:
makeflow example.makeflowIf the jobs are expected to be long running, then you can dispatch jobs to a local batch system like SGE, Condor, or Torque by using the appropriate command:
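If you prefer to set the limit explicitly, you can pass the machine's core count to -j yourself. This is a sketch assuming the GNU/Linux nproc utility is available; the guard keeps it harmless on machines where makeflow is not installed:

```shell
# Report the number of available cores and cap -j at that value.
CORES=$(nproc)
echo "running up to $CORES jobs at once"
# Only invoke makeflow if it is actually installed here.
if command -v makeflow >/dev/null 2>&1; then
    makeflow -j "$CORES" example.makeflow
fi
```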
makeflow -T sge example.makeflow
makeflow -T condor example.makeflow
makeflow -T torque example.makeflow
...

After that completes, examine the output files (output.1 and so on), and you will notice that each job ran on a different machine in the cluster.
Next, try running the same workflow with Work Queue. Clean up, then start Makeflow with -T wq, using -p 0 to let it pick any available port:

makeflow -c example.makeflow
makeflow -T wq example.makeflow -p 0
listening for workers on port XXXX.
...

You are going to need to have two terminals open at once for the next step, so open up another terminal (or PuTTY session) and line it up next to your first one. (You may have to set your PATH again, as noted above.) Then, in the new terminal, start a worker using the same port number:
work_queue_worker localhost XXXX

Go back to your first shell and observe that the makeflow has finished. Your worker process will stay running for a few minutes until it is sure that Makeflow has finished. Use Control-C to forcibly kill the worker, if you have to.
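If you don't want to wait out that lingering period, you can shorten it when you start the worker. This is a sketch assuming work_queue_worker's -t option sets the idle timeout in seconds; XXXX again stands for the port number your makeflow printed:

```shell
# Illustrative idle timeout; 60 is an arbitrary choice for this sketch.
IDLE_TIMEOUT=60
echo "worker will exit after $IDLE_TIMEOUT idle seconds"
# Guarded so this is a no-op where the worker binary is absent.
if command -v work_queue_worker >/dev/null 2>&1; then
    work_queue_worker -t "$IDLE_TIMEOUT" localhost XXXX
fi
```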
Of course, remembering port numbers all the time gets old fast, so try the same thing again, this time using the -N option to give makeflow and the worker a shared project name. (Replace MYPROJECT with a name of your choice.)
makeflow -c example.makeflow
makeflow -T wq example.makeflow -N MYPROJECT
listening for workers on port XXXX
...

Now open up another shell and run your worker with a project name:
work_queue_worker -N MYPROJECTWhen using a project name, your workflow is advertised to the catalog server, and can be viewed using work_queue_status:
work_queue_status
Instead of starting workers by hand, you can submit several of them to SGE at once with sge_submit_workers:

sge_submit_workers -N MYPROJECT 5
Creating worker submit scripts in dthain-workers...
Your job 18728 ("worker.sh") has been submitted
Your job 18729 ("worker.sh") has been submitted
Your job 18730 ("worker.sh") has been submitted
Your job 18731 ("worker.sh") has been submitted
Your job 18732 ("worker.sh") has been submitted

Use the qstat command to observe that they are submitted (and possibly running):
qstat -u $USER
job-ID  prior      name       user    state  submit/start at      queue
-------------------------------------------------------------------------------------
 18728  100.49976  worker.sh  dthain  r      06/02/2016 12:04:45  long@d6copt172.crc.nd.edu
 18729  100.49976  worker.sh  dthain  r      06/02/2016 12:04:47  long@d6copt184.crc.nd.edu
 18730  100.49976  worker.sh  dthain  r      06/02/2016 12:04:47  long@d6copt025.crc.nd.edu
 18731  100.49976  worker.sh  dthain  r      06/02/2016 12:04:48  long@d6copt025.crc.nd.edu
 18732  100.49976  worker.sh  dthain  r      06/02/2016 12:04:48  long@dqcneh084.crc.nd.edu

Now, restart your Makeflow and it will use the workers already running in SGE:
makeflow -c example.makeflow
makeflow -T wq example.makeflow -N MYPROJECT
listening for workers on port XXXX.
...

You can leave the workers running there, if you want to start another Makeflow. (Try cleaning up and running again right now.) They will remain until they have been idle for fifteen minutes, then will stop automatically.
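If you want to reclaim those SGE slots before the idle timeout expires, you can remove the worker jobs by hand. This is a sketch assuming SGE's qdel accepts a -u flag that deletes all jobs belonging to the named user:

```shell
# Remove all of this user's queued/running jobs (including the workers).
if command -v qdel >/dev/null 2>&1; then
    qdel -u "$USER"
else
    echo "qdel not available here; workers will time out on their own"
fi
```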
If you add the -d all option to Makeflow, it will display debugging information that shows where each task was sent, when it was returned, and so forth:
makeflow -c example.makeflow
makeflow -T wq example.makeflow -N MYPROJECT -d all
listening for workers on port XXXX.
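The debug stream can be long, so it is often easier to capture it in a file and read it afterwards. This sketch assumes makeflow's -o option names a debug output file; example.debug is an illustrative filename chosen here:

```shell
# Same debug run as above, but writing the debug stream to a file.
if command -v makeflow >/dev/null 2>&1; then
    makeflow -T wq example.makeflow -N MYPROJECT -d all -o example.debug
else
    echo "makeflow not installed; skipping"
fi
```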
In the same way, you can submit a batch of workers to a Condor pool with condor_submit_workers:

condor_submit_workers -N MYPROJECT 5
Creating worker submit scripts in dthain-workers...
Submitting job(s).....
5 job(s) submitted to cluster 258192.

Use the condor_q command to observe that they are submitted to Condor:
condor_q
 ID        OWNER   SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
 258192.0  dthain  5/31 16:03   0+00:00:12  R   0    0.7   work_queue_worker
 258192.1  dthain  5/31 16:03   0+00:00:12  R   0    0.7   work_queue_worker
 258192.2  dthain  5/31 16:03   0+00:00:12  R   0    0.7   work_queue_worker
 258192.3  dthain  5/31 16:03   0+00:00:12  R   0    0.7   work_queue_worker
 258192.4  dthain  5/31 16:03   0+00:00:11  R   0    0.7   work_queue_worker

Now, restart your Makeflow and it will use the workers already running in Condor:
makeflow -c example.makeflow
makeflow -T wq example.makeflow -N MYPROJECT
listening for workers on port XXXX.
...

You can leave the workers running there, if you want to start another Makeflow. They will remain until they have been idle for fifteen minutes, then will stop automatically.
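As with SGE, you can remove idle Condor workers by hand instead of waiting. This is a sketch assuming condor_rm accepts a username and removes all of that user's jobs:

```shell
# Remove all of this user's Condor jobs (including the workers).
if command -v condor_rm >/dev/null 2>&1; then
    condor_rm "$USER"
else
    echo "condor_rm not available here; workers will time out on their own"
fi
```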
If you add the -d all option to Makeflow, it will display debugging information that shows where each task was sent, when it was returned, and so forth:
makeflow -c example.makeflow
makeflow -T wq example.makeflow -N MYPROJECT -d all
listening for workers on port XXXX.