Prerequisites:
First, log into the CRC front end node:

ssh USERNAME@crcfe01.crc.nd.edu

If you are using a Windows machine, download and install PuTTY and use it to connect to the host condorfe.crc.nd.edu.
Then add the Condor tools to your path, using one of these two commands (if one fails, just try the other):

setenv PATH /afs/crc.nd.edu/user/condor/software/bin:$PATH
export PATH=/afs/crc.nd.edu/user/condor/software/bin:$PATH
Next, download and build the CCTools software from source:

git clone https://github.com/cooperative-computing-lab/cctools cctools-src
cd cctools-src
./configure
make
make install

The software is now installed in $HOME/cctools. To use it directly, you will need to add it to your path using one of these two commands (if one fails, just try the other):
export PATH=$HOME/cctools/bin:$PATH
setenv PATH $HOME/cctools/bin:$PATH

Now, check that work_queue_status is in your path before proceeding:
work_queue_status
Next, download the example program for the language of your choice:

C:      wget http://ccl.cse.nd.edu/software/manuals/work_queue_example.c
Python: wget http://ccl.cse.nd.edu/software/manuals/work_queue_example.py
Perl:   wget http://ccl.cse.nd.edu/software/manuals/work_queue_example.pl

Depending on your language, you will need to either compile the program (C) or set some environment variables (Python and Perl):
C:      gcc work_queue_example.c -o work_queue_example -I${HOME}/cctools/include/cctools -L${HOME}/cctools/lib -lwork_queue -ldttools -lm
Python: setenv PYTHONPATH ${HOME}/cctools/lib/python2.7/site-packages
Perl:   setenv PERL5LIB ${HOME}/cctools/lib/perl5/site_perl

The example program simply runs gzip on whatever files you give on the command line, so run it like this:
C:      ./work_queue_example *
Python: ./work_queue_example.py *
Perl:   ./work_queue_example.pl *
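Each example works the same way: it creates a queue, builds one gzip task per file, declares each task's input and output files, submits the tasks, and waits for them all to complete. The heart of the Python version looks roughly like this (a condensed sketch of the pattern in the downloaded work_queue_example.py; unlike the real example, which ships the gzip binary to the worker as a cached input file, this sketch assumes gzip is already present on the worker):

from work_queue import *
import sys

# Create a queue listening on the default port (9123).
q = WorkQueue(WORK_QUEUE_DEFAULT_PORT)

# Build one gzip task per file named on the command line.
for infile in sys.argv[1:]:
    outfile = infile + ".gz"
    t = Task("gzip < %s > %s" % (infile, outfile))
    # Declare the files that must move between master and worker.
    t.specify_file(infile, infile, WORK_QUEUE_INPUT, cache=False)
    t.specify_file(outfile, outfile, WORK_QUEUE_OUTPUT, cache=False)
    taskid = q.submit(t)
    print "submitted task (id# %d): %s" % (taskid, t.command)

# Wait for all submitted tasks to come back.
print "waiting for tasks to complete..."
while not q.empty():
    t = q.wait(5)
    if t:
        print "task (id# %d) complete: %s (return code %d)" % (t.id, t.command, t.return_status)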
The example program listens on the default port of 9123, so if that port is already taken on your machine, you might get the following error:

couldn't listen on port 9123: Address already in use

Whether you have this problem or not, go ahead and modify the program to use port zero, which tells Work Queue to pick any available port:
C Example:
q = work_queue_create(0);
if(!q) {
    printf("couldn't listen on port");
    return 1;
}
Python Example:
try:
    q = WorkQueue(0)
except:
    print "Instantiation of Work Queue failed!"
    sys.exit(1)
Perl Example:
my $q = work_queue_create(0);
if (not defined($q)) {
    print "Instantiation of Work Queue failed!\n";
    exit 1;
}
With port zero, Work Queue picks any available port. Run the program again, and it will print the port it actually chose:

listening on port XXXX...
...
waiting for tasks to complete...

Now, open up a new window (you may have to repeat your PATH setup as above) and run a single local worker, giving it the hostname and the port your program printed:
work_queue_worker localhost XXXX

After a short delay, you should see all the tasks completing in the first window. After another short delay, the worker should stop when it realizes there is no more work to be done. You can also stop the worker by pressing Control-C.
Of course, keeping track of port numbers by hand quickly becomes inconvenient. Instead, give your master a project name, so that workers can find it automatically. Modify your program to set a project name and print the chosen port:

C Example:
q = work_queue_create(0);
if(!q) {
    printf("couldn't listen on port");
    return 1;
}
work_queue_specify_name(q, "MYPROJECT");
printf("listening on port %d...\n", work_queue_port(q));
Python Example:
try:
    q = WorkQueue(0)
except:
    print "Instantiation of Work Queue failed!"
    sys.exit(1)

q.specify_name("MYPROJECT")
print "listening on port %d..." % q.port
Perl Example:
my $q = work_queue_create(0);
if (not defined($q)) {
    print "Instantiation of Work Queue failed!\n";
    exit 1;
}
work_queue_specify_name($q, "MYPROJECT");
my $port = work_queue_port($q);
print "listening on port $port...\n";
Now use condor_submit_workers to submit five workers to Condor, giving them the same project name so they can find your master:

condor_submit_workers -N MYPROJECT 5

Creating worker submit scripts in /tmp/nhazekam-workers...
Submitting job(s).....
5 job(s) submitted to cluster 643692.

Use the condor_q command to observe that they are submitted to Condor:
condor_q -submitter $USER

-- Submitter: nhazekam@nd.edu : <129.74.85.211:9618?... : crcfe01.crc.nd.edu
 ID        OWNER      SUBMITTED     RUN_TIME   ST PRI SIZE CMD
643692.0   nhazekam   10/18 14:16   0+00:00:02 R  0   0.7  work_queue_worker
643692.1   nhazekam   10/18 14:16   0+00:00:02 R  0   0.7  work_queue_worker
643692.2   nhazekam   10/18 14:16   0+00:00:02 R  0   0.7  work_queue_worker
643692.3   nhazekam   10/18 14:16   0+00:00:02 R  0   0.7  work_queue_worker
643692.4   nhazekam   10/18 14:16   0+00:00:02 R  0   0.7  work_queue_worker

Now, restart your application and it will use the workers already running in Condor:
./work_queue_example *

listening on port XXXX...
While the application is running, you can open up another window and see the progress of your application with work_queue_status, which will show you the project name, master host, and number of tasks in each state. This information is updated approximately once per minute.
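If you prefer, you can also report progress from inside the master itself. This is a sketch, assuming the Python bindings expose the queue statistics structure behind work_queue_status (the counterpart of the C API's work_queue_get_stats); exact field names may vary between CCTools releases:

# Inside the master's wait loop, after creating the queue q:
while not q.empty():
    t = q.wait(5)
    s = q.stats  # assumption: a stats property mirroring work_queue_stats
    print "tasks waiting: %d running: %d complete: %d" % \
        (s.tasks_waiting, s.tasks_running, s.tasks_complete)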
When the application completes, your workers will still be in the cluster. You can leave them there, if you want to start another application quickly. If they are not used within fifteen minutes, they will stop automatically.