work_queue_pool(1)

NAME

work_queue_pool - submit a pool of Work Queue workers on various batch systems.

SYNOPSIS

work_queue_pool [options] <hostname> <port> <number>

or

work_queue_pool [options] -M <project-name> <num-workers>

or

work_queue_pool [options] -A [-c <file>]

DESCRIPTION

work_queue_pool submits and maintains a number of work_queue_worker(1) processes on various batch systems, such as Condor and SGE. Each work_queue_worker process represents a Work Queue worker. All the Work Queue workers managed by a work_queue_pool process can be pointed to a specific Work Queue master, or be instructed to find their preferred masters through a catalog server.

If the <hostname> and <port> arguments are provided, the workers maintained by the work_queue_pool process will only work for the master running at <hostname>:<port>. If the -a option is present, then the <hostname> and <port> arguments are not needed and the workers will contact a catalog server to find the appropriate masters (see the -M option). In either case, the <number> argument specifies the number of workers that work_queue_pool should maintain.
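
For illustration, the two forms of invocation might look as follows, where master.example.edu, 9123, myproject, and 25 are placeholder values:
work_queue_pool -T condor master.example.edu 9123 25
work_queue_pool -T condor -M myproject 25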

If a work_queue_worker process managed by the work_queue_pool is shut down (e.g. due to failure or eviction), then work_queue_pool will re-submit a new work_queue_worker to the specified batch system <type> in order to maintain a constant <number> of work_queue_worker processes.

OPTIONS

Batch Options

-d,--debug <flag>
Enable debugging for this subsystem.
-l,--logfile <logfile>
Log work_queue_pool status to logfile.
-S,--scratch <file>
Scratch directory. (default is /tmp/${USER}-workers)
-T,--batch-type <type>
Batch system type: unix, condor, sge, workqueue, xgrid. (default is unix)
-r,--retry <count>
Number of attempts to retry submitting a worker if submission fails.
-m,--workers-per-job <count>
Each batch job will start <count> local workers. (default is 1)
-W,--worker-executable <path>
Path to the work_queue_worker(1) executable.
-A,--auto-pool-feature
Run in auto pool mode, according to the configuration file given by -c (see below; work_queue_pool.conf if -c is not given).
-c,--config <config>
Path to the auto pool configuration file. Implies -A.
-q,--one-shot
Guarantee that the workers are running, then quit. The workers will terminate after their idle timeouts unless the user explicitly shuts them down.
-h,--help
Show this screen.
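
As a sketch of how these batch options combine (the master address, log file, scratch directory, and counts are placeholders), the following invocation maintains 10 Condor workers, retries a failed submission up to 5 times, logs pool status to pool.log, and uses a custom scratch directory:
work_queue_pool -T condor -r 5 -l pool.log -S /tmp/my-workers master.example.edu 9123 10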

Worker Options

-a,--advertise
Enable auto mode. In this mode the workers would ask a catalog server for available masters. (deprecated, implied by -M,-A)
-t,--timeout <time>
Abort after this amount of idle time.
-C,--catalog <catalog>
Set catalog server to <catalog>. Format: HOSTNAME:PORT
-M,--master-name <project>
Name of a preferred project. A worker can have multiple preferred projects.
-N <project>
Same as -M,--master-name (deprecated).
-o,--debug-file <file>
Send debugging to this file.
-E,--extra-options <options>
Extra options that should be added to the worker.
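
For illustration, the following hypothetical invocation (all names and numbers are placeholders) maintains 50 SGE workers that prefer the project myproject, use an idle timeout of 900, and query a non-default catalog server:
work_queue_pool -T sge -M myproject -t 900 -C catalog.example.edu:9097 50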

Auto pool feature

work_queue_pool has the ability to maintain workers for several masters/foremen as needed, even when multiple masters/foremen report the same name to the catalog server. This is enabled by creating a pool configuration file and using the -A option. By default, -A tries to read the configuration file work_queue_pool.conf in the current working directory. The -c option can be used to specify a different path. The configuration file is a list of key-value pairs, one pair per line, with the key and value separated by a colon (:). A sample configuration follows the list of keys below. The possible valid keys are:
min_workers: The minimum number of workers to maintain. This is the only required key, and its value has to be greater than zero.
max_workers: The maximum number of workers to maintain for the whole pool. The default is 100.
distribution: A comma separated list of <project-name>=<max-workers> pairs, in which <max-workers> is the maximum number of workers assigned to the master with project name <project-name>. The specification allows some basic regular expression substitutions ('.' for any character, '*' for zero or more of the previous character, '?' for one or more of the previous character).
default_capacity: The initial capacity of the masters. Capacity is the maximum number of workers that can connect to the master such that no worker is idle. The default is 0.
ignore_capacity: Boolean yes|no value. The default is no.
mode: One of fixed|on-demand. If on-demand (the default), work_queue_pool adjusts the observed capacity of the master as tasks are dispatched/completed, until the number of workers assigned to the master equals that of its distribution specification (see above). Please note that on-demand does not work if the master is a foreman. If fixed, the number of workers assigned is immediately the one given in the distribution.
max_change_per_min: For on-demand mode, this field indicates the maximum number of workers that can be submitted per minute.
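
As an illustration of the file format, a configuration combining these keys might look as follows (the project names and numbers are placeholders only):
min_workers: 5
max_workers: 200
default_capacity: 10
mode: on-demand
max_change_per_min: 10
distribution: project-a=150, project-b=50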

EXIT STATUS

On success, returns zero. On failure, returns non-zero.

EXAMPLES

Example 1

Suppose you have a Work Queue master running on barney.nd.edu and it is listening on port 9123. To start 10 workers on the Condor batch system for your master, you can invoke work_queue_pool like this:
work_queue_pool -T condor barney.nd.edu 9123 10
If you want to start the 10 workers on the SGE batch system instead, you only need to change the -T option:
work_queue_pool -T sge barney.nd.edu 9123 10
If you have access to both the Condor and SGE systems, you can run both of the above commands, and you will then get 20 workers for your master.

Example 2

Suppose you have started a Work Queue master with makeflow(1) like this:
makeflow -T wq -N myproject makeflow.script
The -N option given to makeflow specifies the project name for the Work Queue master. The master's information, such as hostname and port, will be reported to a catalog server. The work_queue_pool program can start workers that prefer to work for this master by specifying the same project name on the command line (see the -N option):
work_queue_pool -T condor -N myproject 10
Suppose you have two masters with project names "project-a" and "project-b", and you would like 70 workers assigned to project-a, and 30 to project-b. You could write a 'work_queue_pool.conf' file with the following contents:
distribution: project-a=70, project-b=30
max_workers: 100
min_workers: 2
And simply run:
work_queue_pool -T condor -A
Now, suppose you have several masters (or foremen) with names such as "project-1", "project-2", etc., and you would like to assign the same number of workers to all of them as they are launched, with at least 50 workers running all the time, but with no more than 400 workers running simultaneously. Furthermore, you would like to reuse workers as some of the masters finish their computation. Using the auto pool feature:
distribution: project.*=400
max_workers: 400
min_workers: 50
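As before, save these lines in work_queue_pool.conf (or in another file passed with -c) and run:
work_queue_pool -T condor -A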
Note that the previous configuration works even when not all the masters have distinct names.

KNOWN BUGS

mode: on-demand does not work when the master is a foreman. Use mode: fixed, and specify the number of workers with min_workers, as in the sketch below.
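For example, a minimal fixed-mode configuration for a foreman advertising the (placeholder) project name my-foreman might be:
mode: fixed
min_workers: 25
distribution: my-foreman=25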

COPYRIGHT

The Cooperative Computing Tools are Copyright (C) 2003-2004 Douglas Thain and Copyright (C) 2005-2011 The University of Notre Dame. This software is distributed under the GNU General Public License. See the file COPYING for details.

SEE ALSO

  • Cooperative Computing Tools Documentation
  • Work Queue User Manual
  • work_queue_worker(1)
  • work_queue_status(1)
  • work_queue_pool(1)
  • condor_submit_workers(1)
  • sge_submit_workers(1)
  • torque_submit_workers(1)
  • ec2_submit_workers(1)
  • ec2_remove_workers(1)
