resource_monitor(1)
NAME
resource_monitor - monitors the CPU, memory, I/O, and disk usage of a tree of processes.
SYNOPSIS
resource_monitor [options] -- command [command-options]
resource_monitorv [options] -- command [command-options]
DESCRIPTION
resource_monitor is a tool to monitor the computational
resources used by the process created by the command given as an
argument, and all its descendants. The monitor works
'indirectly', that is, by observing how the environment changed
while a process was running. All the information
reported should therefore be considered only an estimate (this is in
contrast with direct methods, such as ptrace). It has been tested
on Linux, FreeBSD, and Darwin, and can be used automatically by
makeflow and Work Queue applications.
Additionally, the user can specify maximum resource limits in the
form of a file, or a string given at the command line. If any
resource exceeds its specified limit, the monitor
terminates the task and reports which resources exceeded their
limits.
In systems that support it, resource_monitor wraps some
libc functions to obtain a better estimate of the resources used.
In contrast, resource_monitorv disables this wrapping,
which means, among other things, that it can monitor only the root
process, not its descendants.
Currently, the monitor does not support interactive applications. That
is, if a process issues a read call from standard input, and standard
input has not been redirected, then the process tree is
terminated. This is likely to change in future versions of the tool.
resource_monitor generates up to three log files: a summary
file with the maximum values of the resources used, a time series that
shows the resources used at given time intervals, and a list of
files that were opened during execution.
The summary file has the following format:
command: [the command line given as an argument]
start: [seconds at the start of execution, since the epoch, float]
end: [seconds at the end of execution, since the epoch, float]
exit_type: [one of normal, signal or limit, string]
signal: [number of the signal that terminated the process.
Only present if exit_type is signal, int]
limits_exceeded: [resources over the limit. Only present if
exit_type is limit, string]
exit_status: [final status of the parent process, int]
max_concurrent_processes: [the maximum number of processes running concurrently, int]
total_processes: [count of all of the processes created, int]
wall_time: [seconds spent during execution, end - start, float]
cpu_time: [user + system time of the execution, in seconds, float]
virtual_memory: [maximum virtual memory across all processes, in MB, int]
resident_memory: [maximum resident size across all processes, in MB, int]
swap_memory: [maximum swap usage across all processes, in MB, int]
bytes_read: [number of bytes read from disk, int]
bytes_written: [number of bytes written to disk, int]
workdir_num_files: [total maximum number of files and directories of
all the working directories in the tree, int]
workdir_footprint: [size in MB of all working directories in the tree, int]
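The summary format above can be read with a few lines of Python. The following is a minimal sketch, assuming each field sits on its own line as "key: value" and that numeric values are written as bare numbers, as described above. The function names parse_summary and describe_exit are hypothetical helpers, not part of CCTools.

```python
def _convert(value):
    """Return value as int or float when it parses as one, else as a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_summary(text):
    """Parse 'key: value' lines of a summary file into a dictionary."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only: the command line may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        # The command is always kept as a string.
        summary[key] = value if key == "command" else _convert(value)
    return summary


def describe_exit(summary):
    """Summarize how the monitored command ended, based on exit_type."""
    exit_type = summary.get("exit_type")
    if exit_type == "signal":
        return f"terminated by signal {summary.get('signal')}"
    if exit_type == "limit":
        return f"exceeded limits: {summary.get('limits_exceeded')}"
    return f"exited with status {summary.get('exit_status')}"
```

Fields that are absent from the file (for example, signal when exit_type is normal) are simply missing from the returned dictionary.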
The time-series log has a row per time sample. For each row, the columns have the following meaning:
wall_clock [the sample time, since the epoch, in microseconds, int]
cpu_time [accumulated user + kernel time, in microseconds, int]
concurrent [concurrent processes at the time of the sample, int]
virtual [current virtual memory size, in MB, int]
resident [current resident memory size, in MB, int]
swap [current swap usage, in MB, int]
bytes_read [accumulated number of bytes read, int]
bytes_written [accumulated number of bytes written, int]
files [current number of files and directories, across all
working directories in the tree, int]
footprint [current size of working directories in the tree, in MB, int]
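As an illustration of how the time-series columns can be used, the sketch below reads the rows and estimates CPU utilization between consecutive samples. It assumes that columns are separated by whitespace, that they appear in the order listed above, and that lines starting with '#' are headers or comments. None of these assumptions is stated by this page, and the names parse_series and cpu_utilization are hypothetical.

```python
from collections import namedtuple

# Column order as listed in this page (assumed to match the file layout).
COLUMNS = [
    "wall_clock", "cpu_time", "concurrent", "virtual", "resident",
    "swap", "bytes_read", "bytes_written", "files", "footprint",
]
Sample = namedtuple("Sample", COLUMNS)


def parse_series(text):
    """Return one Sample per data row of a time-series log."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and assumed '#' comment lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns: {line!r}")
        samples.append(Sample(*(int(f) for f in fields)))
    return samples


def cpu_utilization(samples):
    """CPU time consumed per unit of wall time between consecutive samples.

    Both wall_clock and cpu_time are in microseconds, so the ratio is
    dimensionless; 1.0 means one core fully busy over the interval.
    """
    ratios = []
    for prev, cur in zip(samples, samples[1:]):
        elapsed = cur.wall_clock - prev.wall_clock
        if elapsed > 0:
            ratios.append((cur.cpu_time - prev.cpu_time) / elapsed)
    return ratios
```

Because cpu_time, bytes_read, and bytes_written are accumulated values, per-interval rates are obtained from differences between rows, whereas the memory and file columns can be read directly from each row.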
OPTIONS
-d,--debug <subsystem>
    Enable debugging for this subsystem.
-o,--debug-file <file>
    Write debugging output to this file. By default, debugging is sent to stderr (":stderr"). You may specify logs be sent to stdout (":stdout"), to the system syslog (":syslog"), or to the systemd journal (":journal").
-i,--interval <n>
    Interval between observations, in seconds (default=1).
-l,--limits-file <file>
    Use <file> with a list of var: value pairs for resource limits.
-L,--limits <string>
    String of the form "var: value, var: value" to specify resource limits. (May be specified multiple times.)
-f,--child-in-foreground
    Keep the monitored process in the foreground (for interactive use).
-O,--with-output-files <template>
    Specify template for log files (default=resource-pid-).
--with-summary-file <file>
    Write resource summary to <file> (default=<template>.summary).
--with-time-series <file>
    Write resource time series to <file> (default=<template>.series).
--with-opened-files <file>
    Write list of opened files to <file> (default=<template>.opened).
-V,--verbatim-to-summary <str>
    Include this string verbatim in a line in the summary. (May be specified multiple times.)
--without-summary-file
    Do not write the summary log file.
--without-time-series
    Do not write the time-series log file.
--without-opened-files
    Do not write the list of opened files.
--with-disk-footprint
    Measure working directory footprint (potentially slow).
--without-disk-footprint
    Do not measure working directory footprint (default).
-v,--version
    Show version string.
-h,--help
    Show help text.
The limits file should contain lines of the form:
resource: max_value
It may contain any of the following fields, in the same units as
defined for the summary file:
max_concurrent_processes,
wall_time, cpu_time,
virtual_memory, resident_memory, swap_memory,
bytes_read, bytes_written,
workdir_number_files_dirs, workdir_footprint
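A limits specification can also be generated programmatically. The sketch below renders a dictionary of limits either as the contents of a limits file or as the string accepted by the -L option. The field names are copied from the list above; the function names and the validation step are illustrative assumptions, not part of the tool.

```python
# Field names as listed in this page.
KNOWN_LIMITS = {
    "max_concurrent_processes",
    "wall_time", "cpu_time",
    "virtual_memory", "resident_memory", "swap_memory",
    "bytes_read", "bytes_written",
    "workdir_number_files_dirs", "workdir_footprint",
}


def _check(limits):
    """Reject field names that this page does not list."""
    unknown = sorted(set(limits) - KNOWN_LIMITS)
    if unknown:
        raise ValueError(f"unknown limit fields: {', '.join(unknown)}")


def format_limits_file(limits):
    """Render limits as 'resource: max_value' lines, one per resource."""
    _check(limits)
    return "".join(f"{name}: {value}\n" for name, value in limits.items())


def format_limits_option(limits):
    """Render limits as a 'var: value, var: value' string for -L."""
    _check(limits)
    return ", ".join(f"{name}: {value}" for name, value in limits.items())
```

For example, format_limits_option({"wall_time": 5}) yields the string "wall_time: 5", which matches the form shown for the -L option.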
ENVIRONMENT VARIABLES
CCTOOLS_RESOURCE_MONITOR_HELPER Location of the desired helper library to wrap libc calls. If not provided, the version of the helper library packed with the resource_monitor executable is used.
EXIT STATUS
The exit status of the command line provided.
EXAMPLES
To monitor 'sleep 10' at 2-second intervals, with output to sleep-log.summary, sleep-log.series, and sleep-log.opened, and with a monitor alarm at 5 seconds:
% resource_monitor --interval=2 -L"wall_time: 5" -O sleep-log -- sleep 10
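When the monitor is launched from a script, the same command line can be assembled as an argument list. The following is a minimal sketch using only options documented in OPTIONS. The helper names monitor_command and run_monitored are hypothetical, and the sketch assumes resource_monitor is on the PATH.

```python
import subprocess


def monitor_command(command, interval=None, limits=None, template=None):
    """Build the argument list for running `command` under resource_monitor."""
    argv = ["resource_monitor"]
    if interval is not None:
        argv.append(f"--interval={interval}")
    if limits:
        spec = ", ".join(f"{name}: {value}" for name, value in limits.items())
        argv += ["--limits", spec]
    if template is not None:
        argv += ["--with-output-files", template]
    # Everything after "--" is the monitored command and its own options.
    argv.append("--")
    argv.extend(command)
    return argv


def run_monitored(command, **options):
    """Run the command under the monitor and return the monitor's exit status.

    Per EXIT STATUS, this is the exit status of the monitored command.
    """
    return subprocess.run(monitor_command(command, **options)).returncode
```

Calling monitor_command(["sleep", "10"], interval=2, limits={"wall_time": 5}, template="sleep-log") builds the equivalent of the command line shown above, using the long option names.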
The monitor can also be run automatically from makeflow by specifying the '-M' flag:
% makeflow -M Makeflow
In this case, makeflow wraps every command line rule with the
monitor, and writes the resulting logs per rule in an
automatically created directory.
Additionally, it can be run automatically from Work Queue:
q = work_queue_create_monitoring(port);
work_queue_enable_monitoring(q, some-log-file);
This wraps every task with the monitor and appends all generated
summary files to the file some-log-file.
BUGS
The monitor cannot track the children of statically linked executables.
Not all systems report major memory faults. On such systems, I/O from memory maps is estimated from changes in the resident set, and is therefore not very exact.
One would expect to be able to generate the information in the summary from the time series. However, the two use different mechanisms, and the summary tends to be more accurate.
COPYRIGHT
The Cooperative Computing Tools are Copyright (C) 2003-2004 Douglas Thain and Copyright (C) 2005-2011 The University of Notre Dame. This software is distributed under the GNU General Public License. See the file COPYING for details.
CCTools 4.3.3 released on 03/25/2015