Statistics describing a work queue.
#include <work_queue.h>
Data Fields | |
int | workers_connected |
Number of workers currently connected to the master. | |
int | workers_init |
Number of workers that are connected but have not yet sent their available resources report. | |
int | workers_idle |
Number of workers that are not running a task. | |
int | workers_busy |
Number of workers that are running at least one task. | |
int | workers_able |
Number of workers on which the largest task can run. | |
int | workers_joined |
Total number of worker connections that were established to the master. | |
int | workers_removed |
Total number of worker connections that were released by the master, idled-out, fast-aborted, or lost. | |
int | workers_released |
Total number of worker connections that were asked by the master to disconnect. | |
int | workers_idled_out |
Total number of workers that disconnected for being idle. | |
int | workers_fast_aborted |
Total number of worker connections terminated for being too slow. | |
int | workers_blacklisted |
Total number of workers blacklisted by the master. | |
int | workers_lost |
Total number of worker connections that were unexpectedly lost. | |
int | tasks_waiting |
Number of tasks waiting to be dispatched. | |
int | tasks_on_workers |
Number of tasks currently dispatched to some worker. | |
int | tasks_running |
Number of tasks currently executing at some worker. | |
int | tasks_with_results |
Number of tasks whose results have been retrieved and are waiting to be returned to the user. | |
int | tasks_submitted |
Total number of tasks submitted to the queue. | |
int | tasks_dispatched |
Total number of tasks dispatched to workers. | |
int | tasks_done |
Total number of tasks completed and returned to user. | |
int | tasks_failed |
Total number of tasks completed and returned to user with result other than WQ_RESULT_SUCCESS. | |
int | tasks_cancelled |
Total number of tasks cancelled. | |
int | tasks_exhausted_attempts |
Total number of task executions that failed due to resource exhaustion. | |
timestamp_t | time_when_started |
Absolute time at which the master started. | |
timestamp_t | time_send |
Total time spent sending tasks to workers (task descriptions and input files). | |
timestamp_t | time_receive |
Total time spent receiving results from workers (output files). | |
timestamp_t | time_send_good |
Total time spent in sending data to workers for tasks with result WQ_RESULT_SUCCESS. | |
timestamp_t | time_receive_good |
Total time spent in receiving data from workers for tasks with result WQ_RESULT_SUCCESS. | |
timestamp_t | time_status_msgs |
Total time spent sending and receiving status messages to and from workers, including workers' standard output, new worker connections, resource updates, etc. | |
timestamp_t | time_internal |
Total time the queue spends in internal processing. | |
timestamp_t | time_polling |
Total time spent blocked waiting for worker communications (i.e., the master is idle, waiting for a worker message). | |
timestamp_t | time_application |
Total time spent outside work_queue_wait. | |
timestamp_t | time_workers_execute |
Total time workers spent executing done tasks. | |
timestamp_t | time_workers_execute_good |
Total time workers spent executing done tasks with result WQ_RESULT_SUCCESS. | |
timestamp_t | time_workers_execute_exhaustion |
Total time workers spent executing tasks that exhausted resources. | |
int64_t | bytes_sent |
Total number of file bytes (not including protocol control msg bytes) sent out to the workers by the master. | |
int64_t | bytes_received |
Total number of file bytes (not including protocol control msg bytes) received from the workers by the master. | |
double | bandwidth |
Average network bandwidth in MB/s observed by the master when transferring to workers. | |
int | capacity_tasks |
The estimated number of tasks that this master can effectively support. | |
int | capacity_cores |
The estimated number of workers' cores that this master can effectively support. | |
int | capacity_memory |
The estimated number of workers' MB of RAM that this master can effectively support. | |
int | capacity_disk |
The estimated number of workers' MB of disk that this master can effectively support. | |
int | capacity_instantaneous |
The estimated number of tasks that this master can support considering only the most recently completed task. | |
int | capacity_weighted |
The estimated number of tasks that this master can support placing greater weight on the most recently completed task. | |
int64_t | total_cores |
Total number of cores aggregated across the connected workers. | |
int64_t | total_memory |
Total memory in MB aggregated across the connected workers. | |
int64_t | total_disk |
Total disk space in MB aggregated across the connected workers. | |
int64_t | committed_cores |
Committed number of cores aggregated across the connected workers. | |
int64_t | committed_memory |
Committed memory in MB aggregated across the connected workers. | |
int64_t | committed_disk |
Committed disk space in MB aggregated across the connected workers. | |
int64_t | max_cores |
The highest number of cores observed among the connected workers. | |
int64_t | max_memory |
The largest memory size in MB observed among the connected workers. | |
int64_t | max_disk |
The largest disk space in MB observed among the connected workers. | |
int64_t | min_cores |
The lowest number of cores observed among the connected workers. | |
int64_t | min_memory |
The smallest memory size in MB observed among the connected workers. | |
int64_t | min_disk |
The smallest disk space in MB observed among the connected workers. | |
int | total_workers_connected |
int | total_workers_joined |
int | total_workers_removed |
int | total_workers_lost |
int | total_workers_idled_out |
int | total_workers_fast_aborted |
int | tasks_complete |
int | total_tasks_dispatched |
int | total_tasks_complete |
int | total_tasks_failed |
int | total_tasks_cancelled |
int | total_exhausted_attempts |
timestamp_t | start_time |
timestamp_t | total_send_time |
timestamp_t | total_receive_time |
timestamp_t | total_good_transfer_time |
timestamp_t | total_execute_time |
timestamp_t | total_good_execute_time |
timestamp_t | total_exhausted_execute_time |
int64_t | total_bytes_sent |
int64_t | total_bytes_received |
double | capacity |
double | efficiency |
double | idle_percentage |
int64_t | total_gpus |
int64_t | committed_gpus |
int64_t | max_gpus |
int64_t | min_gpus |
int | port |
int | priority |
int | workers_ready |
int | workers_full |
int | total_worker_slots |
int | avg_capacity |
Statistics describing a work queue.
Number of workers currently connected to the master.
Number of workers that are connected but have not yet sent their available resources report.
Number of workers that are not running a task.
Number of workers that are running at least one task.
Number of workers on which the largest task can run.
Total number of worker connections that were established to the master.
Total number of worker connections that were released by the master, idled-out, fast-aborted, or lost.
Total number of worker connections that were asked by the master to disconnect.
Total number of workers that disconnected for being idle.
Total number of worker connections terminated for being too slow.
Total number of workers blacklisted by the master.
(Includes workers_fast_aborted.)
Total number of worker connections that were unexpectedly lost.
(does not include idled-out or fast-aborted)
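The cumulative worker counters above can be combined into a simple removal breakdown. The following is a minimal sketch, not part of work_queue.h: `struct worker_counters`, `removal_causes_sum` and `lost_fraction` are hypothetical names, and the struct is a stand-in that copies only the relevant fields of work_queue_stats. It assumes, based on the description of workers_removed, that the four listed causes account for every removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the cumulative worker counters used below.
   A real program would copy these from struct work_queue_stats. */
struct worker_counters {
    int workers_removed;      /* released, idled-out, fast-aborted, or lost */
    int workers_released;     /* asked by the master to disconnect */
    int workers_idled_out;    /* disconnected for being idle */
    int workers_fast_aborted; /* terminated for being too slow */
    int workers_lost;         /* unexpectedly lost */
};

/* Sum of the four removal causes listed for workers_removed. */
static int removal_causes_sum(const struct worker_counters *w)
{
    assert(w != NULL);
    return w->workers_released + w->workers_idled_out
         + w->workers_fast_aborted + w->workers_lost;
}

/* Fraction of removed connections that were lost unexpectedly;
   returns 0.0 when no connection has been removed yet. */
static double lost_fraction(const struct worker_counters *w)
{
    assert(w != NULL);
    if (w->workers_removed <= 0)
        return 0.0;
    return (double)w->workers_lost / (double)w->workers_removed;
}
```

Comparing `removal_causes_sum` with workers_removed is one way to check the assumption; a high `lost_fraction` may point at unstable workers rather than deliberate releases.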
Number of tasks waiting to be dispatched.
Number of tasks currently dispatched to some worker.
Number of tasks currently executing at some worker.
Number of tasks whose results have been retrieved and are waiting to be returned to the user.
Total number of tasks submitted to the queue.
Total number of tasks dispatched to workers.
Total number of tasks completed and returned to user.
(includes tasks_failed)
Total number of tasks completed and returned to user with result other than WQ_RESULT_SUCCESS.
Total number of tasks cancelled.
Total number of task executions that failed due to resource exhaustion.
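Because tasks_done includes tasks_failed, the number of successful tasks has to be derived by subtraction. A minimal sketch of that arithmetic follows; `struct task_counters`, `tasks_succeeded` and `failure_fraction` are hypothetical names, and the struct is a stand-in for the two relevant fields of work_queue_stats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the task counters used below.
   A real program would copy these from struct work_queue_stats. */
struct task_counters {
    int tasks_done;   /* completed and returned to user, includes failures */
    int tasks_failed; /* returned with a result other than WQ_RESULT_SUCCESS */
};

/* Tasks that were returned with WQ_RESULT_SUCCESS. */
static int tasks_succeeded(const struct task_counters *c)
{
    assert(c != NULL);
    return c->tasks_done - c->tasks_failed;
}

/* Fraction of returned tasks that failed;
   returns 0.0 when no task has been returned yet. */
static double failure_fraction(const struct task_counters *c)
{
    assert(c != NULL);
    if (c->tasks_done <= 0)
        return 0.0;
    return (double)c->tasks_failed / (double)c->tasks_done;
}
```

With tasks_done = 8 and tasks_failed = 2, this gives 6 successful tasks and a failure fraction of 0.25.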
Absolute time at which the master started.
Total time spent sending tasks to workers (task descriptions and input files).
Total time spent receiving results from workers (output files).
Total time spent in sending data to workers for tasks with result WQ_RESULT_SUCCESS.
Total time spent in receiving data from workers for tasks with result WQ_RESULT_SUCCESS.
Total time spent sending and receiving status messages to and from workers, including workers' standard output, new worker connections, resource updates, etc.
Total time the queue spends in internal processing.
Total time spent blocked waiting for worker communications (i.e., the master is idle, waiting for a worker message).
Total time spent outside work_queue_wait.
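The master-side time fields (time_send, time_receive, time_status_msgs, time_internal, time_polling, time_application) can be turned into shares of a total, for example to see how much of the run the master spent idle in polling. This is a rough sketch under two assumptions that this page does not confirm: that the six buckets do not overlap, and that timestamp_t is an unsigned 64-bit integer. `ts_t`, `struct master_times`, `master_time_total` and `master_time_share` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for timestamp_t as an unsigned 64-bit count. */
typedef uint64_t ts_t;

/* Hypothetical stand-in: master-side time buckets copied from
   struct work_queue_stats. */
struct master_times {
    ts_t time_send;
    ts_t time_receive;
    ts_t time_status_msgs;
    ts_t time_internal;
    ts_t time_polling;
    ts_t time_application;
};

/* Sum of the six buckets (assumed here not to overlap). */
static ts_t master_time_total(const struct master_times *t)
{
    assert(t != NULL);
    return t->time_send + t->time_receive + t->time_status_msgs
         + t->time_internal + t->time_polling + t->time_application;
}

/* Share of the summed time taken by one bucket;
   returns 0.0 when no time has been recorded. */
static double master_time_share(ts_t part, const struct master_times *t)
{
    ts_t total = master_time_total(t);
    if (total == 0)
        return 0.0;
    return (double)part / (double)total;
}
```

Since only ratios are computed, the result does not depend on the unit of timestamp_t. For example, `master_time_share(t.time_polling, &t)` estimates the idle share of the master.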
Total time workers spent executing done tasks.
Total time workers spent executing done tasks with result WQ_RESULT_SUCCESS.
Total time workers spent executing tasks that exhausted resources.
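The worker execution times lend themselves to a simple measure of useful work: the part of time_workers_execute that went to tasks ending in WQ_RESULT_SUCCESS. A minimal sketch follows; `struct exec_times`, `good_execute_fraction` and `non_success_execute_time` are hypothetical names, and timestamp_t is assumed to be an unsigned 64-bit integer. time_workers_execute_exhaustion is left out because this page does not say whether it is included in time_workers_execute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: two execution-time fields copied from
   struct work_queue_stats (timestamp_t assumed to be uint64_t). */
struct exec_times {
    uint64_t time_workers_execute;      /* all done tasks */
    uint64_t time_workers_execute_good; /* done tasks with WQ_RESULT_SUCCESS */
};

/* Fraction of execution time spent on successful tasks;
   returns 0.0 when no execution time has been recorded. */
static double good_execute_fraction(const struct exec_times *e)
{
    assert(e != NULL);
    if (e->time_workers_execute == 0)
        return 0.0;
    return (double)e->time_workers_execute_good
         / (double)e->time_workers_execute;
}

/* Execution time spent on done tasks that did not succeed.
   Guards against unsigned underflow if the counters are inconsistent. */
static uint64_t non_success_execute_time(const struct exec_times *e)
{
    assert(e != NULL);
    if (e->time_workers_execute_good > e->time_workers_execute)
        return 0;
    return e->time_workers_execute - e->time_workers_execute_good;
}
```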
int64_t work_queue_stats::bytes_sent |
Total number of file bytes (not including protocol control msg bytes) sent out to the workers by the master.
int64_t work_queue_stats::bytes_received |
Total number of file bytes (not including protocol control msg bytes) received from the workers by the master.
double work_queue_stats::bandwidth |
Average network bandwidth in MB/s observed by the master when transferring to workers.
The estimated number of tasks that this master can effectively support.
The estimated number of workers' cores that this master can effectively support.
The estimated number of workers' MB of RAM that this master can effectively support.
The estimated number of workers' MB of disk that this master can effectively support.
The estimated number of tasks that this master can support considering only the most recently completed task.
The estimated number of tasks that this master can support placing greater weight on the most recently completed task.
int64_t work_queue_stats::total_cores |
Total number of cores aggregated across the connected workers.
int64_t work_queue_stats::total_memory |
Total memory in MB aggregated across the connected workers.
int64_t work_queue_stats::total_disk |
Total disk space in MB aggregated across the connected workers.
Committed number of cores aggregated across the connected workers.
Committed memory in MB aggregated across the connected workers.
int64_t work_queue_stats::committed_disk |
Committed disk space in MB aggregated across the connected workers.
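The total_* and committed_* fields pair up naturally: dividing a committed amount by the matching total gives the share of connected resources already committed. The sketch below illustrates this; `struct resource_totals`, `struct commit_shares` and `committed_shares` are hypothetical names, and the first struct is a stand-in for the corresponding fields of work_queue_stats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: aggregate resource fields copied from
   struct work_queue_stats. Memory and disk are in MB. */
struct resource_totals {
    int64_t total_cores, total_memory, total_disk;
    int64_t committed_cores, committed_memory, committed_disk;
};

/* Committed share of each resource, as a fraction of the total. */
struct commit_shares {
    double cores, memory, disk;
};

/* Returns committed / total, or 0.0 when nothing is connected. */
static double share_of(int64_t committed, int64_t total)
{
    if (total <= 0)
        return 0.0;
    return (double)committed / (double)total;
}

static struct commit_shares committed_shares(const struct resource_totals *r)
{
    struct commit_shares s;
    assert(r != NULL);
    s.cores  = share_of(r->committed_cores,  r->total_cores);
    s.memory = share_of(r->committed_memory, r->total_memory);
    s.disk   = share_of(r->committed_disk,   r->total_disk);
    return s;
}
```

A share close to 1.0 for one resource and low for the others suggests that resource is the one limiting how many more tasks can be placed.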
int64_t work_queue_stats::max_cores |
The highest number of cores observed among the connected workers.
int64_t work_queue_stats::max_memory |
The largest memory size in MB observed among the connected workers.
int64_t work_queue_stats::max_disk |
The largest disk space in MB observed among the connected workers.
int64_t work_queue_stats::min_cores |
The lowest number of cores observed among the connected workers.
int64_t work_queue_stats::min_memory |
The smallest memory size in MB observed among the connected workers.
int64_t work_queue_stats::min_disk |
The smallest disk space in MB observed among the connected workers.
deprecated fields:
double work_queue_stats::capacity |
double work_queue_stats::efficiency |
int64_t work_queue_stats::total_gpus |
int64_t work_queue_stats::committed_gpus |
int64_t work_queue_stats::max_gpus |
int64_t work_queue_stats::min_gpus |