Project 4: Job Scheduler

The goals of this project are:

To demonstrate mastery of process and thread management.

To practice working with synchronized data structures.

To gain experience with asynchronous job scheduling.

Overview

Consider what happens when you have a large number of jobs to accomplish, but you cannot simply run them all at once: downloading lots of large files, sending models to a 3D printer, rendering lots of frames for a movie, processing credit card payments, etc.

To manage a large amount of work, you need a job scheduler which will accept job submissions and put them in order. In the background, the scheduler will select a job to run, and start it. When the job completes, the scheduler records the result and starts the next job. At any point, you should be able to list the status of all jobs, and obtain the output of those that have finished.

User Interface

Your program should be called jobsched. When started, it should print out a prompt and wait for input from the user. The available commands should be:

submit <command>
status
wait <jobid>
remove <jobid>
njobs <n>
drain
quit
help

The submit command defines a new job and the Unix command that should be run when the job is scheduled. It should return immediately and display a unique integer job ID generated internally by your program. (Just start at one and count up.) The job will then run in the background when selected by the scheduler.

The status command lists all of the jobs currently known, giving the job ID, current state (waiting, running, or done, shown as WAIT, RUN, or DONE), the process ID, the exit status of the job (if done) and the Unix command of the job, in a format like this:

JOBID PID  STATE EXIT COMMAND
10    323  DONE  0    fractal -o output.bmp
11    391  DONE  1    curl http://www.google.com
12    480  RUN   -    lpr myreport.pdf
13    485  RUN   -    cp -r somebigfile /tmp
14    -    WAIT  -    make test
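
As one possible way to produce rows in the layout above, here is a minimal sketch using fixed-width printf fields. The function name format_status_row and its parameters are hypothetical, and the column widths are simply read off the sample table; adjust them if your job IDs or process IDs get wider.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one row of the status table into buf.
 * pid <= 0 means the job has no process yet; has_exit == 0 means the
 * job has not finished. Both cases are shown as "-".
 * Returns what snprintf returns (the length of the full row).
 */
int format_status_row(char *buf, size_t size, int jobid, long pid,
                      const char *state, int has_exit, int exit_status,
                      const char *command)
{
    char pid_text[24] = "-";
    char exit_text[24] = "-";

    if (pid > 0)
        snprintf(pid_text, sizeof pid_text, "%ld", pid);
    if (has_exit)
        snprintf(exit_text, sizeof exit_text, "%d", exit_status);

    /* Left-justified columns: JOBID(5) PID(4) STATE(5) EXIT(4) COMMAND */
    return snprintf(buf, size, "%-5d %-4s %-5s %-4s %s\n",
                    jobid, pid_text, state, exit_text, command);
}
```

Formatting into a buffer first (rather than printing directly) makes it easy to build the whole table while holding the queue lock and print it after releasing the lock.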

The wait command takes a jobid and pauses until that job is done executing. When the job is complete, it should display the job ID and command, and then display the standard output generated by the job. (If the job was already complete, or wait is called multiple times, it should just display the relevant information right away.)

The remove command takes a jobid and removes that job from the queue, also deleting any stored output of the job. However, it should only do this if the job is in the WAIT or DONE state. If the job is currently running, this command should display a suitable error, and refuse to remove the job.

The njobs command sets how many jobs the scheduler may run at once, which should be one by default.

The drain command should wait until all jobs in the queue are in the DONE state.

The quit command should immediately exit the program, regardless of any jobs in the queue. (If end-of-file is detected on the input, the program should quit in the same way.)

The help command should display the available commands in a helpful manner.
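
To make the command set above concrete, here is a minimal sketch of how one input line might be split into a command and its argument. The names cmd_kind, parse_command, and parse_jobid are hypothetical and not required by the assignment; the sketch assumes the line was read with something like fgets and may be modified in place.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command codes; the first eight match the names[] table below. */
typedef enum {
    CMD_SUBMIT, CMD_STATUS, CMD_WAIT, CMD_REMOVE,
    CMD_NJOBS, CMD_DRAIN, CMD_QUIT, CMD_HELP,
    CMD_EMPTY, CMD_UNKNOWN
} cmd_kind;

/*
 * Identify the first word of line and point *arg at the rest of the
 * line (possibly an empty string). The trailing newline is removed.
 */
cmd_kind parse_command(char *line, char **arg)
{
    static const char *names[] = {
        "submit", "status", "wait", "remove",
        "njobs", "drain", "quit", "help"
    };

    line[strcspn(line, "\n")] = '\0';       /* strip newline, if any */
    line += strspn(line, " \t");            /* skip leading blanks   */
    size_t len = strcspn(line, " \t");      /* length of first word  */
    *arg = line + len;
    *arg += strspn(*arg, " \t");            /* skip blanks before argument */

    if (len == 0)
        return CMD_EMPTY;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strlen(names[i]) == len && strncmp(line, names[i], len) == 0)
            return (cmd_kind)i;
    }
    return CMD_UNKNOWN;
}

/* Parse a positive integer such as a jobid; returns 1 on success, 0 on bad input. */
int parse_jobid(const char *arg, int *jobid)
{
    char *end;
    errno = 0;
    long value = strtol(arg, &end, 10);
    if (end == arg)
        return 0;                           /* no digits at all */
    end += strspn(end, " \t");              /* allow trailing blanks */
    if (*end != '\0' || errno == ERANGE || value < 1 || value > INT_MAX)
        return 0;
    *jobid = (int)value;
    return 1;
}
```

Rejecting malformed input here (an unknown command, or a jobid that is not a positive number) gives you a natural place to print an error message and return to the prompt.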

Implementation Advice

This project will bring together a variety of concepts that you have studied so far: process management, thread management, synchronization, and scheduling. We aren't going to talk through every little function you should use; you will need to review prior material and look up documentation as needed.

Here is the basic architecture you should use:

The program should consist of two threads: a main thread and a scheduler thread which interact through a common job queue. The main thread interacts with the user by reading commands, submitting jobs to the queue, displaying status, and so forth. To remain responsive, the main thread should not do any of the real work. Instead, the scheduler thread works in the background by selecting jobs out of the job queue, carrying out the commands, and then updating the job records as events occur.

The tricky part of this assignment is the job queue itself. The job queue should be implemented as a monitor as discussed in class: a data structure that is protected by a mutex and one or more condition variable(s). The job queue should only be accessed by functions that take care to use the mutex for mutual exclusion and the condition variable(s) to sleep and wake up.
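
To illustrate the monitor idea, here is a minimal sketch of a job queue built on pthread_mutex_t and pthread_cond_t. All of the names (struct job, struct job_queue, queue_submit, queue_take, queue_finish) are hypothetical, and the sketch deliberately leaves out the njobs limit, wait, remove, and drain; it is a starting point under those assumptions, not the required design.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef enum { JOB_WAIT, JOB_RUN, JOB_DONE } job_state;

struct job {
    int id;
    job_state state;
    char *command;
    struct job *next;
};

struct job_queue {
    pthread_mutex_t lock;      /* protects every field below */
    pthread_cond_t changed;    /* broadcast whenever the queue changes */
    struct job *head, *tail;
    int next_id;
};

void queue_init(struct job_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->head = q->tail = NULL;
    q->next_id = 1;
}

/* Main thread: append a new waiting job. Returns its id, or -1 if out of memory. */
int queue_submit(struct job_queue *q, const char *command)
{
    size_t len = strlen(command) + 1;
    struct job *j = malloc(sizeof *j);
    if (!j)
        return -1;
    j->command = malloc(len);
    if (!j->command) {
        free(j);
        return -1;
    }
    memcpy(j->command, command, len);
    j->state = JOB_WAIT;
    j->next = NULL;

    pthread_mutex_lock(&q->lock);
    int id = j->id = q->next_id++;
    if (q->tail)
        q->tail->next = j;
    else
        q->head = j;
    q->tail = j;
    pthread_cond_broadcast(&q->changed);   /* wake the scheduler */
    pthread_mutex_unlock(&q->lock);
    return id;
}

/* Scheduler thread: sleep until some job is waiting, then mark it running. */
struct job *queue_take(struct job_queue *q)
{
    struct job *j;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        for (j = q->head; j && j->state != JOB_WAIT; j = j->next)
            ;
        if (j)
            break;
        /* Releases the lock while asleep and reacquires it on wakeup. */
        pthread_cond_wait(&q->changed, &q->lock);
    }
    j->state = JOB_RUN;
    pthread_mutex_unlock(&q->lock);
    return j;
}

/* Scheduler thread: record completion and wake anyone waiting on this job. */
void queue_finish(struct job_queue *q, struct job *j)
{
    pthread_mutex_lock(&q->lock);
    j->state = JOB_DONE;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}
```

Note that the condition is always rechecked in a loop after pthread_cond_wait returns, since a thread may be woken when the state it is waiting for still does not hold. A single condition variable with broadcast is the simplest choice here; separate condition variables per event would also work.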

Generally speaking, the main thread should remain responsive to the user by only performing quick actions on the job queue. The user ought to be able to see immediate response to all commands, except those that specifically wait for a job to complete.

To avoid confusing the user with mixed output, the running jobs should not display any output to the console. Instead, the standard output of each job should be redirected to a file, something like output.N where N is the jobid of the job. Only when the user enters wait N should the output of the job be displayed to the screen, by reading it back from the file.
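
The redirection described above can be sketched with fork, open, dup2, and exec. The function names start_job, reap_job, and show_output are hypothetical. Running the command through /bin/sh -c is an assumption made to keep the sketch short; you may instead split the command into words yourself and call execvp.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start command with stdout sent to output.<jobid>. Returns the child pid, or -1. */
pid_t start_job(int jobid, const char *command)
{
    char path[64];
    snprintf(path, sizeof path, "output.%d", jobid);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace stdout with the output file, then run the command. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(127);
        close(fd);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                         /* reached only if exec failed */
    }
    return pid;
}

/* Block until the child exits. Returns its exit status, or -1 if it did not exit normally. */
int reap_job(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Copy output.<jobid> to out (for example stdout). Returns 0 on success, -1 on failure. */
int show_output(int jobid, FILE *out)
{
    char path[64], buf[4096];
    size_t n;
    snprintf(path, sizeof path, "output.%d", jobid);

    FILE *in = fopen(path, "r");
    if (!in)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    return 0;
}
```

In the architecture described earlier, the scheduler thread would be the one calling start_job and reap_job, while the main thread would call show_output when handling wait N. Only standard output is redirected in this sketch; standard error still goes to the console.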

Testing

Test your scheduler carefully by using it to run a variety of jobs, both short and long. Think critically about unexpected events, such as improper input from the user, errors returned from the operating system, exhaustion of resources, and so forth. Whenever such an event occurs, you must display a brilliant explanatory error message, and (if possible) continue operation of the program.

A particularly helpful way to test is to create small input files that contain a sequence of operations, for example:

submit ls -l
status
wait 1
status
remove 1
status
quit

Then, just run your scheduler with input redirected from that file: ./jobsched < test.txt

Turning In

Please review the general instructions for assignments.

Turn in all of your source code and a Makefile that builds jobsched when the user types make, and cleans up all intermediate files on make clean.

This assignment is due at 11:59PM on Friday, March 27th. If you have unusual circumstances that would prevent you from meeting that deadline, please contact Prof. Thain.

Your dropbox is mounted on the student machines at this location:

/escnfs/courses/sp20-cse-30341.01/dropbox/YOURNETID

To submit your files, make a directory called project4 in your dropbox, and copy your files there.

Grading

Your grade will be based on:

Correct implementation of each of the interactive commands.

Correct behavior of the background job scheduler.

Correct synchronization between multiple threads in the job queue.

Good coding style, including clear formatting, sensible variable names, and useful comments.