The goal of the eighth homework assignment is to allow you to practice using low-level system calls related to processes. To do this, you will create two new programs in Python that involve forking, execing, and waiting.
For this assignment, record your source material in the homework08 folder of your assignments Bitbucket repository and push your work by 11:59 PM Wednesday, April 6, 2016.
For the activities, you must use the low-level process operations discussed in class: os.fork, os.exec*, os.kill, os.wait, or os.waitpid.
This means that you cannot use os.system, os.popen, or subprocess.
timeout.py
(7 Points) For the first activity, you are to create a program called timeout.py, which executes the specified command and then waits for it to complete within the specified number of SECONDS. If the program fails to complete within that duration, it is terminated with SIGTERM:
# Usage message
$ ./timeout.py -h
Usage: timeout.py [-t SECONDS] command...
Options:
-t SECONDS Timeout duration before killing command (default is 10 seconds)
-v Display verbose debugging output
# Successful execution within time limit
$ ./timeout.py sleep 1 && echo Success
Success
# Unsuccessful execution that exceeded time limit
$ ./timeout.py sleep 15 || echo Failure
Failure
# Set timeout duration to 5 seconds
$ ./timeout.py -t 5 sleep 1 && echo Success
Success
# Set timeout duration to 1 second
$ ./timeout.py -t 1 sleep 5 || echo Failure
Failure
# Set timeout duration to 5 seconds and display debugging output
$ ./timeout.py -v -t 5 sleep 1 && echo Success
Executing "sleep 1" for at most 5 seconds...
Forking...
Enabling Alarm...
Waiting...
Execing...
Disabling Alarm...
Process 6930 terminated with exit status 0
Success
# Set timeout duration to 1 second and display debugging output
$ ./timeout.py -v -t 1 sleep 5 || echo Failure
Executing "sleep 5" for at most 1 seconds...
Forking...
Enabling Alarm...
Waiting...
Execing...
Alarm Triggered after 1 seconds!
Killing PID 7470...
Disabling Alarm...
Process 7470 terminated with exit status 15
Failure
Note, timeout.py must sys.exit with the exit status of the child process. You don't have to worry about the grammar regarding the "seconds" debugging output.
This program is inspired by the ACM-ICPC International Collegiate Programming Contest, where each solution only has a fixed amount of time to complete.
Rather than having a bunch of if statements to control the debugging output, create a debug function that abstracts this displaying of debugging output:
def debug(message, *args):
    ''' Print message formatted with args to sys.stderr if VERBOSE is True '''
    ...

# Main Execution
debug('Executing "{}" for at most {} seconds...', COMMAND, SECONDS)
...
Note, the debug function takes advantage of Python's support for arbitrary argument lists.
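As a minimal sketch of how an arbitrary argument list behaves, the example below collects the extra positional arguments into a tuple and unpacks them into str.format. The VERBOSE flag is assumed to be a module-level variable set by the -v option, and render is a hypothetical helper introduced only so the formatting step can be inspected on its own.

```python
import sys

VERBOSE = True  # Assumption: a module-level flag that the -v option would set


def render(message, *args):
    ''' Hypothetical helper: args arrives as a tuple and is unpacked into format '''
    return message.format(*args)


def debug(message, *args):
    ''' Print message formatted with args to sys.stderr if VERBOSE is True '''
    if VERBOSE:
        sys.stderr.write(render(message, *args) + '\n')


# Works with any number of extra arguments, including none
debug('Executing "{}" for at most {} seconds...', 'sleep 1', 5)
debug('Forking...')
```

Because the placeholders are filled positionally, the same function serves every debugging message regardless of how many values it needs.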
Remember that system calls can fail and that in Python these failures manifest themselves as OSError exceptions. You must guard the execution of these system calls with try/except blocks.
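The guarding pattern might look like the following sketch, which wraps os.waitpid in a try/except block. The function name wait_for and its return convention (None on failure) are assumptions made for illustration. The failure is triggered deliberately by waiting on the current process, which is never its own child.

```python
import os
import sys


def wait_for(pid):
    ''' Wait for child pid; return its raw status, or None if the call fails '''
    try:
        _, status = os.waitpid(pid, 0)
    except OSError as e:
        # e.errno and e.strerror describe why the system call failed
        sys.stderr.write('Unable to wait for {}: {}\n'.format(pid, e.strerror))
        return None
    return status


# A process is never its own child, so this call fails with ECHILD
result = wait_for(os.getpid())
```

The same structure applies to os.fork, os.execvp, and os.kill: attempt the call, catch OSError, and report or recover.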
The child can use os.execvp or os.execlp to execute COMMAND.
The parent can use signal.alarm to implement the timeout mechanism.
The parent can use os.kill to terminate the child.
Make sure that you don't create any zombie processes. Ensure that the parent always calls os.wait on its child.
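One way these pieces could fit together is sketched below. It is not a reference solution: the function name run_with_timeout, the exit code 127 for a failed exec, and the retry loop around os.waitpid are all assumptions. The loop is there because in Python 2.7 a signal arriving during a blocking call surfaces as an OSError with errno EINTR.

```python
import errno
import os
import signal
import sys


def run_with_timeout(command, seconds):
    ''' Fork, exec command (a list of strings), and SIGTERM the child after
    seconds.  Returns the raw wait status of the child. '''
    try:
        pid = os.fork()
    except OSError as e:
        sys.exit('Unable to fork: {}'.format(e))

    if pid == 0:  # Child: replace this process image with the command
        try:
            os.execvp(command[0], command)
        except OSError as e:
            sys.stderr.write('Unable to exec: {}\n'.format(e))
            os._exit(127)  # Assumed exit code; never fall back into parent code

    def on_alarm(signum, frame):
        ''' Runs in the parent when the alarm expires '''
        try:
            os.kill(pid, signal.SIGTERM)
        except OSError:
            pass  # The child may already have exited

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)

    while True:  # Always reap the child so that no zombie is left behind
        try:
            _, status = os.waitpid(pid, 0)
            break
        except OSError as e:
            if e.errno != errno.EINTR:  # Retry only if a signal interrupted us
                raise

    signal.alarm(0)  # Disable any pending alarm
    signal.signal(signal.SIGALRM, previous)
    return status


fast = run_with_timeout(['sleep', '0'], 5)
slow = run_with_timeout(['sleep', '5'], 1)
```

Here fast should be 0, while slow should be a status that os.WIFSIGNALED reports as a signal death, with os.WTERMSIG returning signal.SIGTERM.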
In the past, we have given you test scripts to help automate checking if your scripts are correct. For this activity, you are to create test_timeout.sh, which should verify the correctness of timeout.py by checking the following conditions and scenarios:
Verify that timeout.py is executable.
Verify that timeout.py has python2.7 in the she-bang.
Verify that timeout.py prints something reasonable to STDERR when the -h flag is set.
Verify that timeout.py exits with success when executing:
./timeout.py -t 5 sleep N
Where N is 1-4 (that is test for when N is 1, then N is 2, etc.).
Verify that timeout.py exits with failure when executing:
./timeout.py -t 1 sleep N
Where N is 2-5 (that is test for when N is 2, then N is 3, etc.).
Verify that timeout.py prints something reasonable to STDERR when the -v flag is set.
Should any test fail, you should print out an informative error message and exit with a failure status code.
In your README.md, describe how you implemented the timeout.py script. In particular, briefly discuss:
What the role of the parent and child processes were and how each accomplished their tasks using system calls.
How the timeout mechanism worked and what system calls were used.
How the test script verifies the correctness of your program.
What happens when you set SECONDS and the argument to sleep to the same duration:
./timeout.py -t 2 sleep 2
To explore this, run the above command 300 times. Do you always get the same result (i.e. does the script exit with success or failure each time)? Explain how you experimented with this and whether or not it is reasonable to expect consistent results.
rorschach.py
(8 Points) For the second activity, you are to create a program called rorschach.py which monitors a series of directories for changes and executes actions based on pattern rules:
$ ./rorschach.py -h
Usage: rorschach.py [-r RULES -t SECONDS] DIRECTORIES...
Options:
-r RULES Path to rules file (default is .rorschach.yml)
-t SECONDS Time between scans (default is 2 seconds)
-v Display verbose debugging output
-h Show this help message
For instance, suppose you had a directory with the following files:
# List files in the data directory
$ ls data
a.txt b.txt c.txt
Now suppose you had a rules.yaml file that contained the following:
- pattern: '*'
  action: 'echo {path}'
Each rule consists of a pattern, either a shell glob or a regular expression, that is tested against the path of the file being examined. If there is a match and a change has been detected, then the action is executed.
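A minimal sketch of such a test is shown below, assuming the pattern is first tried as a shell glob with fnmatch.fnmatch and then as a regular expression with re.search. The function name matches and the order of the two attempts are assumptions. Note that a glob such as '*' is not a valid regular expression, so re.error is caught.

```python
import fnmatch
import re


def matches(pattern, path):
    ''' Return True if path matches pattern as a glob or as a regular expression '''
    if fnmatch.fnmatch(path, pattern):
        return True
    try:
        return re.search(pattern, path) is not None
    except re.error:  # The pattern is not a valid regular expression
        return False


print(matches('*', 'data/a.txt'))                  # Glob match
print(matches(r'^data/[ab]\.txt$', 'data/a.txt'))  # Regular expression match
print(matches(r'.*\.py$', 'data/a.txt'))           # Neither matches
```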
For example, suppose we run rorschach.py on the data directory above with the rules.yaml file:
$ ./rorschach.py -r rules.yaml data
If we touch the file a.txt in the data directory, we should see the following output:
# Terminal with rorschach | # Terminal with shell in data directory
$ ./rorschach.py -r rules.yaml data | $ touch a.txt
data/a.txt
If we create a file d.txt in the data directory, we should see the following output:
# Terminal with rorschach | # Terminal with shell in data directory
$ ./rorschach.py -r rules.yaml data | $ touch a.txt
data/a.txt | $ touch d.txt
data/d.txt
To summarize, the rorschach.py program will continuously scan (i.e. every 2 seconds) the data directory for changes to files and then use the rules.yaml file to execute actions on files that match the corresponding pattern.
This type of utility is called a file watching service and is quite useful. Many people use such applications to trigger actions (e.g. rebuild project, deploy code, etc.) based on different events (file modification, creation, or removal). For instance, Facebook has their own Watchman utility, while there is also fswatch and inotifywatch.
To read the rules, you will need to use the yaml.load function from the PyYAML package to parse the file into a list of dicts.
Note, if you use the instructor's version of Python on the student machines, then this package will already be installed. For other environments, you will need to figure out a way to get that package.
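As a rough sketch of the parsing step, the example below loads a rules document from a string rather than from a file so that it stands alone. It uses yaml.safe_load in place of yaml.load, since recent PyYAML releases require an explicit Loader argument for yaml.load. The function name load_rules is an assumption.

```python
import yaml  # PyYAML, a third-party package

RULES_TEXT = '''
- pattern: '*'
  action: 'echo {path}'
'''


def load_rules(stream):
    ''' Parse a YAML document (a string or an open file) into a list of dicts '''
    return yaml.safe_load(stream)


rules = load_rules(RULES_TEXT)
for rule in rules:
    print(rule['pattern'], rule['action'])
```

Each element of rules is a dict with the keys pattern and action, so the rest of the program can iterate over the list directly.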
Default settings should be:
RULES: .rorschach.yml
SECONDS: 2
DIRECTORIES: .
VERBOSE: False
Break your program up into smaller chunks by writing functions such as:
check_directory: This function walks the specified directory and checks each file if it matches any of the rules.
check_file: This function checks each file to see if it matches any of the rules and then executes the action.
execute_action: This function executes the action.
You can use the shlex.split function to help create a list of arguments to use with os.execvp when executing the action.
You must support rules that contain either {name} or {path} in the action, where {name} is the basename of the file and {path} is the full path of the file. To support this, you can use the str.format method on the action string.
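The substitution and splitting steps could be combined roughly as follows. The function name build_command is hypothetical, and the sketch stops at producing the argument list that would later be handed to os.execvp.

```python
import os
import shlex


def build_command(action, path):
    ''' Fill in {name} and {path}, then split the result into an argument list '''
    command = action.format(name=os.path.basename(path), path=path)
    return shlex.split(command)


print(build_command('echo {path}', 'data/a.txt'))
print(build_command('wc -l {name}', 'data/a.txt'))
```

One caveat of formatting before splitting is that a path containing whitespace would be split into several arguments; whether that matters depends on the files being watched.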
Your program should only execute actions on files that have changed since the start of rorschach.py.
One way to detect changes is to keep track of modification times in a data structure with fast lookup times. You will have to think about which standard Python collection to use and exactly when to add or look up information with the data structure.
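As one possibility, and only as an assumption about the design, the sketch below keeps modification times in a dict keyed by path. The function name scan is hypothetical. The first call primes the table (every file looks new), and later calls report only files that are new or whose modification time differs from the stored one.

```python
import os
import shutil
import tempfile


def scan(directory, mtimes):
    ''' Return sorted paths that are new or modified since the previous scan,
    updating the mtimes dict in place '''
    changed = []
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # The file vanished between listing and stat
            if mtimes.get(path) != mtime:
                changed.append(path)
            mtimes[path] = mtime
    return sorted(changed)


# Demonstration in a throwaway directory
directory = tempfile.mkdtemp()
path = os.path.join(directory, 'a.txt')
open(path, 'w').close()

mtimes = {}
first = scan(directory, mtimes)   # Primes the table: a.txt is reported
second = scan(directory, mtimes)  # Nothing has changed
os.utime(path, (0, 0))            # Simulate a modification by changing the mtime
third = scan(directory, mtimes)   # a.txt is reported again
shutil.rmtree(directory)
```

To act only on changes made after startup, a program built this way could run one priming scan and discard its result before entering the periodic loop.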
Do not try to implement everything at once. Instead, approach this program with the iterative and incremental development mindset and slowly build pieces of your application one feature at a time:
Parses command-line arguments
Loads rules from file
Iterates through each directory
Iterates through each file in each directory
Iterates through each rule
Detects if a file matches a rule
Detects a new file
Detects a modified file
Executes an action
Remember that the goal at the end of each iteration is that you have a working program that successfully implements all of the features up to that point.
Focus on one thing at a time and feel free to write small test scripts or use the interpreter to try out small snippets of code.
In your README.md, describe how you implemented the rorschach.py script. In particular, briefly discuss:
How you scanned the filesystem to ensure you checked the files in the specified directories.
How you loaded the rules and used them to check the files.
What data structure you used to help detect changes to files and the logic you used to determine if a file was new or modified.
How you executed each action.
The current design of rorschach.py suffers from two problems related to the following concepts:
Busy waiting
Cache invalidation
Explain what these problems mean in the context of rorschach.py and under which scenarios these issues would cause performance or efficiency challenges.
What are some ways these challenges can be alleviated? (You don't have to implement them, just suggest a few ways we can prevent or mitigate busy waiting and how to implement cache invalidation)
For extra credit, you are to set up virtualenv on the student machines and demonstrate to a TA or the instructor using a virtual environment to install and use a package such as:
If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your README.md.
To submit your assignment, please commit your work to the homework08 folder in your assignments Bitbucket repository by 11:59 PM Wednesday, April 6, 2016. Your homework08 folder should contain the following files:
README.md
rorschach.py
timeout.py
test_timeout.sh
Examples of the two applications, timeout.py and rorschach.py, can be found on the student machines at ~pbui/pub/bin/timeout and pbui/pub/bin/rorschach.
You may use these to get an idea of what sort of behavior is expected, but you should not reverse engineer them!