The goal of the seventh homework assignment is to allow you to practice using low-level system calls related to files and the filesystem. To do this, you will re-create our friends dd and find in Python.
For this assignment, record your source material in the homework07 folder of your assignments Bitbucket repository and push your work by 11:59 PM Friday, March 25, 2016.
dd.py
(5 Points) For the first activity, you are to create a partial re-implementation of dd, which is a utility to copy blocks of data from one file to another:
$ ./dd.py -h
Unknown argument: -h
Usage: dd.py options...
Options:
    if=FILE     Read from FILE instead of stdin
    of=FILE     Write to FILE instead of stdout
    count=N     Copy only N input blocks
    bs=BYTES    Read and write up to BYTES bytes at a time
    seek=N      Skip N obs-sized blocks at start of output
    skip=N      Skip N ibs-sized blocks at start of input
As you can see, dd.py should take six possible options:
if: This specifies the FILE to read from instead of stdin.
of: This specifies the FILE to write to instead of stdout.
count: This specifies how many blocks to copy.
bs: This specifies how many bytes each block should be.
seek: This specifies how many blocks (if any) we should skip in the output file.
skip: This specifies how many blocks (if any) we should skip in the input file.
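As a rough illustration of handling these KEY=VALUE options by hand, here is a minimal sketch built on str.split. The names DEFAULTS and parse_options are hypothetical, using None to stand for stdin or stdout is an assumption, and so are the 0 defaults for seek and skip. The getattr fallback is only there so the sketch also runs on interpreters that lack sys.maxint.

```python
import sys

# Hypothetical defaults table; the keys mirror the usage text above.
DEFAULTS = {
    'if': None,      # Assumption: None stands for stdin (file descriptor 0)
    'of': None,      # Assumption: None stands for stdout (file descriptor 1)
    'count': getattr(sys, 'maxint', sys.maxsize),
    'bs': 512,
    'seek': 0,       # Assumption: skip nothing in the output unless asked
    'skip': 0,       # Assumption: skip nothing in the input unless asked
}

def parse_options(arguments):
    ''' Return a dict of options parsed from a list of KEY=VALUE strings '''
    options = dict(DEFAULTS)
    for argument in arguments:
        parts = argument.split('=', 1)
        if len(parts) != 2 or parts[0] not in options:
            raise ValueError('Unknown argument: {}'.format(argument))
        key, value = parts
        # File names stay strings; every other option is a number
        options[key] = value if key in ('if', 'of') else int(value)
    return options
```

A caller might pass sys.argv[1:] to parse_options and print the usage message when ValueError is raised.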
For this activity, you must use the low-level file descriptor operations discussed in class: os.open, os.read, os.write, os.close, and os.lseek.
stdin is file descriptor 0 and is the default value for if.
stdout is file descriptor 1 and is the default value for of.
The default bs value should be 512.
The default count should be sys.maxint.
You can parse command line options manually, perhaps using str.split.
You should handle errors gracefully by catching exceptions for each system call. In fact, you may wish to wrap the system calls in their own functions:
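import os
import sys

def open_fd(path, mode):
    ''' Open path with the given mode flags; report the error and exit on failure '''
    try:
        return os.open(path, mode)
    except OSError as e:
        print >>sys.stderr, 'Could not open file {}: {}'.format(path, e)
        sys.exit(1)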
The os.open system call requires a mode such as os.O_RDONLY, os.O_WRONLY, or os.O_RDWR. You can combine additional flags along with these modes by performing a bitwise or.
You cannot os.lseek on stdin or stdout.
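To make the roles of count and bs concrete, the following is one possible shape for the copy loop using only os.read and os.write. The function name copy_blocks and its return value are hypothetical, and the inner loop assumes that os.write may accept fewer bytes than it was given.

```python
import os

def copy_blocks(source_fd, target_fd, count, bs):
    ''' Copy at most count blocks of up to bs bytes; return bytes written '''
    copied = 0
    blocks = 0
    while blocks < count:
        data = os.read(source_fd, bs)
        if not data:            # An empty read means end of input
            break
        while data:             # Retry until the whole block is written
            written = os.write(target_fd, data)
            copied += written
            data = data[written:]
        blocks += 1
    return copied
```

Counting blocks with a plain integer avoids building a huge range when count is left at its very large default.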
To aid you in testing the dd.py script, we are providing you with test_dd.sh, which you can use as follows:
# Download script
$ curl -O https://bitbucket.org/CSE-20189-SP16/idlebin/raw/master/test_dd.sh
# Make script executable
$ chmod +x test_dd.sh
# Run test script
$ ./test_dd.sh
dd test successful!
By default, dd truncates the output file. For your dd.py, you are not to truncate the output file. Instead, you should mimic the behavior of dd with the conv=notrunc option.
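One way to get the conv=notrunc behavior is simply to leave os.O_TRUNC out of the flags passed to os.open. The sketch below is an illustration only: the name open_output, the use of os.O_CREAT, and the 0o644 permission value are assumptions rather than requirements of the assignment.

```python
import os

def open_output(path, seek, bs):
    ''' Open path for writing without truncating it, then skip seek blocks '''
    # os.O_CREAT creates the file if it is missing; omitting os.O_TRUNC
    # keeps whatever bytes the file already holds (like conv=notrunc).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    if seek > 0:
        # Move the write position forward by seek blocks of bs bytes
        os.lseek(fd, seek * bs, os.SEEK_SET)
    return fd
```

For example, writing two bytes after open_output(path, 1, 4) on a ten-byte file overwrites bytes 4 and 5 and leaves the rest of the file untouched.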
In your README.md, describe how you implemented the dd.py script. In particular, briefly discuss:
How you handled parsing the command line options.
How you opened the input and output files (in particular, what modes did you use).
How you utilized the seek and skip arguments.
How you utilized count and bs to read data from if and write to of.
find.py
(10 Points) For the second activity, you are to create a partial re-implementation of find, which recursively searches a directory and prints items it finds based on the specified options:
Usage: find.py directory [options]...
Options:
    -type [f|d]     File is of type f for regular file or d for directory
    -executable     File is executable and directories are searchable to user
    -readable       File readable to user
    -writable       File is writable to user
    -empty          File or directory is empty
    -name pattern   Base of file name matches shell pattern
    -path pattern   Path of file matches shell pattern
    -regex pattern  Path of file matches regular expression
    -perm mode      File's permission bits are exactly mode (octal)
    -newer file     File was modified more recently than file
    -uid n          File's numeric user ID is n
    -gid n          File's numeric group ID is n
These options should mostly follow the rules and behaviors found in the traditional find utility.
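Several of these options (-perm, -newer, -uid, and -gid) can be answered from the fields of an os.stat result. The helpers below are a minimal sketch of that idea. Their names are hypothetical, and they assume the mode arrives as an octal string such as '644'.

```python
import os
import stat

def matches_perm(info, mode):
    ''' True if the permission bits of info equal mode, an octal string '''
    return stat.S_IMODE(info.st_mode) == int(mode, 8)

def is_newer(info, reference):
    ''' True if info was modified more recently than the reference path '''
    return info.st_mtime > os.stat(reference).st_mtime

def matches_owner(info, uid=None, gid=None):
    ''' True if info matches every numeric ID that was given '''
    if uid is not None and info.st_uid != uid:
        return False
    if gid is not None and info.st_gid != gid:
        return False
    return True
```

Here info is whatever os.stat (or os.lstat) returned for the entry being tested.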
By default, find does not follow symbolic links. For your find.py, you are to follow symbolic links and mimic the behavior of find with the -L flag by passing followlinks=True to os.walk.
To recursively traverse a directory, you should use os.walk. Make sure you follow the symbolic links.
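A traversal along these lines might look like the sketch below. The generator name walk_entries is hypothetical, and sorting the names is an assumption made only to get a stable order, which will not necessarily match the order in which find prints entries.

```python
import os

def walk_entries(directory):
    ''' Yield directory itself, then every path below it, following links '''
    yield directory
    for root, dirs, files in os.walk(directory, followlinks=True):
        # dirs and files hold bare names, so join each with its parent
        for name in sorted(dirs + files):
            yield os.path.join(root, name)
```

Note that followlinks=True can recurse without end if a symbolic link points back at one of its own parent directories.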
As you walk the directory, utilize an include function to determine whether or not you should print out that filesystem entry:
def include(path):
    ''' Returns True if item should be included in output, otherwise False '''
    if condition:
        return False
    ...
    return True
Use the os.stat function to get inode information. If os.stat fails (due to a broken symbolic link), then you should use os.lstat and note the broken link.
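The fallback described here could be wrapped in a small helper such as the following sketch. The name stat_entry and the choice to return a (result, broken) pair are hypothetical.

```python
import os

def stat_entry(path):
    ''' Return (stat result, broken), where broken marks a dangling link '''
    try:
        return os.stat(path), False
    except OSError:
        # os.stat follows links, so it fails when the target is missing;
        # os.lstat describes the link itself instead.
        return os.lstat(path), True
```

The broken flag lets later checks treat a dangling link differently from an ordinary file.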
Use os.access to determine if a file is executable, readable, or writable.
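For instance, the three access options could be mapped onto the os.access mode constants as in this small sketch. The table ACCESS_MODES and the function has_access are hypothetical names.

```python
import os

# Hypothetical mapping from find.py option names to os.access modes
ACCESS_MODES = {
    '-executable': os.X_OK,
    '-readable': os.R_OK,
    '-writable': os.W_OK,
}

def has_access(path, option):
    ''' True if the current user has the access named by option on path '''
    return os.access(path, ACCESS_MODES[option])
```

os.access returns False rather than raising an exception when the path does not exist.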
An entry is considered empty if it is a file and the size is 0, if it is a directory and it has no files, or if it is a symbolic link and the link is not broken.
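The file and directory cases can be sketched as follows. This is only one reading of the rule: the name is_empty is hypothetical, a directory counts as empty when os.listdir returns no entries at all, and a symbolic link is assumed to be judged by its target (because os.stat follows links), with a broken link treated as not empty.

```python
import os
import stat

def is_empty(path):
    ''' True for a zero-length regular file or a directory with no entries '''
    try:
        info = os.stat(path)    # Follows symbolic links to their target
    except OSError:
        return False            # Assumption: a broken link is never empty
    if stat.S_ISDIR(info.st_mode):
        return not os.listdir(path)
    if stat.S_ISREG(info.st_mode):
        return info.st_size == 0
    return False
```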
Use fnmatch.fnmatch to check shell patterns.
Use re to handle regular expressions.
Use os.path.basename to get the base of a file name.
Use os.path.join to construct a file path.
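The pattern options could be built from these pieces roughly as shown below. The function names are hypothetical. The sketch assumes -regex should match the whole path, as traditional find does, and it uses Python's re syntax, which is not identical to the regular expression syntax find uses by default.

```python
import fnmatch
import os
import re

def matches_name(path, pattern):
    ''' True if the base of the file name matches the shell pattern '''
    return fnmatch.fnmatch(os.path.basename(path), pattern)

def matches_path(path, pattern):
    ''' True if the whole path matches the shell pattern '''
    return fnmatch.fnmatch(path, pattern)

def matches_regex(path, pattern):
    ''' True if the regular expression matches the entire path '''
    # re.match anchors only the start, so \Z is added to anchor the end
    return re.match('(?:' + pattern + r')\Z', path) is not None
```

For example, matches_name('/etc/ssh/sshd_config', 'sshd*') is True because only the base name sshd_config is compared against the pattern.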
To aid you in testing the find.py script, we are providing you with test_find.sh, which you can use as follows:
# Download script
$ curl -O https://bitbucket.org/CSE-20189-SP16/idlebin/raw/master/test_find.sh
# Make script executable
$ chmod +x test_find.sh
# Run test script
$ ./test_find.sh
find test successful!
In your README.md, describe how you implemented the find.py script. In particular, briefly discuss:
How you handled parsing the command line options.
How you walked the directory tree.
How you determined whether or not to print a filesystem object's path.
Use strace to compare the number of stat and lstat system calls your find.py makes with the number the traditional find command makes on /etc. Do you notice anything strange with find's implementation? Investigate and explain how find is getting file information.
For extra credit, you are to sign up for a VPS out in the cloud. That is, you are to sign up for a virtualized Linux host running on an Internet hosting provider such as:
Amazon EC2: Note, there is a Free Tier.
Google Cloud Platform: Note, there is a Free 60-day Trial.
Windows Azure: Note, there is a Free Trial.
Digital Ocean: You can find various promotion codes online and you can sign up for the GitHub Education Pack to get some free credit.
Vultr: This is what I use because I am cheap and like to run alternative Linux distributions such as Void Linux and Alpine Linux. There is a $5 credit when you sign up, which should last you a month. If you wish to throw me a bone, you can use this referral link.
To receive full credit, you must show a TA or the instructor your VM (i.e., show the control panel and you SSHing into it).
If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your README.md.
To submit your assignment, please commit your work to the homework07 folder in your assignments Bitbucket repository by 11:59 PM Friday, March 25, 2016. Your homework07 folder should contain the following files:
README.md
dd.py
find.py