Homework 07: Files, Filesystem

The goal of the seventh homework assignment is to allow you to practice using low-level system calls related to files and the filesystem. To do this, you will re-create our friends dd and find in Python.

For this assignment, record your source material in the homework07 folder of your assignments Bitbucket repository and push your work by 11:59 PM Friday, March 25, 2016.

Activity 1: `dd.py` (5 Points)

For the first activity, you are to create a partial re-implementation of dd, which is a utility to copy blocks of data from one file to another:

$ ./dd.py -h
Unknown argument: -h
Usage: dd.py options...

Options:

      if=FILE     Read from FILE instead of stdin
      of=FILE     Write to FILE instead of stdout

      count=N     Copy only N input blocks
      bs=BYTES    Read and write up to BYTES bytes at a time

      seek=N      Skip N obs-sized blocks at start of output
      skip=N      Skip N ibs-sized blocks at start of input

As you can see, dd.py should take six possible options:

if: This specifies the FILE to read from instead of stdin.
of: This specifies the FILE to write to instead of stdout.
count: This specifies how many blocks to copy.
bs: This specifies how many bytes each block should be.
seek: This specifies how many blocks (if any) we should skip in the output file.
skip: This specifies how many blocks (if any) we should skip in the input file.
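
As an illustration of the NAME=VALUE convention above, here is a minimal sketch of manual parsing with str.split. The helper name parse_options and the use of None to mean "not given" are assumptions made for this example, not part of the assignment.

```python
def parse_options(arguments):
    ''' Parse dd-style NAME=VALUE arguments into a dictionary (hypothetical helper) '''
    # None is an assumed placeholder meaning "option not given"
    options = {'if': None, 'of': None, 'count': None, 'bs': 512, 'seek': 0, 'skip': 0}

    for argument in arguments:
        # Split on the first '=' only, so file names containing '=' survive
        pieces = argument.split('=', 1)
        if len(pieces) != 2 or pieces[0] not in options:
            raise ValueError('Unknown argument: {}'.format(argument))

        name, value = pieces
        if name in ('if', 'of'):
            options[name] = value        # File names stay as strings
        else:
            options[name] = int(value)   # Everything else is a number

    return options
```

A caller might pass sys.argv[1:] to this function and print the usage message when ValueError is raised.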

Low-level System Calls

For this activity, you must use the low-level file descriptor operations discussed in class: os.open, os.read, os.write, os.close, and os.lseek.
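
To show how these calls fit together, the following is a rough sketch of a block copy loop built only on os.read and os.write. The function name copy_blocks and its parameters are hypothetical, and the error handling you are expected to add is left out.

```python
import os

def copy_blocks(source_fd, target_fd, block_size, count):
    ''' Copy up to count blocks of block_size bytes (hypothetical helper) '''
    copied = 0
    while copied < count:
        data = os.read(source_fd, block_size)
        if not data:
            break                   # An empty result means end of input

        # os.write may write fewer bytes than requested, so loop until done
        while data:
            written = os.write(target_fd, data)
            data = data[written:]

        copied += 1

    return copied                   # Number of blocks actually copied
```

Note that a read from a pipe can return fewer than block_size bytes; this sketch still counts such a short read as one block.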

Hints

stdin is file descriptor 0 and is the default value for if.
stdout is file descriptor 1 and is the default value for of.
The default bs value should be 512.
The default count should be sys.maxint.
You can parse command line options manually, perhaps using str.split.

You should handle errors gracefully by catching exceptions for each system call. In fact, you may wish to wrap the system calls in their own functions:

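import os
import sys

def open_fd(path, mode):
    ''' Open path with the given flags, exiting with an error message on failure '''
    try:
        return os.open(path, mode)
    except OSError as e:
        print >>sys.stderr, 'Could not open file {}: {}'.format(path, e)
        sys.exit(1)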

The os.open system call requires an access mode such as os.O_RDONLY, os.O_WRONLY, or os.O_RDWR. You can combine additional flags with these modes using a bitwise OR, for example os.O_WRONLY | os.O_CREAT.
You cannot os.lseek on stdin or stdout.
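
One possible way to apply seek and skip while respecting the last hint is sketched below. The wrapper name seek_blocks is hypothetical; it follows the same pattern as open_fd, but reports errors with sys.stderr.write so that the sketch also runs under Python 3.

```python
import os
import sys

def seek_blocks(fd, blocks, block_size):
    ''' Move fd to byte offset blocks * block_size (hypothetical helper) '''
    if blocks <= 0:
        return                      # Nothing to skip, so never lseek on a pipe

    try:
        os.lseek(fd, blocks * block_size, os.SEEK_SET)
    except OSError as e:
        # Raised, for instance, when fd refers to a pipe or terminal
        sys.stderr.write('Could not seek: {}\n'.format(e))
        sys.exit(1)
```

With this shape, skip=N would be applied to the input descriptor and seek=N to the output descriptor, each with the block size in effect.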

Test Script

To aid you in testing the dd.py script, we are providing you with test_dd.sh, which you can use as follows:

# Download script
$ curl -O https://bitbucket.org/CSE-20189-SP16/idlebin/raw/master/test_dd.sh

# Make script executable
$ chmod +x test_dd.sh

# Run test script
$ ./test_dd.sh
dd test successful!

Output Truncation

By default, dd truncates the output file. For your dd.py, do not truncate the output file; instead, mimic the behavior of dd with the conv=notrunc option.
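
The difference comes down to whether os.O_TRUNC is among the flags passed to os.open. The sketch below, with a hypothetical helper name and an assumed permission mode of 0o644, opens the output without os.O_TRUNC so that bytes beyond what is written are preserved.

```python
import os

def overwrite_start(path, data):
    ''' Write data at the start of path, keeping the rest (hypothetical helper) '''
    # No os.O_TRUNC here, so existing contents past len(data) are left alone
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```

For example, calling overwrite_start on a file containing "hello world" with the data "HE" leaves "HEllo world" rather than just "HE".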

Questions

In your README.md, describe how you implemented the dd.py script. In particular, briefly discuss:

How you handled parsing the command line options.
How you opened the input and output files (in particular, which modes you used).
How you utilized the seek and skip arguments.
How you utilized count and bs to read data from if and write to of.

Activity 2: `find.py` (10 Points)

For the second activity, you are to create a partial re-implementation of find, which recursively searches a directory and prints items it finds based on the specified options:

Usage: find.py directory [options]...

Options:

    -type [f|d]     File is of type f for regular file or d for directory

    -executable     File is executable and directories are searchable to user
    -readable       File readable to user
    -writable       File is writable to user

    -empty          File or directory is empty

    -name  pattern  Base of file name matches shell pattern
    -path  pattern  Path of file matches shell pattern
    -regex pattern  Path of file matches regular expression

    -perm  mode     File's permission bits are exactly mode (octal)
    -newer file     File was modified more recently than file

    -uid   n        File's numeric user ID is n
    -gid   n        File's numeric group ID is n

These options should mostly follow the rules and behaviors found in the traditional find utility.

Following Symbolic Links

By default, find does not follow symbolic links. For your find.py, you are to follow symbolic links and mimic the behavior of find with the -L flag by passing followlinks=True to os.walk.
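
A minimal sketch of such a traversal is shown below. The generator name walk_entries is hypothetical, and the order in which it yields paths is not guaranteed to match the order find uses.

```python
import os

def walk_entries(directory):
    ''' Yield directory and every path beneath it, following symbolic links '''
    yield directory
    for root, dirs, files in os.walk(directory, followlinks=True):
        for name in dirs + files:
            yield os.path.join(root, name)
```

Be aware that with followlinks=True, os.walk can recurse forever if a link points back to one of its own parent directories.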

Hints

To recursively traverse a directory, you should use os.walk. Make sure you follow the symbolic links.

As you walk the directory, utilize an include function to determine whether or not you should print out that filesystem entry:

def include(path):
    ''' Returns True if item should be included in output, otherwise False '''

    if condition:
        return False

    ...

    return True

Use the os.stat function to get inode information. If os.stat fails (due to a broken symbolic link), then you should use os.lstat and note the broken link.
Use the functions in the stat module to interpret the inode information.
Use os.access to determine if a file is executable, readable, or writable.
An entry is considered empty if it is a file and the size is 0, if it is a directory and it has no files, or if it is a symbolic link and the link is not broken.
Use fnmatch.fnmatch to check shell patterns.
Use re to handle regular expressions.
Use os.path.basename to get the base of a file name.
Use os.path.join to construct a file path.
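
To make a few of these hints concrete, here is a sketch of some small predicates that an include function could call. All of the helper names are hypothetical, and the choice to require the regular expression to match the entire path is an assumption modeled on how find's -regex behaves.

```python
import fnmatch
import os
import re
import stat

def stat_entry(path):
    ''' Return (stat result, broken), using os.lstat when os.stat fails '''
    try:
        return os.stat(path), False
    except OSError:
        return os.lstat(path), True     # Most likely a broken symbolic link

def is_type(path, kind):
    ''' kind is 'f' for regular file or 'd' for directory '''
    info, _ = stat_entry(path)
    if kind == 'f':
        return stat.S_ISREG(info.st_mode)
    return stat.S_ISDIR(info.st_mode)

def has_permissions(path, mode):
    ''' mode is an octal string such as '644' '''
    info, _ = stat_entry(path)
    return stat.S_IMODE(info.st_mode) == int(mode, 8)

def name_matches(path, pattern):
    ''' Compare only the base of the file name against a shell pattern '''
    return fnmatch.fnmatch(os.path.basename(path), pattern)

def regex_matches(path, pattern):
    ''' Assumption: the pattern must match the whole path, not a substring '''
    return re.match('(?:' + pattern + r')\Z', path) is not None
```

Each predicate takes the full path, so an include function can test only the options the user actually supplied and return False as soon as one fails.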

Test Script

To aid you in testing the find.py script, we are providing you with test_find.sh, which you can use as follows:

# Download script
$ curl -O https://bitbucket.org/CSE-20189-SP16/idlebin/raw/master/test_find.sh

# Make script executable
$ chmod +x test_find.sh

# Run test script
$ ./test_find.sh
find test successful!

Questions

In your README.md, describe how you implemented the find.py script. In particular, briefly discuss:

How you handled parsing the command line options.
How you walked the directory tree.
How you determined whether or not to print a filesystem object's path.
Use strace to compare the number of calls to the stat and lstat system calls your find.py does compared to the traditional find command on /etc. Do you notice anything strange with find's implementation? Investigate and explain how find is getting file information.

Guru Point (1 Point)

For extra credit, you are to sign up for a VPS out in the cloud. That is, you are to sign up for a virtualized Linux host running on an Internet hosting provider such as:

Amazon EC2: Note, there is a Free Tier.
Google Cloud Platform: Note, there is a Free 60-day Trial.
Windows Azure: Note, there is a Free Trial.
Digital Ocean: You can find various promotion codes online and you can sign up for the GitHub Education Pack to get some free credit.
Vultr: This is what I use because I am cheap and like to run alternative Linux distributions such as Void Linux and Alpine Linux. There is a $5 credit when you sign up, which should last you a month. If you wish to throw me a bone, you can use this referral link

To receive full credit, you must show a TA or the instructor your VM (i.e., show the control panel and you SSHing into it).

Feedback

If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your README.md.

Submission

To submit your assignment, please commit your work to the homework07 folder in your assignments Bitbucket repository by 11:59 PM Friday, March 25, 2016. Your homework07 folder should contain the following files:

README.md
dd.py
find.py
