Everyone:

Next week, we will have Exam 02, which will cover scripting in Python with a focus on: data structures, functional programming, and concurrency and parallelism. This reading assignment is meant to prepare you for this exam and is based on the items in Checklist 02.

TL;DR

The focus of this reading is to allow you to review for Exam 02.

Readings

The readings for Wednesday, March 7 are:

  1. Checklist 02

Quiz

This week, the reading is split into two sections: the first part is a short dredd quiz, while the second part involves four short Python scripts: translate1.py, translate2.py, translate3.py, and translate4.py.

To test these scripts, you will need to download the Makefile and test scripts:

$ git checkout master                 # Make sure we are in master branch
$ git pull --rebase                   # Make sure we are up-to-date with GitLab

$ git checkout -b reading07           # Create reading07 branch and check it out

$ mkdir reading07                     # Create reading07 folder

$ cd reading07                        # Go into reading07 folder

# Download Reading 07 Makefile
$ curl -LO https://gitlab.com/nd-cse-20289-sp18/cse-20289-sp18-assignments/raw/master/reading07/Makefile

# Execute tests (and download them)
$ make

Code Snippets

Record the answers to the following Reading 07 Quiz questions in your reading07 branch:

Translations

Given the following Unix pipelines, write Python scripts (i.e. translateX.py) that accomplish the same tasks (a sketch of one possible approach follows the list).

  1. translate1.py: grep -Po '9\d*9' /etc/passwd | wc -l

  2. translate2.py: cat /etc/passwd | cut -d : -f 5 | grep -Po '[Uu]ser' | wc -l

  3. translate3.py: curl -sL http://yld.me/raw/lmz | cut -d , -f 2 | grep -Eo '^B.*' | sort

  4. translate4.py: /bin/ls -l /etc | awk '{print $2}' | sort | uniq -c
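
To give a sense of the kind of translation expected, here is one possible sketch for translate1.py (not the required solution). Note that grep -o emits one line per match, so the script counts matches rather than lines:

#!/usr/bin/env python3
# Possible sketch for translate1.py:
#   grep -Po '9\d*9' /etc/passwd | wc -l
# grep -o prints one line per match, so we count matches, not lines.

import re

count = 0
with open('/etc/passwd') as stream:
    for line in stream:
        count += len(re.findall(r'9\d*9', line))

print(count)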

Notes

Guru Point (1 Point)

For extra credit, you can brute-force passwords longer than length 6 by utilizing Makeflow to coordinate an army of hulk.py's. As discussed in class, Makeflow is a workflow execution system that models applications as a directed acyclic graph (DAG). During execution, this graph is traversed and independent nodes are executed in parallel if there are enough resources.

Before we can use Makeflow, we must first create a workflow DAG. To help you get started, we have provided you with fury.py:

# Download fury
$ curl -LO https://gitlab.com/nd-cse-20289-sp18/cse-20289-sp18-assignments/raw/master/reading07/fury.py

# Make it executable
$ chmod +x fury.py

Internally, fury.py contains the following Python code:

#!/usr/bin/env python3

import hulk
import json

# Constants

HULK     = 'hulk.py'
HASHES   = hulk.HASHES

# Makeflow Class

class Makeflow(object):

    def __init__(self):
        self.rules = []

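    # Each rule describes one node in the DAG: the command to run, the
    # files it reads (inputs), and the files it produces (outputs).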
    def add_rule(self, command, inputs, outputs, local=False):
        rule = {
            'command': command,
            'inputs' : inputs,
            'outputs': outputs,
        }

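        # A rule marked local runs on the submitting machine instead of
        # being dispatched to a remote worker.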
        if local:
            rule['local_job'] = True

        self.rules.append(rule)

    def __str__(self):
        return json.dumps({
            'rules'      : self.rules,
        }, indent=4)

# Main execution

if __name__ == '__main__':
    makeflow = Makeflow()
    outputs  = []

    # Passwords of lengths 1 - 4
    for length in range(1, 5):          # TODO: do up through length 6
        output = 'p.{}'.format(length)
        makeflow.add_rule(
            './{} -l {} -s {} > {}'.format(HULK, length, HASHES, output),
            [HULK, HASHES],
            [output],
        )
        outputs.append(output)

    # Passwords of length 7, 8
    # TODO: Add rules for lengths 7 and 8 by taking advantage of prefix arguments

    # Merge all passwords
    makeflow.add_rule(
        'cat {} > passwords.txt'.format(' '.join(outputs)),
        outputs,
        ['passwords.txt'],
        True
    )

    print(makeflow)

As can be seen, fury.py defines a simple Makeflow class that allows you to add rules, each of which defines a node in the graph (i.e. the command to run, a list of inputs, and a list of outputs). Creating a workflow is just a matter of defining all the rules or commands that need to be executed.
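
For instance, a single rule could be added as follows (a minimal illustration with made-up file names):

makeflow = Makeflow()
makeflow.add_rule(
    'sort words.txt > sorted.txt',  # command to run
    ['words.txt'],                  # input files
    ['sorted.txt'],                 # output files
)
print(makeflow)                     # emit the workflow as JSON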

In the given starter code, we have defined the rules for running hulk.py on passwords of length 1 - 4. Additionally, we have a final merge rule that combines the output of all the previous commands into a single passwords.txt file.

Execution

To use fury.py, make sure you have a working hulk.py in the same directory as fury.py, along with a copy of hashes.txt from Homework 05. Once these conditions are met, you can generate a Makeflow file by running fury.py:

$ ./fury.py | tee Makeflow
{
    "rules": [
        {
            "command": "./hulk.py -l 1 -s hashes.txt > p.1",
            "inputs": [
                "hulk.py",
                "hashes.txt"
            ],
            "outputs": [
                "p.1"
            ]
        },
        {
            "command": "./hulk.py -l 2 -s hashes.txt > p.2",
            "inputs": [
                "hulk.py",
                "hashes.txt"
            ],
            "outputs": [
                "p.2"
            ]
        },
        {
            "command": "./hulk.py -l 3 -s hashes.txt > p.3",
            "inputs": [
                "hulk.py",
                "hashes.txt"
            ],
            "outputs": [
                "p.3"
            ]
        },
        {
            "command": "./hulk.py -l 4 -s hashes.txt > p.4",
            "inputs": [
                "hulk.py",
                "hashes.txt"
            ],
            "outputs": [
                "p.4"
            ]
        },
        {
            "command": "cat p.1 p.2 p.3 p.4 > passwords.txt",
            "inputs": [
                "p.1",
                "p.2",
                "p.3",
                "p.4",
            ],
            "outputs": [
                "passwords.txt"
            ],
            "local_job": true
        }
    ]
}

As you can see, fury.py generates a JSON document that contains the rules or commands to be run in the workflow.

To run Makeflow, you will first need to set some environment variables:

# In Bash
export PATH=~condor/software/sbin:$PATH
export PATH=~condor/software/bin:$PATH
export PATH=/afs/crc.nd.edu/group/ccl/software/x86_64/redhat6/cctools/current/bin:$PATH

# Check
$ which makeflow
/afs/crc.nd.edu/group/ccl/software/x86_64/redhat6/cctools/current/bin/makeflow

Next, you can run the Makeflow generated by fury.py on the local machine by doing the following:

$ makeflow --jx -T local
parsing ./Makeflow...
local resources: 12 cores, 11908 MB memory, 8789 MB disk
max running local jobs: 12
checking ./Makeflow for consistency...
./Makeflow has 5 rules.
starting workflow....
submitting job: ./hulk.py -l 4 -s hashes.txt > p.4
submitted job 31128
submitting job: ./hulk.py -l 3 -s hashes.txt > p.3
submitted job 31129
submitting job: ./hulk.py -l 2 -s hashes.txt > p.2
submitted job 31130
submitting job: ./hulk.py -l 1 -s hashes.txt > p.1
submitted job 31131
job 31131 completed
job 31130 completed
job 31129 completed
job 31128 completed
submitting job: cat p.1 p.2 p.3 p.4 > passwords.txt
submitted job 31170
job 31170 completed
nothing left to do

When using the local batch system (i.e. -T local), Makeflow automatically detects how many cores the system has and will execute up to that many processes at once. In this case, since we only have 4 hulk.py rules, each runs independently until the final merge job executes at the end of the workflow.

Extension

Now that you have an idea of what Makeflow does, you need to modify fury.py so that it generates rules for passwords of lengths 5, 6, 7, and 8. There are TODOs indicating where you should modify the code.

For passwords of length 7, take advantage of the prefix argument to hulk.py: for each possible one-character prefix, add a rule that brute-forces the remaining six characters (a sketch follows). The same approach applies for passwords of length 8, except you should use prefixes of length 2. In total, your resulting Makeflow should have 1339 rules or jobs.
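
As a rough sketch of the prefix idea (assuming hulk.py accepts a -p PREFIX flag and exposes an ALPHABET constant; adjust the names to match your own hulk.py), the length-7 rules might look like:

# Passwords of length 7: one rule per one-character prefix.
# NOTE: the -p flag and hulk.ALPHABET are assumptions; check your hulk.py.
for prefix in hulk.ALPHABET:
    output = 'p.7.{}'.format(prefix)
    makeflow.add_rule(
        './{} -l 6 -p {} -s {} > {}'.format(HULK, prefix, HASHES, output),
        [HULK, HASHES],
        [output],
    )
    outputs.append(output)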

Distributed Computing

Once you have a Makeflow with all the rules necessary to brute-force passwords of lengths 1 - 8, you can execute it on the Condor cluster using Work Queue. As discussed in class, Condor is a system for managing a large number of machine resources, while Work Queue is a framework for building large-scale master-worker applications.

Using one of the student machines, you can start your Makeflow with the following command:

# Start Makeflow with Work Queue batch system
$ makeflow --jx -T wq -N fury-$NETID      # Replace $NETID with your NETID

This will start your Makeflow with the Work Queue engine. Unfortunately, nothing will happen until you submit workers to the Makeflow. To do this, you can run the following command (from another terminal or shell):

$ condor_submit_workers -N fury-$NETID 50 # Replace $NETID with your NETID
Creating worker submit scripts in /tmp/pbui-workers...
Submitting job(s)..................................................
50 job(s) submitted to cluster 647011.

This will submit 50 workers to the Condor pool. To check on the status of the workers, you can run the following command:

$ condor_q -submitter pbui
-- Submitter: pbui@nd.edu : <129.74.152.75:9618?... : student02.cse.nd.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
647011.0   pbui            2/26 21:05   0+00:00:00 I  0   1.0  work_queue_worker
647011.1   pbui            2/26 21:05   0+00:00:00 I  0   1.0  work_queue_worker
647011.2   pbui            2/26 21:05   0+00:00:00 I  0   1.0  work_queue_worker
647011.3   pbui            2/26 21:05   0+00:00:00 I  0   1.0  work_queue_worker
...

To check on the status of the Work Queue system, you can use the following command:

$ work_queue_status
PROJECT            HOST                   PORT WAITING RUNNING COMPLETE WORKERS
dihedral           128.120.146.4          9251      70       0      303       0
dihedral           128.120.146.4          7325       0       6      802      83
forcebalance       chem9165.ucdavis.edu  50123     241      30      873      30
wq_test_shore      js-129-114-104-114.je  9155       0      34       62      15
wq_test_Pisa2dvru1 js-157-111.jetstream-  9155      22       0        0       0
forcebalance       skyholder.ucdavis.edu 50123     246      30      868      30
fury-pbui          student02.cse.nd.edu   9002     950      50        0      50

As can be seen in the display above, the fury-pbui project has 950 rules left to run. It is currently running 50 jobs on 50 workers (the ones we submitted to Condor earlier).

To start a local worker for testing purposes you can use the following command:

$ work_queue_worker -d all -N fury-$NETID

If you have your own machines, you can download CCTools and run the work_queue_worker or work_queue_factory from your own machine. This will allow you to add additional resources to your Work Queue pool and thus help you complete the brute-force attack sooner.

This Might Take a While

Cracking all 10419 passwords took the instructor about six hours using 200+ workers.

Background

Since this will take a while, you will probably want to run Makeflow inside of either a tmux or screen session:

# Start tmux
$ tmux

To make sure you have permission to write your data, grab AFS tokens before you run the Makeflow:

# Grab AFS tokens
$ kinit -l30d

$ aklog

$ tokens  # Check if you have tokens

# Start Makeflow
$ makeflow --jx -T wq -N fury-$NETID      # Replace $NETID with your NETID

With this setup, you can then disconnect from student02. To return to your Makeflow, you can just do:

$ tmux attach

Monitoring

To monitor the Makeflow, you can either look at the output of work_queue_status or you can use the makeflow_monitor script:

$ makeflow_monitor Makeflow.makeflowlog

Interruptions

If your workflow fails or gets interrupted, you can always restart your Makeflow by using the command above and it should resume where it left off. That is, it will not repeat any tasks it has successfully completed.

If you modify the Makeflow file, however, you will need to remove Makeflow.makeflowlog and start the whole process over.
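
Rather than removing the log by hand, makeflow's clean option may do this for you; a possible invocation (verify against makeflow --help on your system):

# Remove the log and all generated targets before re-running
$ makeflow --jx -c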

Extra Credit

To get credit for this Guru Point, you must show a TA or the instructor your completed fury.py and get at least 10,000 passwords on the deadpool.

Submission

To submit your work, follow the same process outlined in Reading 01:

$ git checkout master                 # Make sure we are in master branch
$ git pull --rebase                   # Make sure we are up-to-date with GitLab

$ git checkout -b reading07           # Create reading07 branch (omit -b if it already exists)

$ cd reading07                        # Go into reading07 folder

$ $EDITOR answers.json                # Edit your answers.json file

$ ../.scripts/submit.py               # Check reading07 quiz
Submitting reading07 assignment ...
Submitting reading07 quiz ...
     Q01 0.20
     Q02 0.20
     Q03 0.20
     Q04 0.20
     Q05 0.20
     Q06 0.20
     Q07 0.20
     Q08 0.20
     Q09 0.20
     Q10 0.20
   Score 2.00

$ git add answers.json                # Add answers.json to staging area
$ git commit -m "Reading 07: Quiz"    # Commit work

$ $EDITOR translate1.py               # Edit your translate1.py file
$ $EDITOR translate2.py               # Edit your translate2.py file
$ $EDITOR translate3.py               # Edit your translate3.py file
$ $EDITOR translate4.py               # Edit your translate4.py file

$ make                                # Test all scripts
Testing translations ...
 translate1.py                            ... Success
 translate2.py                            ... Success
 translate3.py                            ... Success
 translate4.py                            ... Success
   Score 2.00

$ git add Makefile                    # Add Makefile to staging area
$ git add translate1.py               # Add translate1.py to staging area
$ git add translate2.py               # Add translate2.py to staging area
$ git add translate3.py               # Add translate3.py to staging area
$ git add translate4.py               # Add translate4.py to staging area
$ git commit -m "Reading 07: Scripts" # Commit work

$ git push -u origin reading07        # Push branch to GitLab

Remember to create a merge request and assign the appropriate TA from the Reading 07 TA List.