The goal of this project is to allow you to practice utilizing distributed computing by harnessing multiple machines to brute-force crack a large collection of passwords. To do this, you will create two new Python programs:
- hulk.py: This is a basic brute-force password cracker that attempts every permutation of letters from an alphabet up to a specified length.
- fury.py: This is a Work Queue application that coordinates workers running on different machines.
Once you have these programs, you will attempt to crack the following list of password hashes:
http://yld.me/btd
To keep track of who has decrypted the most passwords, you can submit your set of discovered passwords to the deadpool:
$ cat PASSWORDS | curl --data-binary @- http://xavier.h4x0r.space:9111/NETID
Replace PASSWORDS with the name of the file with your passwords and NETID with your Notre Dame NetID.
To view the current leaderboard, visit the deadpool:
http://xavier.h4x0r.space:9111
For this assignment, you are to work in groups of 1, 2, or 3, store your source code in the project02 folder of your projects Bitbucket repository, and push your work by 11:59 PM Wednesday, April 27, 2016.
For this project, you must use software or services provided by the Cooperative Computing Lab: Condor and Work Queue.
The following readings provide background on distributed computing and password cracking:

- Anatomy of a hack: How crackers ransack passwords like "qeadzcwrsfxv1331"
- Work Queue + Python: A Framework For Scalable Scientific Ensemble Applications

Optional References:
Once again, you should use your projects repository on Bitbucket. Ensure that you have modified the README.md to include the names of the group members and their NetIDs and that CSE-20189-SP16 has at least write access to the repository.
For the first activity, you are to create hulk.py, which is a script that uses brute-force to smash a set of MD5 hashes:
$ ./hulk.py -h
Usage: hulk.py [-a ALPHABET -l LENGTH -s HASHES -p PREFIX]
Options:
-a ALPHABET Alphabet used for passwords
-l LENGTH Length for passwords
-s HASHES Path to file containing hashes
-p PREFIX Prefix to use for each candidate password
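As a starting point, here is a minimal sketch of one way to parse these options by hand (the flags and defaults come from this document; the parsing approach itself is not mandated by the assignment):

# Hand-rolled option parsing for hulk.py (illustrative sketch)
import sys

ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789'
LENGTH   = 8
HASHES   = 'hashes.txt'
PREFIX   = ''

args = sys.argv[1:]
while args:
    flag = args.pop(0)
    if flag == '-a':
        ALPHABET = args.pop(0)
    elif flag == '-l':
        LENGTH = int(args.pop(0))
    elif flag == '-s':
        HASHES = args.pop(0)
    elif flag == '-p':
        PREFIX = args.pop(0)
    else:
        print('Usage: hulk.py [-a ALPHABET -l LENGTH -s HASHES -p PREFIX]')
        sys.exit(0 if flag == '-h' else 1)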
Given an ALPHABET (default is abcdefghijklmnopqrstuvwxyz0123456789), hulk.py will compute the MD5 hash of every permutation of the ALPHABET up to the specified LENGTH and check if it is in the set of HASHES. If a PREFIX is specified, then this should be inserted before each candidate permutation.
For instance, suppose we had a hashes.txt file that contained the following MD5 checksums:
0cc175b9c0f1b6a831c399e269772661
92eb5ffee6ae2fec3ad71c777531578f
4a8a08f09d37b73795649038408b5f33
900150983cd24fb0d6963f7d28e17f72
If we executed hulk.py with a LENGTH of 1, we should get the following result:
$ ./hulk.py -l 1 -s hashes.txt
a
b
c
That is, hulk.py determined that 3 of the hashes correspond to the passwords a, b, and c.
If we executed hulk.py with a LENGTH of 2 and a PREFIX of a, we should get the following result:
$ ./hulk.py -l 2 -s hashes.txt -p a
abc
That is, hulk.py determined that 1 of the hashes corresponds to the password abc. Note that we could have achieved the same result by executing:
$ ./hulk.py -l 3 -s hashes.txt
abc
The difference is that the former searches each permutation of the form aXX, where X is a letter from the ALPHABET (due to the PREFIX of a), while the latter searches each permutation of the form XXX.
- The default ALPHABET can be formed using the string.ascii_lowercase and string.digits constants.
- The default PREFIX should be an empty string.
- The default LENGTH should be 8.
- The default HASHES path should be hashes.txt.
- You should use either a dict or set to store the collection of MD5 hashes.
- You can compute the MD5 hash of a string by using the hashlib library:
import hashlib

# Compute MD5 hash of string
def md5sum(s):
    return hashlib.md5(s).hexdigest()
- You can compute each permutation of the given ALPHABET by using the itertools.product function with the repeat argument set appropriately.
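Putting these pieces together, a minimal sketch of the core search loop might look like the following (the function name and exact structure are illustrative, not required by the assignment):

import hashlib
import itertools

# Check every permutation of ALPHABET (with the given PREFIX prepended) up to
# LENGTH characters against the set of target hashes and print any matches.
def smash(hashes, alphabet, length, prefix=''):
    for n in range(1, length + 1):
        for letters in itertools.product(alphabet, repeat=n):
            candidate = prefix + ''.join(letters)
            # Note: on Python 3 you would hash candidate.encode() instead
            if hashlib.md5(candidate).hexdigest() in hashes:
                print(candidate)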
Once you have completed hulk.py, you are to create another script named fury.py, which will utilize Work Queue to coordinate multiple workers running hulk.py on different machines.
To utilize Work Queue on the student machines, you must first set up the following environment variables:
$ setenv PATH /afs/nd.edu/user37/condor/software/sbin:$PATH
$ setenv PATH /afs/nd.edu/user37/condor/software/bin:$PATH
$ setenv PATH /afs/nd.edu/user37/ccl/software/cctools/bin:$PATH
$ setenv PYTHONPATH /afs/nd.edu/user37/ccl/software/cctools/lib/python2.6/site-packages:$PYTHONPATH
Note: if PYTHONPATH is not set, then simply leave off the last :$PYTHONPATH.
To check if your environment is setup properly, you can try the following commands:
# Show all Condor submissions from current machine
$ condor_q
-- Submitter: student00.cse.nd.edu : <129.74.152.73:9618?sock=3137_dc95_4> : student00.cse.nd.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1063343.0 cshinave 9/29 00:05 0+00:00:00 H 0 0.0 measure ./distance
1063402.0 cshinave 9/30 19:02 0+00:00:00 H 0 0.0 measure ./distance
1063685.0 cluo1 1/13 16:01 0+00:00:00 H 0 0.0 echo hello condor
1063720.0 xli19 1/19 02:15 0+00:00:01 H 0 0.0 A0.sh
1063734.0 xli19 1/20 08:35 0+00:00:06 H 0 0.0 A0.sh
1064269.0 avuong 2/3 20:51 0+00:00:00 H 0 2.9 povray +Irubiks.po
1065597.0 kfeng 2/4 00:50 0+00:00:01 H 0 2.9 povray +Irubiks.po
1065597.1 kfeng 2/4 00:50 0+00:00:01 H 0 2.9 povray +Irubiks.po
1065597.2 kfeng 2/4 00:50 0+00:00:01 H 0 2.9 povray +Irubiks.po
1065597.3 kfeng 2/4 00:50 0+00:00:01 H 0 2.9 povray +Irubiks.po
1065597.4 kfeng 2/4 00:50 0+00:00:01 H 0 2.9 povray +Irubiks.po
1100331.0 pbui 4/17 09:39 0+05:05:00 R 0 463.9 work_queue_worker
...
1100334.198 pbui 4/17 09:57 0+04:45:54 R 0 170.9 work_queue_worker
414 jobs; 0 completed, 0 removed, 1 idle, 402 running, 11 held, 0 suspended
# Show all Work Queue masters
$ work_queue_status
PROJECT HOST PORT WAITING RUNNING COMPLETE WORKERS
lobster_shuffle earth.crc.nd.edu 9000 0 0 1317 56
lobster_shuffle earth.crc.nd.edu 9002 0 0 1327 19
EEMT_Comet gw176.iu.xsede.org 9000 307 24 35 1
Callahan_RSun hail.hpc.arizona.edu 9123 365 0 2769 0
hp24stab-gbsa haldarfe.crc.nd.edu 9000 0 49 92577 50
forcebalance markland-cluster.stan 9868 1 0 48 0
forcebalance markland-cluster.stan 9866 1 0 13 0
forcebalance not0rious.stanford.ed 9345 9 15 156 15
forcebalance not0rious.stanford.ed 9344 0 2 798 16
hulk-pbui student00.cse.nd.edu 9123 4312 295 21628 402
cclosdci workspace.crc.nd.edu 9907 0 0 26 1
cclosdci12 workspace.crc.nd.edu 9910 4 0 94 0
cclosdci8 workspace.crc.nd.edu 9909 0 0 9 1
cclosdci4 workspace.crc.nd.edu 9908 0 0 8 1
forcebalance xstream-ln01.stanford 9433 0 1 110 9
The goal of fury.py is to:

1. Create a Work Queue master that will coordinate workers executing the hulk.py script:
# Create Work Queue master with:
# 1. Random port between 9000 - 9999
# 2. Project name of hulk-NETID
# 3. Catalog mode enabled
queue = work_queue.WorkQueue(work_queue.WORK_QUEUE_RANDOM_PORT, name='hulk-NETID', catalog=True)
queue.specify_log('fury.log') # Specify Work Queue log location
2. Divide up the work of brute-forcing every password by creating Work Queue tasks for each unit of work:
# Example of creating and submitting a single Work Queue task
command = './hulk.py -l 1' # Command to execute
task = work_queue.Task(command) # Create Work Queue task
# Add source files
for source in ('hulk.py', 'hashes.txt'):
task.specify_file(source, source, work_queue.WORK_QUEUE_INPUT)
# Submit task to the queue
queue.submit(task)
Hint: Passwords of lengths 1, 2, 3, 4, and 5 can be brute-forced relatively quickly. Beyond that, you will want to use the PREFIX feature of hulk.py to search passwords of length 5 + a PREFIX of length 1, 2, and 3.
Followup: Charles suggests using the PREFIX feature of hulk.py to search for passwords of length 6 + a PREFIX of length 1 and 2. This reduces the number of tasks, putting less pressure on the master (which has to schedule all the tasks).
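For instance, a minimal sketch of generating one task per search, following the follow-up suggestion above and assuming the queue object created earlier, might look like:

import itertools
import work_queue

ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789'

# One command for lengths 1-5, then one command per PREFIX of length 1 and 2
# searching 6 more characters beyond the prefix (names and bounds are illustrative).
def generate_commands():
    yield './hulk.py -l 5'
    for n in (1, 2):
        for prefix in itertools.product(ALPHABET, repeat=n):
            yield './hulk.py -l 6 -p ' + ''.join(prefix)

# Submit one task per command, using the queue created above
for command in generate_commands():
    task = work_queue.Task(command)
    for source in ('hulk.py', 'hashes.txt'):
        task.specify_file(source, source, work_queue.WORK_QUEUE_INPUT)
    queue.submit(task)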
After all the work has been submitted, fury.py should wait for all the tasks to complete:
# Until the Master queue is empty
while not queue.empty():
# Wait for a task to complete
task = queue.wait()
# Check if task is valid and if task returned successfully
if task and task.return_status == 0:
# Write output of task to stdout
sys.stdout.write(task.output)
sys.stdout.flush()
Because fury.py will run for a long time, you may wish to stop and start it frequently as you debug and test it. To store your results and keep track of the tasks you have already completed, you should keep a journal, which is a common technique used in distributed computing. By storing the command and output of each task in a dict and then saving it to a JSON file, you will be able to:
- Keep track of your results. Once you get a task back from the queue, you can store it in your journal:
# Example recording
JOURNAL[task.command] = task.output.split()

with open('journal.json.new', 'w') as stream:
    json.dump(JOURNAL, stream)

os.rename('journal.json.new', 'journal.json')
The reason why we write to a temporary journal.json.new file and then rename it is so that if the write operation gets interrupted, we are not left with a corrupt or incomplete journal. This is a common technique to ensure data integrity.
- Skip already completed tasks. When fury.py starts, it should load the journal and check if a command has already been executed. If so, it can avoid creating and submitting a new task for it. This way, if you stop your master and restart it, you can avoid performing duplicative searches.
# Example check
command = './hulk.py -l 1'

if command in JOURNAL:
    print >>sys.stderr, 'Already did', command
The contents of the journal will look something like this:
{"./hulk.py -l 1": [...], ...}
To execute fury.py, you simply need to run the script:
$ ./fury.py
Since fury.py uses Work Queue, you will need to connect workers to fury.py. To start a single local work_queue_worker, you can use the following command:
$ work_queue_worker -d all -N hulk-NETID
This will start one worker with debugging enabled on the local machine, which will communicate with the master with the name hulk-NETID.
Alternatively, you can connect directly to the master by specifying the hostname and port:
$ work_queue_worker -d all studentXX.cse.nd.edu 9XXX
This bypasses the catalog server, which can get slow when there are many users.
To start many workers on the Condor pool, you can use the following command:
$ condor_submit_workers -N hulk-NETID 200
This will submit 200 workers to the Condor pool that will communicate with the master with the name hulk-NETID.
To check the status of your workers, you can use the condor_q command.
Do not start by submitting many workers at once. Instead, test by having fury.py dispatch a few tasks and then use a single local worker. Once you are confident that things are working, submit progressively more workers.
To help you monitor the progress of fury.py, you can use the work_queue_monitor script created by the instructor:
$ ~pbui/pub/bin/work_queue_monitor fury.log
Start Time: 1460920763603631
Elasped Time: 7.94 minutes
Task Rate: 135.49 (task/min)
Workers Init: 0
Workers Idle: 47
Workers Busy: 336
Progress: 1076 / 15828 (6.80%)
Estimated Time: 1.81 hours
This script will give you information about the Work Queue application based on the data stored in the Work Queue log generated by the master.
To extract the passwords stored in the journal, you can use the journal_dump_passwords script created by the instructor:
$ ~pbui/pub/bin/journal_dump_passwords
...
This script assumes the journal is named journal.json and is in the current directory.
Feel free to copy and modify either of these scripts.
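For reference, a minimal version of such a dump, based on the journal format shown earlier (where each value is the list of passwords a task printed), might look like the following illustrative sketch; it is not the instructor's actual script:

import json

# Print every password recorded in journal.json
with open('journal.json') as stream:
    journal = json.load(stream)

for passwords in journal.values():
    for password in passwords:
        print(password)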
Once you have completed hulk.py and fury.py, you are to attempt to crack all of the provided password hashes and to submit your decrypted passwords to the deadpool as noted at the top of this document:
$ cat PASSWORDS | curl --data-binary @- http://xavier.h4x0r.space:9111/NETID
For full credit, you must crack at least 8000 passwords. There are 10271 total hashes. Each password is no more than 8 characters long and only consists of lowercase letters and numeric digits.
Many of the passwords in the password hashes collection were pulled from a set of the top 10,000 most commonly used passwords on the Internet, so the set may contain some vulgar and offensive words.
Cracking 10245 passwords in the collection of password hashes took the instructor about 12 hours using between 200 and 400 workers.
You should be able to get the first 6000 in a few minutes, but after that it will take much longer. Plan accordingly.
Once you have completed hulk.py and fury.py and have attempted to crack the password hashes, briefly answer the following questions in your README.md:
1. Describe the implementation of hulk.py. How does it crack passwords?
2. Explain how you tested hulk.py and verified that it works properly.
3. Describe the implementation of fury.py. How does it:
    - Utilize hulk.py to crack passwords?
    - Divide up the work among the different workers?
    - Keep track of what has been attempted?
    - Recover from failures?
4. Explain how you tested fury.py and verified that it works properly.
5. From your experience in this project and recalling your Discrete Math knowledge, what would make a password more difficult to brute-force: a more complex alphabet or a longer password? Explain.
To submit your assignment, please commit your work to the project02 folder in your projects Bitbucket repository by 11:59 PM Wednesday, April 27, 2016. Your project02 folder should contain at least the following files:
README.md
hulk.py
fury.py
passwords.txt
The passwords.txt file should contain a list of all the passwords you found from the password hashes collection.