Assignment A2: Converting a Sequential Program into a Distributed Program

One of the most challenging problems in distributed computing is simply deciding how to partition a program across the available nodes. Unfortunately, there is not (yet) any "magic" compiler or programming language that can take a sequential program and split it up for you. Instead, you need to reason about the overall size of the program, and identify natural places to split the computation. This assignment will give you experience in partitioning a simple but very large problem across our Condor pool.

You are going to perform a brute-force search to recover the string behind a checksum. A checksum is a one-way function that produces a small, statistically unique "signature" for a large sequence of bytes. One widely used checksum algorithm is MD5, which yields a 128-bit result (32 hexadecimal characters). You can use the md5sum program to compute the checksum of an entire file, for example:

    md5sum /usr/share/dict/words
    aea877f722a9d0be2d2dd69ff8693f21  /usr/share/dict/words

Or, to checksum a single string:

    echo -n "distributed systems" | md5sum -
    d768e9e2025eace55ea3cb8edd839009 -

MD5 is called a "one-way function" because it is not computationally feasible to reverse the calculation. That is, given the checksum d768e9e2025eace55ea3cb8edd839009, there is no simple way to determine that it is the checksum of distributed systems. You would simply have to try checksumming all possible strings from a to zzzzzz... until you found the right one, which would take a ridiculously long time. But, if you know something about the string and you have access to a few hundred machines, you might be able to find the right combination in a reasonable amount of time.

In this assignment, the instructor will email each student a unique MD5 checksum. Your job is to write a distributed program that uses Condor to find, by brute force, the input string that produces the checksum. To simplify the search, you may assume that the string is no more than 8 characters long and consists only of the lowercase characters a to z. The goal of the exercise is to teach you how to break a large sequential program into a distributed program that completes in a reasonable amount of time.
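To get a feel for the size of the problem, you can count the candidate strings directly. The following is only a sketch: the helper name count_candidates is made up for illustration and is not part of the files you are given. It sums 26^k over every length k from 1 up to the maximum.

```c
#include <stdint.h>

/* Number of strings of length 1..max_len over an alphabet of
   alphabet_size symbols: the sum of alphabet_size^k for k = 1..max_len.
   (Hypothetical helper, not part of the provided main.c.) */
uint64_t count_candidates(unsigned alphabet_size, unsigned max_len)
{
    uint64_t total = 0;
    uint64_t power = 1;
    for (unsigned k = 1; k <= max_len; k++) {
        power *= alphabet_size;   /* now alphabet_size^k */
        total += power;
    }
    return total;
}
```

For this assignment, count_candidates(26, 8) is 217,180,147,158, of which 26^8 = 208,827,064,576 are strings of exactly 8 characters. In other words, nearly all of the work is in the longest length.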

Getting Started

To get started, download these files:
  • main.c
  • md5.h
  • md5.c
And compile them like this into a program called md5search:

    gcc main.c md5.c -o md5search

As given to you, the program simply prints out the checksum of the string given on the command line:

    % ./md5search abcdf
    checksum of abcdf is 5ff2aedbccf86eda8bb9338f86b1c308

You will have to modify main.c in order to accomplish this assignment.

I strongly suggest that you proceed carefully in steps as follows:

  • First modify main.c to sequentially search for all strings of a given length X.
  • Time how long it takes to search for all strings of length 2, 3, 4, etc, until the runtime becomes excessive. How long will it take one sequential program to search for all strings of length 8?
  • An ideal length for a Condor job is anywhere between 30-60 minutes. If your jobs are too long, they are likely to get kicked off of a machine before they can finish. If your jobs are too short, the system will spend more effort dispatching jobs than actually running them. Decide how to break up the search space so that each job can do about 30-60 minutes of work.
  • Modify main.c so that you can direct md5search to search only part of the space.
  • Write a program to generate N Condor submit files, each of which runs an instance of md5search. Make sure that each submit file writes to the same user log file by including the line log = userlog.txt
  • First, test your solution on a small problem that will complete in minutes.
  • Once you are certain that your solution runs correctly, then solve the big problem assigned to you.
  • Once submitted, your workload might take several days to complete. Get started early!
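
One possible way to approach the partitioning and submit-file steps above is sketched below. It is not the required design. It numbers the strings of a fixed length from 0 to 26^len - 1, hands each job a contiguous slice of those numbers, and formats one submit description per job. The function names, the job%d.out and job%d.err file names, and the md5search argument order (checksum, length, start, end) are all assumptions made for illustration; your modified main.c may take different arguments.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Map index (0 <= index < 26^len) to the index-th lowercase string of
   length len in base-26 order: 0 -> "aa..a", 1 -> "aa..b", and so on.
   out must have room for len + 1 bytes. */
void index_to_string(uint64_t index, unsigned len, char *out)
{
    out[len] = '\0';
    for (unsigned pos = len; pos > 0; pos--) {
        out[pos - 1] = (char)('a' + index % 26);  /* least significant last */
        index /= 26;
    }
}

/* Split [0, total) into njobs nearly equal contiguous ranges.
   Job number job (0-based) receives the half-open range [*start, *end). */
void job_range(uint64_t total, uint64_t njobs, uint64_t job,
               uint64_t *start, uint64_t *end)
{
    uint64_t base = total / njobs;    /* minimum size of every range */
    uint64_t extra = total % njobs;   /* the first 'extra' jobs get one more */
    *start = job * base + (job < extra ? job : extra);
    *end = *start + base + (job < extra ? 1 : 0);
}

/* Format one submit description into buf and return snprintf's result.
   The argument order passed to md5search here is an assumption. */
int format_submit(char *buf, size_t size, const char *checksum,
                  unsigned len, uint64_t start, uint64_t end, uint64_t job)
{
    return snprintf(buf, size,
        "universe   = vanilla\n"
        "executable = md5search\n"
        "arguments  = %s %u %" PRIu64 " %" PRIu64 "\n"
        "output     = job%" PRIu64 ".out\n"
        "error      = job%" PRIu64 ".err\n"
        "log        = userlog.txt\n"
        "queue\n",
        checksum, len, start, end, job, job);
}
```

Inside a job, md5search would loop from start to end, call index_to_string on each index, and compare the checksum of the result against the target. Choose the number of jobs so that each range takes roughly 30-60 minutes at the rate you measured in your own timing runs.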

Turning In

Copy the following files to /afs/
  • A file named main.c that gives the source for the search program submitted to Condor.
  • The source code of your script or program used to generate all of the Condor submit files.
  • A file named userlog.txt that shows all log events for your Condor run.
  • A file named solution.txt that contains the following things:
    1. The solution to the checksum assigned to you.
    2. An explanation of how you divided up the work into multiple Condor jobs.
    3. An explanation of any unexpected events or oddities in your workload.
    4. A carefully-explained estimate of how long it would have taken to solve the problem sequentially.
    5. The actual turnaround time of your workload, which is the elapsed time from the submission of the first job to the completion of the last job.
    6. The speedup of your workload, which is simply the sequential estimate divided by the actual turnaround time.
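
The arithmetic behind items 4 and 6 can be sketched as follows. This assumes you estimate the sequential time by extrapolating linearly from a timed sample run, which is only one reasonable way to do it; both function names are made up for illustration.

```c
/* Estimated sequential runtime in seconds, extrapolated linearly from a
   sample run that checked sample_candidates strings in sample_seconds.
   (Hypothetical helper; assumes a constant checking rate.) */
double sequential_estimate(double sample_seconds, double sample_candidates,
                           double total_candidates)
{
    return sample_seconds * (total_candidates / sample_candidates);
}

/* Speedup: the sequential estimate divided by the actual turnaround time. */
double speedup(double sequential_seconds, double turnaround_seconds)
{
    return sequential_seconds / turnaround_seconds;
}
```

Whatever method you use, state in solution.txt which sample you timed and how you scaled it up, so that the estimate can be checked.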