Lab 04: Duplicates

The goal of this lab assignment to allow you to explore algorithms and complexity by writing Python programs that solve the same problem but using different algorithmic approaches.

For this assignment, record your work in a Jupyter Notebook and upload it to the form below by 11:59 PM Friday, February 22.

Activity 0: Starter Notebook

To help you get started, we have provided you with a starter notebook, which you can download and then use to record your answers. To utilize the starter notebook, download it to wherever you are running Jupyter Notebook and then use the Jupyter Notebook interface to open the downloaded file.

Download Starter Notebook

The starter notebook is already formatted for all the various sections you need to fill out. Just look for the red instructions that indicates were you need to write text or the cyan comments that indicate where you need to write code.

Activity 1: Brute-force

For this lab, you will be implementing three different functions that solve the same problem:

Given a string, phrase, determine whether or not the string contains a duplicate.

For instance, the string Friday Friday getting down on Friday contains the word Friday multiple times and thus would return True. The string Everybody's looking forward to the weekend, however, does not contain any duplicate words and thus would return False

Function

For the first algorithm, you must implement the following function:

def has_duplicates_bf(phrase):
    ''' Use brute-force to determine whether or not the phrase contains
    duplicate words. '''
    return False

The has_duplicates_bf function uses brute-force to check if the string contains a duplicate as described by the following pseudo-code:

For each word in the string, do the following: Check if the current word has a match in the rest of the string. If there is a match, then there is a duplicate. If no match is found, then there is no duplicate.

For example, in the string Friday Friday getting down on Friday, we would take the first word Friday and compare it to the remaining words until we found a match.

Here are some examples of the function in action:

>>> has_duplicates_bf('a b c d')
False

>>> has_duplicates_bf('a b c a')
True

You can convert a string to a list of words by using the .split() string method:
```
words = phrase.split()
```
You can use two nested loops or you can use a single loop with the in operator.

Testing

Once you have implemented the algorithm above, you are to write a check_lyrics function:

LYRICS = [
    "But I've got a blank space, baby dnd I'll write your name",
    "Cause we never go out of style, We never go out of style.",
    "Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off",
    "So take a look what you've done, Cause, baby, now we got bad blood",
    "Ooh, look what you made me do, Look what you made me do",
]

def check_lyrics(function, display=True):
    ''' Check each lyric in LYRICS list.  Only print the result if display is True. '''
    pass

The check_lyrics function takes the function to check as its first argument, and then a display parameter which defaults to True. It will call the specified function on each string in the LYRICS list. If display is True then it will also print out the result of calling the function as shown below:

>>> check_lyrics(has_duplicates_bf, display=True)
But I've got a blank space, baby and I'll write your name -> False
Cause we never go out of style, We never go out of style. -> True
Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off -> True
So take a look what you've done, Cause, baby, now we got bad blood -> False
Ooh, look what you made me do, Look what you made me do -> True

Benchmarking

Once you have implemented both the has_duplicates_bf function and the check_lyrics function, you can now benchmark it to see how well it performs:

>>> %timeit check_lyrics(has_duplicates_bf, display=False)
17.5 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

You will want to observe how long each loop (call to the has_duplicates_bf function) takes.

Reflection

After performing the tasks above, answer the following reflection questions:

Describe the flow control of your has_duplicates_bf function. How did it detect if there was a duplicate word?
What is the algorithmic complexity of your has_duplicates_bf function? (Consider the maximum number of comparisons required given N words in the phrase.)

Activity 2: Sort and Scan

For this activity, you will once again be implementing an algorithm to solve the problem above. This time, however, you will use the following algorithm:

Sort all words in string. For each word in the sorted list, do the following: Check if the current word matches the next word: If the words match, then there is a duplicate. If no neighbors match, then there is no duplicate.

Function

To implement this algorithm, you will need code the following function:

def has_duplicates_sc(phrase):
    ''' Sort and then scan phrase to see if it contains any duplicate words '''
    return False

You can use the sorted function to sort a list of of words:
```
sorted_words = sorted(list_of_words)
```

Testing

Once you have implemented the has_duplicates_sc function, you can test it with the check_lyrics function you wrote above:

>>> check_lyrics(has_duplicates_sc, display=True)
But I've got a blank space, baby and I'll write your name -> False
Cause we never go out of style, We never go out of style. -> True
Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off -> True
So take a look what you've done, Cause, baby, now we got bad blood -> False
Ooh, look what you made me do, Look what you made me do -> True

The output should be the same as the output in Activity 1.

Benchmarking

To measure how fast your implementation is, you can once again benchmark its execution:

>>> %timeit check_lyrics(has_duplicates_sc, display=False)
11.8 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Reflection

After performing the tasks above, answer the following reflection questions:

Describe the flow control of your has_duplicates_bf function. How did it detect if there was a duplicate word?
What is the algorithmic complexity of your has_duplicates_bf function? (Consider the maximum number of comparisons required given N words in the phrase.)

Activity 3: Sort and Binary Search

For this activity, you will once again be implementing an algorithm to solve the problem above. This time, however, you will use the following algorithm:

Sort all words in string. For each word in the sorted list, do the following: Check if the current word is in the rest of the list using binary search. If the word is found, then there is a duplicate. If no match is found, then there is no duplicate.

Function

To implement this algorithm, you will need code the following function:

import bisect

def has_duplicates_bs(phrase):
    ''' Sort and then perform binary search on phrase to determine if
    it contains duplicate words '''
    return False

You can use the bisect.bisect_right function to perform binary search:
```
found = bisect.bisect_right(words, target)
```

Testing

Once you have implemented the has_duplicates_bs function, you can test it with the check_lyrics function you wrote above:

>>> check_lyrics(has_duplicates_bs, display=True)
But I've got a blank space, baby and I'll write your name -> False
Cause we never go out of style, We never go out of style. -> True
Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off -> True
So take a look what you've done, Cause, baby, now we got bad blood -> False
Ooh, look what you made me do, Look what you made me do -> True

The output should be the same as the output in Activity 1.

Benchmarking

To measure how fast your implementation is, you can once again benchmark its execution:

>>> %timeit check_lyrics(has_duplicates_bs, display=False)
23 µs ± 170 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Reflection

After performing the tasks above, answer the following reflection questions:

Describe the flow control of your has_duplicates_bf function. How did it detect if there was a duplicate word?
What is the algorithmic complexity of your has_duplicates_bf function? (Consider the maximum number of comparisons required given N words in the phrase.)

Activity 4: Reflection

Examining the results of Activities 1, 2, and 3, what can you say about the relationship between algorithmic complexity and real world performance? Does better theoretical algorithmic complexity lead to better real world performance?

Submission

Once you have completed your lab, submit your Jupyter Notebook using the form:

Submit Lab Notebook