The goal of this lab assignment to allow you to explore algorithms and complexity by writing Python programs that solve the same problem but using different algorithmic approaches.
For this assignment, record your work in a Jupyter Notebook and upload it to the form below by 11:59 PM Friday, February 21.
To help you get started, we have provided you with a starter notebook, which you can download and then use to record your answers. To utilize the starter notebook, download it to wherever you are running Jupyter Notebook and then use the Jupyter Notebook interface to open the downloaded file.
The starter notebook is already formatted for all the various sections you need to fill out. Just look for the red instructions that indicates were you need to write text or the cyan comments that indicate where you need to write code.
For this lab, you will be implementing three different functions that solve the same problem:
Given a string,
phrase
, determine whether or not the string contains a duplicate.
For instance, the string Friday Friday getting down on Friday
contains
the word Friday
multiple times and thus would return True
. The string
Everybody's looking forward to the weekend
, however, does not contain any
duplicate words and thus would return False
For the first algorithm, you must implement the following function:
def has_duplicates_bf(phrase):
''' Use brute-force to determine whether or not the phrase contains
duplicate words. '''
return False
The has_duplicates_bf
function uses brute-force to check if the
string contains a duplicate as described by the following pseudo-code:
For each word in the string, do the following: Check if the current word has a match in the rest of the string. If there is a match, then there is a duplicate. If no match is found, then there is no duplicate.
For example, in the string Friday Friday getting down on Friday
, we would
take the first word Friday
and compare it to the remaining words until we
found a match.
Here are some examples of the function in action:
>>> has_duplicates_bf('a b c d')
False
>>> has_duplicates_bf('a b c a')
True
You can convert a string to a list of words by using the .split()
string method:
words = phrase.split()
You can use two nested loops or you can use a single loop with the in
operator.
Once you have implemented the algorithm above, you can use the provided
check_lyrics
function:
LYRICS = [
"But I've got a blank space, baby dnd I'll write your name",
"Cause we never go out of style, We never go out of style.",
"Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off",
"So take a look what you've done, Cause, baby, now we got bad blood",
"Ooh, look what you made me do, Look what you made me do",
]
def check_lyrics(function, display=True):
''' Check each lyric in LYRICS list. Only print the result if
display is True. '''
for lyric in LYRICS:
if display:
print('{} -> {}'.format(lyric, function(lyric)))
else:
function(lyric)
The check_lyrics
function takes the function to check as its first
argument, and then a display
parameter which defaults to True
. It will
call the specified function
on each string in the LYRICS
list. If
display
is True
then it will also print out the result of calling the
function
as shown below:
>>> check_lyrics(has_duplicates_bf, display=True)
But I've got a blank space, baby and I'll write your name -> False
Cause we never go out of style, We never go out of style. -> True
Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off -> True
So take a look what you've done, Cause, baby, now we got bad blood -> False
Ooh, look what you made me do, Look what you made me do -> True
Once you have implemented both the has_duplicates_bf
function, you can
now benchmark it to see how well it performs:
>>> %timeit check_lyrics(has_duplicates_bf, display=False)
17.5 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
You will want to observe how long each loop (call to the
has_duplicates_bf
function) takes.
After performing the tasks above, answer the following reflection questions:
Describe the flow control of your has_duplicates_bf
function. How did it
detect if there was a duplicate word?
What is the algorithmic complexity of your has_duplicates_bf
function?
(Consider the maximum number of comparisons required given N
words in the
phrase
.)
For this activity, you will once again be implementing an algorithm to solve the problem above. This time, however, you will use the following algorithm:
Sort all words in string. For each word in the sorted list, do the following: Check if the current word matches the next word: If the words match, then there is a duplicate. If no neighbors match, then there is no duplicate.
To implement this algorithm, you will need code the following function:
def has_duplicates_sc(phrase):
''' Sort and then scan phrase to see if it contains any duplicate words '''
return False
sorted_words = sorted(list_of_words)
Once you have implemented the has_duplicates_sc
function, you can test it
with the check_lyrics
function provided above:
>>> check_lyrics(has_duplicates_sc, display=True)
But I've got a blank space, baby and I'll write your name -> False
Cause we never go out of style, We never go out of style. -> True
Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off -> True
So take a look what you've done, Cause, baby, now we got bad blood -> False
Ooh, look what you made me do, Look what you made me do -> True
The output should be the same as the output in Activity 1.
To measure how fast your implementation is, you can once again benchmark its execution:
>>> %timeit check_lyrics(has_duplicates_sc, display=False)
11.8 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
After performing the tasks above, answer the following reflection questions:
Describe the flow control of your has_duplicates_bf
function. How did it
detect if there was a duplicate word?
What is the algorithmic complexity of your has_duplicates_bf
function?
(Consider the maximum number of comparisons required given N
words in the
phrase
.)
For this activity, you will once again be implementing an algorithm to solve the problem above. This time, however, you will use the following algorithm:
Sort all words in string. For each word in the sorted list, do the following: Check if the current word is in the rest of the list using binary search. If the word is found, then there is a duplicate. If no match is found, then there is no duplicate.
To implement this algorithm, you will need code the following function:
def has_duplicates_bs(phrase):
''' Sort and then perform binary search on phrase to determine if
it contains duplicate words '''
return False
You can use the binary_search
function we wrote in class to see if a
target
is in a sorted list of values
:
found = binary_search(words, target)
This function is included in the starter notebook.
Once you have implemented the has_duplicates_bs
function, you can test it
with the check_lyrics
function provided above:
>>> check_lyrics(has_duplicates_bs, display=True)
But I've got a blank space, baby and I'll write your name -> False
Cause we never go out of style, We never go out of style. -> True
Baby, I'm just gonna shake, shake, shake, shake, shake, I shake it off, I shake it off -> True
So take a look what you've done, Cause, baby, now we got bad blood -> False
Ooh, look what you made me do, Look what you made me do -> True
The output should be the same as the output in Activity 1.
To measure how fast your implementation is, you can once again benchmark its execution:
>>> %timeit check_lyrics(has_duplicates_bs, display=False)
23 µs ± 170 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
After performing the tasks above, answer the following reflection questions:
Describe the flow control of your has_duplicates_bf
function. How did it
detect if there was a duplicate word?
What is the algorithmic complexity of your has_duplicates_bf
function?
(Consider the maximum number of comparisons required given N
words in the
phrase
.)
Examining the results of Activities 1, 2, and 3, what can you say about the relationship between algorithmic complexity and real world performance? Does better theoretical algorithmic complexity always lead to better real world performance?
Once you have completed your lab, submit your Jupyter Notebook using the form: