The goal of this assignment is to allow you to practice utilizing dictionaries and sets in the Python programming language. Additionally, you will be practicing the use of top-down design to complete each of these activities. That is, for each activity you will write smaller functions that are used in higher-level functions to achieve the goals of the activity.
To record your solutions and answers, create a new Jupyter Notebook titled Notebook06.ipynb and use this notebook to complete the following activities and answer the corresponding questions.
Make sure you label your activities appropriately. That is, for Activity 1, have a header cell titled Activity 1. Likewise, use Markdown cells to answer the questions (include each question above its answer).
This Notebook assignment is due at midnight on Friday, October 30, 2015 and is to be done individually.
For the first activity, you are to write two functions:
count_words(text): Given a string text, this function returns a dict that contains the number of occurrences of each word in text (ignoring case and punctuation).
display_frequencies(text): Given a string text, this function prints each word in text along with its number of occurrences, in descending order of count.
The following is a skeleton of the code you must implement to complete this activity.
# Imports
import string

# Functions
def count_words(text):
    ''' Return a dictionary containing the counts for each word in text
    (ignoring spaces and punctuation) '''
    # TODO

def display_frequencies(text):
    ''' Display a listing of words and their counts in descending order '''
    # TODO
Once you have completed this activity, you should be able to do the following to test it:
>>> lyrics = '''
'Cause the players gonna play, play, play, play, play
And the haters gonna hate, hate, hate, hate, hate
Baby, I'm just gonna shake, shake, shake, shake, shake
I shake it off, I shake it off
'''
>>> display_frequencies(lyrics)
7 shake
5 play
5 hate
3 gonna
2 off
2 i
2 it
2 the
1 and
1 just
1 players
1 im
1 baby
1 haters
1 cause
The following are hints and suggestions that will help you complete this activity:
To remove punctuation, you can use the string translate method and string.punctuation together (this two-argument form is specific to Python 2 strings):
s.translate(None, string.punctuation)
To sort the dictionary in descending order, you can use sorted:
sorted(d, key=d.get, reverse=True)
display_frequencies should call count_words.
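The two hints above can be sketched as follows. This is only a minimal illustration, not the required solution. It assumes Python 3, where str.translate takes a single table built with str.maketrans, whereas the s.translate(None, string.punctuation) form shown above is for Python 2. The names strip_punctuation and counts are hypothetical and used only for this example.

```python
import string

def strip_punctuation(s):
    ''' Return s with every character in string.punctuation removed
    (hypothetical helper, Python 3 form of the hint above) '''
    table = str.maketrans('', '', string.punctuation)
    return s.translate(table)

# Hypothetical sample counts, just to show the sorting hint
counts = {'shake': 7, 'play': 5, 'gonna': 3}

# sorted() iterates over the keys; key=counts.get orders them by their counts
for word in sorted(counts, key=counts.get, reverse=True):
    print(counts[word], word)
```

Note that sorted returns only the keys, so the count for each word still has to be looked up in the dictionary when printing.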
After completing the activity above, answer the following questions:
Briefly describe how your functions work and what challenges you faced implementing them.
Demonstrate your display_frequencies function by writing a loop that runs it on each of the following strings:
strings = [
    'Well, buzz buzz buzz goes the bumble bee',
    'You are my sunshine, my only sunshine',
    'Cheer, cheer for old Notre Dame',
    'Wait, they don\'t love you like I love you',
]
For the second activity, you are to write three functions:
format_icon(icon): Given a string icon, this function returns the HTML icon code for the icon (e.g. <i class="fa fa-{icon}"></i>).
translate_icons(text): Given a string text, this function returns a string of HTML containing the translated icon codes specified in the global variable WORD_TO_ICON (see example below).
translate(text): Given a string text, this function translates the text and displays the resulting HTML code.
The following is a skeleton of the code you must implement to complete this activity.
# Imports
from IPython.html.widgets.interaction import interact
from IPython.display import HTML, display

# Global Variables
WORD_TO_ICON = {
    '<3' : 'heart',
    # TODO
}

# Functions
def format_icon(icon):
    ''' Return formatted HTML icon code '''
    # TODO

def translate_icons(text):
    ''' Return translated text that substitutes words in dictionary
    with HTML icon codes '''
    # TODO

def translate(text):
    ''' Display HTML of translated text '''

# Run interactive loop
interact(translate, text='')
Once you have completed this activity, you should be able to do the following to test it:
>>> format_icon('heart')
'<i class="fa fa-heart"></i>'
>>> translate_icons('i <3 $rocket')
'i <i class="fa fa-heart"></i> <i class="fa fa-rocket"></i>'
Your interactive program should look like the following:
The following are hints and suggestions that will help you complete this activity:
To draw an icon, you can use Font Awesome. For example, to draw a heart, you would emit the following HTML code:
<i class="fa fa-heart"></i>
which will appear as:
You can build a list of HTML strings and join them.
translate should call translate_icons, which in turn should call format_icon.
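The "build a list of HTML strings and join them" hint can be sketched as below. This is one possible approach under stated assumptions, not the required solution: it assumes words are separated by whitespace, and the '$rocket' entry is a hypothetical addition chosen to match the test example above.

```python
# Assumed dictionary; only '<3' comes from the skeleton, '$rocket' is hypothetical
WORD_TO_ICON = {
    '<3': 'heart',
    '$rocket': 'rocket',
}

def format_icon(icon):
    ''' Return the Font Awesome HTML code for the given icon name '''
    return '<i class="fa fa-{}"></i>'.format(icon)

def translate_icons(text):
    ''' Replace each word found in WORD_TO_ICON with its HTML icon code '''
    pieces = []
    for word in text.split():
        if word in WORD_TO_ICON:
            pieces.append(format_icon(WORD_TO_ICON[word]))
        else:
            pieces.append(word)
    # Joining once at the end avoids repeated string concatenation
    return ' '.join(pieces)

print(translate_icons('i <3 $rocket'))
```

Because the text is split on whitespace and rejoined with single spaces, any runs of multiple spaces in the input are collapsed in the output.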
After completing the activity above, answer the following questions:
Explain the concept of top-down design and how it was utilized in this activity. What do you think of this problem solving approach? Did it make this activity easier or harder?
Add at least 4 entries to the WORD_TO_ICON dictionary and demonstrate their translations by showing the output of translate_icons on strings that contain each entry.
For the third activity, you will write an interactive spell-checker by doing the following:
To get a large collection of words, you will be using requests.get to fetch the file here:
https://raw.githubusercontent.com/dwyl/english-words/master/words.txt
Once you have requested this huge file, you will need to split it by newlines and then store the resulting list into a set. You can then use this WORDS set to check if any words you get in text are valid words.
Implement the spellcheck(text) function:
Given a string text, this function checks if each word in text is in the WORDS set. If it is not, then the word will be underlined and colored red; otherwise the word is left unmodified. Once each word in the text has been checked, the resulting HTML code is displayed.
The following is a skeleton of the code you must implement to complete this activity.
# Imports
from IPython.html.widgets.interaction import interact
from IPython.display import HTML, display
import requests

# Global Variables
WORDS = requests.get('https://raw.githubusercontent.com/dwyl/english-words/master/words.txt')
# TODO: split WORDS.text by newlines and then convert to set

# Functions
def spellcheck(text):
    ''' Spellcheck text by checking if each word is in the WORDS set
    If the word is not in the set, then format the word to be red and
    underlined. Otherwise, just show the word.
    '''
    # TODO

# Run interactive loop
interact(spellcheck, text='')
Your interactive program should look like the following:
The following are hints and suggestions that will help you complete this activity:
The requests.get function above will fetch a list of words from a website and return a response object whose text is a huge string of words separated by newlines. To access the text, you must use WORDS.text.
To underline a word and make it red, you can use the following HTML code:
<u style="color:red">{word}</u>
which will appear as: {word}
You can build a list of HTML strings and join them.
The list of words you will be using for your dictionary is huge (300,000+). Do not try to print or display the whole thing in your Notebook, or you are going to have a bad time.
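The hints above can be sketched without touching the network. This is a minimal illustration under assumptions, not the required solution: SAMPLE_TEXT stands in for WORDS.text, mark_misspelled is a hypothetical helper that returns the HTML string instead of displaying it, and lowercasing each word before the lookup assumes the word list is lowercase.

```python
# Hypothetical stand-in for WORDS.text: words separated by newlines
SAMPLE_TEXT = 'apple\nbanana\ncherry\n'

# Split on newlines and store the result in a set for fast membership tests
SAMPLE_WORDS = set(SAMPLE_TEXT.splitlines())

def mark_misspelled(text, words):
    ''' Return HTML in which every word not found in words is
    underlined and colored red '''
    pieces = []
    for word in text.split():
        if word.lower() in words:
            pieces.append(word)
        else:
            pieces.append('<u style="color:red">{}</u>'.format(word))
    return ' '.join(pieces)

print(mark_misspelled('apple bananna', SAMPLE_WORDS))
```

In the notebook, the returned string would be wrapped in HTML and passed to display so that it renders rather than prints as raw markup.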
After completing the activity above, answer the following questions:
Briefly describe how your spell-checker works and what challenges you faced implementing it. What differences do you see between your implementation of a spell-checker and that which can be found in a typical word processing application?
Does using a set instead of a list make a difference? Test this by using the %timeit command on a lookup using a list version of WORDS and a set version of WORDS:
>>> %timeit 'monkey' in WORDS_SET
...
>>> %timeit 'monkey' in WORDS_LIST
...
Which data structure is more efficient? Can you feel the difference when you are using the spell-checker interactively? Explain.
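Outside of IPython, where %timeit is not available, a similar comparison can be sketched with the standard timeit module. The word list here is synthetic (100,000 made-up strings), and the names words_list, words_set, and target are hypothetical. Actual timings depend on the machine, so no specific numbers are shown.

```python
import timeit

# Synthetic stand-in for the real word list
words_list = ['word{}'.format(i) for i in range(100000)]
words_set = set(words_list)

# Worst case for the list: the target is the last element
target = 'word99999'

# Time 100 membership tests against each data structure
list_time = timeit.timeit(lambda: target in words_list, number=100)
set_time = timeit.timeit(lambda: target in words_set, number=100)

print('list: {:.6f}s  set: {:.6f}s'.format(list_time, set_time))
```

A list lookup scans elements one by one, while a set lookup hashes the word and jumps to its bucket, so the gap grows with the size of the collection.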
To submit your notebook, follow the same directions for Notebook00, except store this notebook in the notebook06 folder.