Overview

The goal of this assignment is to give you practice using dictionaries and sets in the Python programming language. You will also practice top-down design: for each activity, you will write smaller functions that higher-level functions call to achieve the goals of the activity.
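
As a small illustration of top-down design, a high-level function can be written in terms of helpers that each handle one step. The function names and the task below are hypothetical and are not part of any activity; this is only a sketch of the idea.

```python
# Hypothetical example: these names are illustrative, not part of the assignment.

def parse_numbers(text):
    ''' Return a list of floats parsed from whitespace-separated text '''
    return [float(token) for token in text.split()]

def average(numbers):
    ''' Return the mean of a non-empty list of numbers '''
    return sum(numbers) / len(numbers)

def average_of_text(text):
    ''' High-level function built from the two helpers above '''
    return average(parse_numbers(text))

print(average_of_text('1 2 3 4'))
```

Each helper can be tested on its own before the high-level function is written, which is the main appeal of the approach.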

To record your solutions and answers, create a new Jupyter Notebook titled Notebook06.ipynb and use this notebook to complete the following activities and answer the corresponding questions.

Make sure you label your activities appropriately. That is, for Activity 1, have a header cell titled Activity 1. Likewise, use Markdown cells to answer the questions (include each question above its answer).

This Notebook assignment is due at midnight on Friday, October 7, 2016 and is to be done individually.

Activity 1: Display Word Frequencies

For the first activity, you are to write two functions:

  1. count_words(text):

    Given a string text, this function returns a dict that maps each word in text to its number of occurrences (ignoring case and punctuation).

  2. display_frequencies(text):

    Given a string text, this function prints each word in text along with its count, in descending order of count.

Code Skeleton

The following is a skeleton of the code you must implement to complete this activity.

# Imports

import string

# Functions

def count_words(text):
    ''' Return a dictionary containing the counts for each word in text
    (ignoring case and punctuation) '''
    # TODO

def display_frequencies(text):
    ''' Display a listing of words and their counts in descending order '''
    # TODO

Once you have completed this activity, you should be able to do the following to test it:

>>> lyrics = '''
'Cause the players gonna play, play, play, play, play
And the haters gonna hate, hate, hate, hate, hate
Baby, I'm just gonna shake, shake, shake, shake, shake
I shake it off, I shake it off
'''

>>> display_frequencies(lyrics)
7 shake
5 play
5 hate
3 gonna
2 off
2 i
2 it
2 the
1 and
1 just
1 players
1 im
1 baby
1 haters
1 cause

Hints

The following are hints and suggestions that will help you complete this activity:

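  1. To remove punctuation, you can use the string's translate method together with string.punctuation. In Python 2:

    s.translate(None, string.punctuation)

    In Python 3, str.translate takes a translation table instead:

    s.translate(str.maketrans('', '', string.punctuation))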
    
  2. To sort the dictionary in descending order, you can use sorted:

    sorted(d, key=d.get, reverse=True)
    
  3. display_frequencies should call count_words
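
The first two hints can be sketched on a small example as follows. This is a minimal illustration, assuming Python 3 (where str.translate takes a table built with str.maketrans); the sample sentence and variable names are made up for the example, and it is not meant as a complete solution to the activity.

```python
import string

# Sample input chosen for illustration only.
sentence = "Hello, hello... world!"

# Hint 1: strip punctuation (Python 3 form), then normalize case.
table = str.maketrans('', '', string.punctuation)
cleaned = sentence.translate(table).lower()

# Tally each word in a plain dictionary.
counts = {}
for word in cleaned.split():
    counts[word] = counts.get(word, 0) + 1

# Hint 2: sorted(d, key=d.get, reverse=True) yields the keys ordered
# from the highest count to the lowest.
ordered = sorted(counts, key=counts.get, reverse=True)

for word in ordered:
    print(counts[word], word)
```

Note that sorted returns only the keys, so the count for each word still has to be looked up in the dictionary when printing.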

Questions

After completing the activity above, answer the following questions:

  1. Briefly describe how your functions work and what challenges you faced implementing them.

  2. Demonstrate your display_frequencies function by writing a loop that checks each of the following strings:

    strings = [
        'Well, buzz buzz buzz goes the bumble bee',
        'You are my sunshine, my only sunshine',
        'Cheer, cheer for old Notre Dame',
        'Wait, they don\'t love you like I love you',
    ]
    

Activity 2: Icon Translator

For the second activity, you are to write three functions:

  1. format_icon(icon):

    Given a string icon, this function returns the HTML icon code for the icon (e.g. <i class="fa fa-{icon}"></i>).

  2. translate_icons(text):

    Given a string text, this function returns a string of HTML in which each word found in the global variable WORD_TO_ICON is replaced by its HTML icon code (see example below).

  3. translate(text):

    Given a string text, this function translates the text and displays the resulting HTML.

Code Skeleton

The following is a skeleton of the code you must implement to complete this activity.

# Imports

from ipywidgets import interact
from IPython.display import HTML, display

# Global Variables

WORD_TO_ICON = {
    '<3'      : 'heart',
    # TODO
}

# Functions

def format_icon(icon):
    ''' Return formatted HTML icon code '''
    # TODO

def translate_icons(text):
    ''' Return translated text that substitutes words in dictionary
    with HTML icon codes '''
    # TODO

def translate(text):
    ''' Display HTML of translated text '''
    # TODO

# Run interactive loop

interact(translate, text='')

Once you have completed this activity, you should be able to do the following to test it:

>>> format_icon('heart')
'<i class="fa fa-heart"></i>'

>>> translate_icons('i <3 $rocket')
'i <i class="fa fa-heart"></i> <i class="fa fa-rocket"></i>'

Your interactive program should look like the following:

Hints

The following are hints and suggestions that will help you complete this activity:

  1. To draw an icon, you can use Font Awesome. For example, to draw a heart, you would emit the following HTML code:

    <i class="fa fa-heart"></i>
    

    which will appear as:

  2. You can build a list of HTML strings and join them.

  3. translate should call translate_icons, which in turn should call format_icon.
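
The second hint (build a list of HTML strings and join them) can be sketched as follows. The names SAMPLE_ICONS and to_html are hypothetical and used only for this illustration; the sketch assumes words are separated by whitespace and is not the only way to structure the functions in this activity.

```python
# Hypothetical mapping used only for this illustration.
SAMPLE_ICONS = {'<3': 'heart', '$rocket': 'rocket'}

def to_html(word):
    ''' Return Font Awesome markup for a known word, else the word itself '''
    if word in SAMPLE_ICONS:
        return '<i class="fa fa-{}"></i>'.format(SAMPLE_ICONS[word])
    return word

# Build a list of HTML fragments, one per word, then join them with spaces.
fragments = [to_html(word) for word in 'i <3 $rocket'.split()]
html = ' '.join(fragments)

print(html)
```

Collecting the fragments in a list and joining once at the end keeps the per-word logic separate from the logic that assembles the final string.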

Questions

After completing the activity above, answer the following questions:

  1. Explain the concept of top-down design and how it was utilized in this activity. What do you think of this problem solving approach? Did it make this activity easier or harder?

  2. Add at least 4 entries to the WORD_TO_ICON dictionary and demonstrate their translations by showing the output of translate_icons on strings that contain each entry.

Activity 3: Spell-checker

For the third activity, you will write an interactive spell-checker by doing the following:

  1. To get a large collection of words, you will be using requests.get to fetch the file here:

    https://raw.githubusercontent.com/dwyl/english-words/master/words.txt

    Once you have requested this huge file, you will need to split its text by newlines and then convert the resulting list into a set. You can then use this WORDS set to check whether the words in text are valid.

  2. Implement the spellcheck(text) function:

    Given a string text, this function checks if each word in text is in the WORDS set. If it is not, then the word will be underlined and colored red; otherwise the word is left unmodified. Once every word in text has been checked, the resulting HTML code is displayed.
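
The split-then-convert step from item 1 can be tried on a small stand-in string before touching the real download. The names sample_text and sample_words below are hypothetical; the sketch uses str.splitlines, which splits on newlines without leaving an empty string at the end.

```python
# Stand-in for the downloaded text; the real file is far larger.
sample_text = 'apple\nbanana\ncherry\n'

# Split on newlines and convert the resulting list to a set.
sample_words = set(sample_text.splitlines())

# Membership tests on the set.
print('banana' in sample_words)
print('bananna' in sample_words)
```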

Code Skeleton

The following is a skeleton of the code you must implement to complete this activity.

# Imports

from ipywidgets import interact
from IPython.display import HTML, display

import requests

# Global Variables

WORDS = requests.get('https://raw.githubusercontent.com/dwyl/english-words/master/words.txt')
# TODO: split WORDS.text by newlines and then convert to set

# Functions

def spellcheck(text):
    ''' Spellcheck text by checking if each word is in the WORDS set

    If the word is not in the set, then format the word to be red and
    underlined.  Otherwise, just show the word.
    '''
    # TODO

# Run interactive loop

interact(spellcheck, text='')

Your interactive program should look like the following:

Hints

The following are hints and suggestions that will help you complete this activity:

  1. The requests.get call above fetches the list of words from a website and returns a response object, not a string. To access the text, which is a huge string of words separated by newlines, you must use WORDS.text.

  2. To underline a word and make it red, you can use the following HTML code:

    <u style="color:red">{word}</u>
    

    which will appear as: {word}

  3. You can build a list of HTML strings and join them.
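
Hints 2 and 3 can be combined in a short sketch. The tiny KNOWN set and the mark helper are hypothetical stand-ins used only for illustration; a real spell-checker would use the full WORDS set and may also need to handle case and punctuation.

```python
# Tiny hypothetical word set standing in for WORDS.
KNOWN = {'the', 'cat', 'sat'}

def mark(word):
    ''' Return the word unchanged if known, else wrapped in a red underline '''
    if word in KNOWN:
        return word
    return '<u style="color:red">{}</u>'.format(word)

# Build one HTML fragment per word, then join the fragments with spaces.
html = ' '.join(mark(word) for word in 'the catt sat'.split())

print(html)
```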

Large Dictionary

The list of words you will be using for your dictionary is huge (300,000+). Do not try to print or display the whole thing in your Notebook, or you are going to have a bad time.

Questions

After completing the activity above, answer the following questions:

  1. Briefly describe how your spell-checker works and what challenges you faced implementing it. What differences do you see between your implementation of a spell-checker and one found in a typical word processing application?

  2. Does using a set instead of a list make a difference? Test this by using the %timeit command on a lookup using a list version of WORDS and a set version of WORDS:

    >>> %timeit 'monkey' in WORDS_SET
    ...
    >>> %timeit 'monkey' in WORDS_LIST
    ...
    

    Which data structure is more efficient? Can you feel the difference when you are using the spell-checker interactively? Explain.
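
Outside of a notebook cell, the same kind of comparison can be made with the standard timeit module. The sketch below uses hypothetical stand-in data (made-up strings rather than the real word list), so the timings it prints only illustrate how to measure and will differ from what you observe with WORDS.

```python
import timeit

# Hypothetical stand-in data: 100,000 made-up "words".
words_list = ['word{}'.format(i) for i in range(100000)]
words_set = set(words_list)

# Look up an item stored near the end of the list.
target = 'word99999'

# Time 100 membership tests against each data structure.
list_time = timeit.timeit(lambda: target in words_list, number=100)
set_time = timeit.timeit(lambda: target in words_set, number=100)

print('list: {:.6f}s  set: {:.6f}s'.format(list_time, set_time))
```

Try moving the target to the front of the list, or using a word that is absent, to see how the position of the item affects each measurement.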

Submission

To submit your notebook, follow the same directions for Notebook 01, except store this notebook in the notebook06 folder.