This course website is from a previous semester. If you are currently in the class, please make sure you are viewing the latest course website instead of this old one.
The goal of this homework assignment is to allow you to practice using Python to create scripts that require sophisticated parsing of data and manipulation of data structures. In this assignment, you will download data in both CSV and JSON format, process it, and then present it to the user in the terminal.
For this assignment, record your scripts and any responses to the following activities in the homework05 folder of your assignments GitHub repository and push your work by noon Saturday, February 26.
Before starting this homework assignment, you should first perform a git pull to retrieve any changes in your remote GitHub repository:
$ cd path/to/repository # Go to assignments repository
$ git checkout master # Make sure we are in master branch
$ git pull --rebase # Get any remote changes not present locally
Next, create a new branch for this assignment:
$ git checkout -b homework05 # Create homework05 branch and check it out
Once this is done, download the Makefile and test scripts:
# Go to homework05 folder
$ cd homework05
# Download the Makefile
$ curl -LO https://raw.githubusercontent.com/nd-cse-20289-sp22/cse-20289-sp22-assignments/master/homework05/Makefile
# Add and commit Makefile
$ git add Makefile
$ git commit -m "homework05: add Makefile"
# Download the test scripts
$ make test-scripts
You are now ready to work on the activities below.
Note, you do not need to add and commit the test scripts since the Makefile will automatically download them again whenever you run make.
Katie likes to sit in the back of the class. It has its perks:
She can beat the rush out the door when class ends.
She can see everyone browsing Facebook, playing video games1, watching YouTube, or doing homework.
She feels safe from being called upon by the instructor... except when he does that strange thing where he goes around the class and tries to talk to people. Totally weird 2.
That said, sitting in the back has its downsides:
She can never see what the instructor is writing because he has terrible handwriting and always writes too small 3.
She is prone to falling asleep because the instructor is really boring and the class is not as interesting as her other computer science courses.
To combat her boredom, Katie typically just browses Reddit. Her favorite subreddits are AdviceAnimals, aww, todayilearned, and of course UnixPorn. Lately, however, Katie has grown paranoid that her web browser is leaking information about her4, and so she wants to be able to get the latest links from Reddit directly in her terminal.
Fortunately for Katie, Reddit provides a JSON feed for every subreddit. You simply need to append .json to the end of the subreddit URL. For instance, the JSON feed for todayilearned can be found here:
https://www.reddit.com/r/todayilearned/.json
To fetch that data, Katie uses the Requests package in Python to access the JSON data:
r = requests.get('https://www.reddit.com/r/todayilearned/.json')
print(r.json())
Reddit tries to prevent bots from accessing its website too often. To work
around any 429: Too Many Requests errors, we can trick Reddit by
specifying our own user agent:
headers = {'user-agent': 'reddit-{}'.format(os.environ.get('USER', 'cse-20289-sp22'))}
response = requests.get(url, headers=headers)
This should allow you to make requests without getting the dreaded 429 error.
The code above would output something like the following:
{"kind": "Listing", "data": {"modhash": "g8n3uwtdj363d5abd2cbdf61ed1aef6e2825c29dae8c9fa113", "children": [{"kind": "t3", "data": ...
Looking through that stream of text, Katie sees that the JSON data is a
collection of structured or hierarchical dictionaries and lists. This
looks a bit complex to her, so she wants you to help her complete the
reddit.py
script which fetches the JSON data for a specified
subreddit or URL and allows the user to sort the articles by various
fields, restrict the number of items displayed, and even shorten the URLs
of each article.
reddit.py
The reddit.py script takes the following arguments:
$ ./reddit.py
Usage: reddit.py [options] URL_OR_SUBREDDIT
-s          Shorten URLs using is.gd (default: False)
-n LIMIT Number of articles to display (default: 10)
-o ORDERBY Field to sort articles by (default: score)
-t TITLELEN Truncate title to specified length (default: 60)
The -s flag shortens URLs using the is.gd web service. This means that long URLs are converted into shorter ones such as: https://is.gd/g3QdWp
The -n flag specifies the number of articles to display. By default, this is 10.
The -o flag specifies the Reddit article attribute to use for sorting the articles. By default this should be score. Note, if the ORDERBY parameter is score, then the articles should be ordered in descending order. Otherwise, the articles should be ranked in ascending order.
The -t flag specifies the maximum length for the articles' titles. Titles longer than this value should be truncated. The default value is 60.
Here are some examples of reddit.py
in action:
# Show Linux subreddit
$ ./reddit.py linux
1. I wanted to learn some assembly, so I made this terminal-bas (Score: 2689)
https://i.redd.it/3o6gcysfgmh81.gif
2. Linux 5.17 will be called "Superb Owl" (Score: 1013)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=754e0b0e35608ed5206d6a67a791563c631cec07
3. Martin Povišer is writing Linux drivers for audio hardware o (Score: 968)
https://github.com/sponsors/povik
4. Discord is a black hole for information (Score: 670)
https://knockout.chat/thread/33251/1
5. AmogOS, the joke distro, adds a crypto miner to their websit (Score: 416)
https://github.com/Amog-OS/website/commit/9f377fc5a1756603d83e0fb661af563049d94558
6. Just a warning about typos (Score: 390)
https://www.reddit.com/r/linux/comments/srfcjz/just_a_warning_about_typos/
7. I managed to convince my friend to switch to Linux. (Score: 330)
https://www.reddit.com/r/linux/comments/sqmzex/i_managed_to_convince_my_friend_to_switch_to_linux/
8. KDE's Kalendar reaches version 1.0.0 (Score: 255)
https://claudiocambra.com/2022/02/12/kalendar-1-0-is-out/
9. This week in KDE: A smooth release of Plasma 5.24 (Score: 180)
https://pointieststick.com/2022/02/11/this-week-in-kde-a-smooth-release-of-plasma-5-24/
10. Script to remove the annoying podcasts from the Spotify flat (Score: 122)
https://www.reddit.com/r/linux/comments/sr3ny8/script_to_remove_the_annoying_podcasts_from_the/
# Show top 5 nba articles
$ ./reddit.py -n 5 nba
1. Tyrese Haliburton in his 2nd game as a Pacer: 22 points, 16 (Score: 7358)
https://www.reddit.com/r/nba/comments/srus5f/tyrese_haliburton_in_his_2nd_game_as_a_pacer_22/
2. In the 2017 offseason Kyrie requested a trade because he wan (Score: 6386)
https://www.reddit.com/r/nba/comments/srvnzy/in_the_2017_offseason_kyrie_requested_a_trade/
3. [The Athletic] NBA 75: At No. 5, Magic Johnson combined dazz (Score: 3461)
https://theathletic.com/3126173/2022/02/14/nba-75-at-no-5-magic-johnson-combined-dazzling-playmaking-with-charisma-to-lead-the-showtime-lakers-to-five-titles/
4. [Bontemps] The Sixers say James Harden is out through the Al (Score: 2730)
https://twitter.com/timbontemps/status/1493276833663664130?s=21
5. Unrealistic NBA fantasies that would set the NBA world on fi (Score: 2386)
https://www.reddit.com/r/nba/comments/ss4jg9/unrealistic_nba_fantasies_that_would_set_the_nba/
# Show top 2 Bloons TD 6 articles sorted by title
$ ./reddit.py -n 2 -o title btd6
1. 99% of players be like (Score: 1197)
https://i.redd.it/zxqizfv4jth81.jpg
2. Acidic Mixture DRIP (Score: 203)
https://i.redd.it/rxhlfzs1kth81.png
# Show top 2 unixporn articles sorted by score (with shorten URLs)
$ ./reddit.py -n 2 -o score -s unixporn
1. [plasma] Dracula, I like the new kde blur a lot (Score: 826)
https://is.gd/U3LKRn
2. [i3-gaps] Current rice. Working on status bar. Also any sugg (Score: 793)
https://is.gd/YfTR7n
Note, since we are pulling data from an active website, the articles may change between runs.
Here is skeleton code you can use to start your reddit.py
script:
# Download reddit.py skeleton
$ curl -LO https://raw.githubusercontent.com/nd-cse-20289-sp22/cse-20289-sp22-assignments/master/homework05/reddit.py
It should look something like this:
import os
import sys
import requests
# Constants
ISGD_URL = 'http://is.gd/create.php'
# Functions
def usage(status=0):
''' Display usage information and exit with specified status '''
print('''Usage: {} [options] URL_OR_SUBREDDIT
-s          Shorten URLs using is.gd (default: False)
-n LIMIT Number of articles to display (default: 10)
-o ORDERBY Field to sort articles by (default: score)
-t TITLELEN Truncate title to specified length (default: 60)
'''.format(os.path.basename(sys.argv[0])))
sys.exit(status)
def load_reddit_data(url):
''' Load reddit data from specified URL into dictionary
>>> len(load_reddit_data('https://reddit.com/r/nba/.json')) >= 25
True
>>> load_reddit_data('linux')[0]['data']['subreddit']
'linux'
'''
return []
def shorten_url(url):
''' Shorten URL using is.gd service
>>> shorten_url('https://reddit.com/r/aoe2')
'https://is.gd/dL5bBZ'
>>> shorten_url('https://cse.nd.edu')
'https://is.gd/3gwUc8'
'''
return ''
def print_reddit_data(data, limit=10, orderby='score', titlelen=60, shorten=False):
''' Dump reddit data based on specified attributes '''
pass
def main():
arguments = sys.argv[1:]
url = None
limit = 10
orderby = 'score'
titlelen = 60
shorten = False
# Main Execution
if __name__ == '__main__':
main()
Read the TODO
comments carefully. DO NOT REMOVE THE doctests!
load_reddit_data
: To fetch the Reddit data, you can use the
requests.get method. You can then access the JSON data via the
json()
method of the object returned by requests.get. You will need to
think carefully about what portion of the JSON you actually want to
utilize.
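One possible shape for this function is sketched below. The subreddit_url helper is our own invention (the skeleton does not name it), and the bare-name expansion is an assumption based on the doctests above:

```python
import os

import requests

def subreddit_url(arg):
    ''' Expand a bare subreddit name (e.g. 'linux') into a JSON feed URL;
        full URLs are passed through unchanged. (Hypothetical helper.) '''
    if arg.startswith('http'):
        return arg
    return f'https://www.reddit.com/r/{arg}/.json'

def load_reddit_data(url):
    ''' Fetch the JSON feed and return the list of article dictionaries. '''
    user    = os.environ.get('USER', 'cse-20289-sp22')
    headers = {'user-agent': 'reddit-{}'.format(user)}
    response = requests.get(subreddit_url(url), headers=headers)
    # The doctests index into [0]['data'], so return just the children list.
    return response.json()['data']['children']
```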
print_reddit_data
: To display the articles, this function can use the
sorted function with either the reverse
or key
parameter to order the
articles. It can then slice the resulting articles to limit the number
of things displayed.
Note, a tab character is used to separate the index and the title and to indent the url.
To sort by a particular field or attribute, you can take advantage of the
key
parameter of the sorted function:
articles = sorted(data, key=lambda i: i[field])
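Putting those pieces together, one hedged sketch of print_reddit_data might look like this (the exact output format must still match the examples above):

```python
def print_reddit_data(data, limit=10, orderby='score', titlelen=60, shorten=False):
    ''' Print articles sorted by the given field (a sketch, not the
        definitive implementation). '''
    # score sorts in descending order; every other field ascends.
    articles = sorted(data, key=lambda c: c['data'][orderby],
                      reverse=(orderby == 'score'))
    for index, child in enumerate(articles[:limit], 1):
        article = child['data']
        title   = article['title'][:titlelen]   # Truncate long titles
        print(f'{index:4}.\t{title} (Score: {article["score"]})')
        print(f'\t{article["url"]}')
```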
shorten_url
: To shorten a URL, you can use the requests.get method
to make a request to the is.gd webservice with the appropriate params
as shown below:
requests.get(ISGD_URL, params={'format': 'json', 'url': url})
You will need to parse the result of this request to extract the shortened URL.
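Assuming the response contains a shorturl field (which is what the format=json parameter asks the service for), a sketch might be:

```python
import requests

ISGD_URL = 'http://is.gd/create.php'

def shorten_url(url):
    ''' Shorten URL using the is.gd web service (sketch). '''
    response = requests.get(ISGD_URL, params={'format': 'json', 'url': url})
    # With format=json, the service returns a JSON object whose
    # 'shorturl' field holds the shortened URL.
    return response.json()['shorturl']
```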
test_reddit.sh
To aid you in testing the reddit.py
script, we have provided you with
test_reddit.sh
, which you can use as follows:
$ ./test_reddit.sh
Testing reddit.py ...
Doctests ... 1.00 / 1.00
Bad arguments ... Success
-h ... Success
No arguments ... Success
linuxactionshow ... Success
linuxactionshow (-n 1) ... Success
linuxactionshow (-o url) ... Success
linuxactionshow (-t 10) ... Success
linuxactionshow (-s) ... Success
pop-os ... Success
pop-os (-n 5) ... Success
pop-os (-n 5 -o title) ... Success
pop-os (-n 5 -o url -t 20) ... Success
pop-os (-n 2 -o score -t 40 -s) ... Success
Score 5.00 / 5.00
Status Success
Note, we chose an inactive subreddit, r/linuxactionshow for our test script since it should not change. However, it is possible that it may change5 and thus break our test script. Just let the staff know and adjustments can be made.
One of the most pressing questions in Computer Science education is
addressing the lack of diversity in computing. In order to meaningfully
examine this issue, we will first need to look at some raw data. For this
activity, you are to write a script, demographics.py
, that downloads, processes,
and visualizes the Computer Science and Engineering demographic data from
the classes of 2013 through 2024.
For context and background material, you may wish to first read some of the following articles regarding gender and ethnic diversity in Computer Science and the technology industry in general:
This assignment is inspired by a discussion we have in the CSE 40175 Ethical and Professional Issues class, which is a required course for all Computer Science and Engineering students.
Our beloved Ramzi has graciously provided us with the student demographic data for all Computer Science and Engineering students from the class of 2013 through 2024. This data looks something like this:
$ curl -sL https://yld.me/raw/fIRK | head -n 10
Year,Gender,Ethnic
2013,M,C
2014,M,C
2015,F,N
2016,F,N
2017,M,C
2018,M,C
2013,M,C
2014,M,S
2015,M,C
As can be seen above, the demographics CSV data has three columns:
The first column represents the graduating class year that student belongs to.
The second column records the gender of the student and contains either M or F, which stands for Male or Female respectively.
The third column records the ethnicity of the student and consists of the following mapping:
Letter | Ethnicity |
---|---|
C | Caucasian |
O | Asian |
S | Hispanic |
B | Black or African American |
N | Native American / Pacific Islanders |
T | Multiple Selection |
U | Undeclared |
demographics.py
The demographics.py script takes the following arguments:
# Display usage message
$ ./demographics.py -h
Usage: demographics.py [options] [URL]
-y YEARS Which years to display (default: all)
-p Display data as percentages.
-G Do not include gender information.
-E Do not include ethnic information.
The -y flag specifies which years to display (the default is to display all the years in the CSV data).
The -p flag forces the script to output data as percentages rather than raw counts.
The -G and -E flags suppress outputting the gender and ethnic data respectively.
The user may specify a URL to the CSV file. If one is not specified, then the script will use https://yld.me/raw/fIRK as the URL.
Note, the years are always displayed in ascending order.
Here are some examples of demographics.py
in action:
# No arguments
$ ./demographics.py
        2013    2014    2015    2016    2017    2018    2019    2020    2021    2022    2023    2024
========================================================================================================
   M      49      44      58      60      65     101      96      92      89     124     119     115
   F      14      12      16      19      26      45      54      43      46      41      47      58
--------------------------------------------------------------------------------------------------------
   B       3       2       4       1       5       3       3       4       6       4       1       4
   C      43      43      47      53      60     107      96      92      87     106      99     108
   N       1       1       1       7       5       5      13      14      13      14      17       8
   O       7       5       9       9      12      10      13       7       8      14      13      19
   S       7       4      10       9       3      13      10      10      11      17      26      20
   T       2       1       1       0       6       8      15       7       9       8       8      10
   U       0       0       2       0       0       0       0       1       1       1       2       4
--------------------------------------------------------------------------------------------------------
# Show percentages rather than raw counts
$ ./demographics.py -p
        2013    2014    2015    2016    2017    2018    2019    2020    2021    2022    2023    2024
========================================================================================================
   M   77.8%   78.6%   78.4%   75.9%   71.4%   69.2%   64.0%   68.1%   65.9%   75.2%   71.7%   66.5%
   F   22.2%   21.4%   21.6%   24.1%   28.6%   30.8%   36.0%   31.9%   34.1%   24.8%   28.3%   33.5%
--------------------------------------------------------------------------------------------------------
   B    4.8%    3.6%    5.4%    1.3%    5.5%    2.1%    2.0%    3.0%    4.4%    2.4%    0.6%    2.3%
   C   68.3%   76.8%   63.5%   67.1%   65.9%   73.3%   64.0%   68.1%   64.4%   64.2%   59.6%   62.4%
   N    1.6%    1.8%    1.4%    8.9%    5.5%    3.4%    8.7%   10.4%    9.6%    8.5%   10.2%    4.6%
   O   11.1%    8.9%   12.2%   11.4%   13.2%    6.8%    8.7%    5.2%    5.9%    8.5%    7.8%   11.0%
   S   11.1%    7.1%   13.5%   11.4%    3.3%    8.9%    6.7%    7.4%    8.1%   10.3%   15.7%   11.6%
   T    3.2%    1.8%    1.4%    0.0%    6.6%    5.5%   10.0%    5.2%    6.7%    4.8%    4.8%    5.8%
   U    0.0%    0.0%    2.7%    0.0%    0.0%    0.0%    0.0%    0.7%    0.7%    0.6%    1.2%    2.3%
--------------------------------------------------------------------------------------------------------
# Show only gender percentages for the years 2022, 2023, and 2024 (with explicit URL)
$ ./demographics.py -y 2022,2023,2024 -E -p https://yld.me/raw/fIRK
        2022    2023    2024
================================
   M   75.2%   71.7%   66.5%
   F   24.8%   28.3%   33.5%
--------------------------------
Note, to pass the provided tests, you must match the spacing and formatting exactly. Only use spaces to align data (no \t) and make sure you have **no trailing whitespace**.
Formatting hints are provided in the docstrings provided in the skeleton code.
Here is skeleton code you can use to start your demographics.py
script:
# Download demographics.py skeleton
$ curl -LO https://raw.githubusercontent.com/nd-cse-20289-sp22/cse-20289-sp22-assignments/master/homework05/demographics.py
It should look something like this:
import collections
import os
import sys
import requests
# Constants
URL = 'https://yld.me/raw/fIRK'
TAB = ' '*8
GENDERS = ('M', 'F')
ETHNICS = ('B', 'C', 'N', 'O', 'S', 'T', 'U')
# Functions
def usage(status=0):
''' Display usage information and exit with specified status '''
progname = os.path.basename(sys.argv[0])
print(f'''Usage: {progname} [options] [URL]
-y YEARS Which years to display (default: all)
-p Display data as percentages.
-G Do not include gender information.
-E Do not include ethnic information.
''')
sys.exit(status)
def load_demo_data(url=URL):
''' Load demographics from specified URL into dictionary
>>> load_demo_data('https://yld.me/raw/ilG').keys()
dict_keys(['2013', '2014', '2015', '2016', '2017', '2018', '2019'])
>>> load_demo_data('https://yld.me/raw/ilG')['2013'] == \
{'M': 1, 'B': 2, 'F': 1, 'TOTAL': 2}
True
>>> load_demo_data('https://yld.me/raw/ilG')['2019'] == \
{'M': 1, 'U': 2, 'F': 1, 'TOTAL': 2}
True
'''
return {}
def print_demo_separator(years, char='='):
''' Print demographics separator
Note: The row consists of the 8 chars for each item in years + 1.
>>> print_demo_separator(['2012', '2013'])
========================
'''
print()
def print_demo_years(years):
''' Print demographics years row
Note: The row is prefixed by 4 spaces and each year is right aligned to 8
spaces ({:>8}).
>>> print_demo_years(['2012', '2013'])
        2012    2013
'''
print()
def print_demo_fields(data, years, fields, percent=False):
''' Print demographics information (for particular fields)
Note: The first column should be a 4-spaced field name ({:>4}), followed by
8-spaced right aligned data columns ({:>8}). If `percent` is True, then
display a percentage ({:>7.1f}%) rather than the raw count.
>>> data = load_demo_data('https://yld.me/raw/ilG')
>>> years = sorted(data.keys())
>>> print_demo_fields(data, years, GENDERS, False)
   M       1       1       1       1       1       1       1
   F       1       1       1       1       1       1       1
'''
print()
def print_demo_data(data, years=None, percent=False, gender=True, ethnic=True):
''' Print demographics data for the specified years and attributes '''
pass
def print_demo_gender(data, years, percent=False):
''' Print demographics gender information '''
print_demo_fields(data, years, GENDERS, percent)
print_demo_separator(years, '-')
def print_demo_ethnic(data, years, percent=False):
''' Print demographics ethnic information '''
print_demo_fields(data, years, ETHNICS, percent)
print_demo_separator(years, '-')
def main():
''' Parse command line arguments, load data from url, and then print
demographic data. '''
arguments = sys.argv[1:]
url = URL
years = None
gender = True
ethnic = True
percent = False
# Main Execution
if __name__ == '__main__':
main()
Read the TODO
comments carefully. DO NOT REMOVE THE doctests!
load_demo_data
: To fetch the demographic data, you can use the
requests.get method. You can then access the raw data via the text
property of the object returned by requests.get. You will then need to
organize the CSV data into a dictionary of the following format:
{"year": {"gender": count, "ethnic": count, "TOTAL": count}}
print_demo_separator
: To display a separator line, this function should print 8 copies of char for each year, plus one extra group of 8 for the field-name column. Consider using string multiplication:
print(' '*8) # Prints 8 spaces
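For instance, a sketch matching the doctest above (24 characters for two years):

```python
def print_demo_separator(years, char='='):
    ''' Print a separator row: 8 copies of char per year, plus one
        extra group of 8 for the field-name column (sketch). '''
    print(char * 8 * (len(years) + 1))
```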
print_demo_years
: To display the years, this function should start with a string of 4 spaces and then concatenate each year formatted so it is right-aligned to 8 spaces. You can accomplish alignment by doing the following:
# Print the value of variable aligned to the right and taking up at least 8 spaces
print(f'{variable:>8}')
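A sketch of the whole years row, consistent with the docstring hint:

```python
def print_demo_years(years):
    ''' Print the years row: 4 leading spaces, then each year
        right-aligned to 8 spaces (sketch). '''
    row = ' ' * 4
    for year in years:
        row += f'{year:>8}'
    print(row)
```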
print_demo_fields
: To display all the rows for a given list of fields, this function should build a row of text column by column. Special care must be taken to display percentages versus raw counts.
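One hedged sketch, using the format specifiers from the docstring ({:>4}, {:>8}, and {:>7.1f}%):

```python
def print_demo_fields(data, years, fields, percent=False):
    ''' Print one row per field, building each row column by column
        (a sketch, not the definitive implementation). '''
    for field in fields:
        row = f'{field:>4}'
        for year in years:
            count = data[year].get(field, 0)
            if percent:
                # Percentage of that year's TOTAL rather than a raw count
                row += f'{100 * count / data[year]["TOTAL"]:>7.1f}%'
            else:
                row += f'{count:>8}'
        print(row)
```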
print_demo_data
: To display the table of data, this function should call
the print_demo_years
and print_demo_separator
functions, followed by
the print_demo_gender
and print_demo_ethnic
functions if necessary.
print_demo_gender
: This function simply calls the print_demo_fields
and
print_demo_separator
functions and is provided to you.
print_demo_ethnic
: This function simply calls the print_demo_fields
and
print_demo_separator
functions and is provided to you.
Do not try to implement everything at once. Instead, approach this activity with the iterative and incremental development mindset and slowly build pieces of your application one feature at a time:
Parse the program arguments.
Fetch the data from the specified URL and organize into a dictionary.
Write the table header.
Write one table row based on a field.
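For the first step, here is a hedged sketch of manual flag parsing in the style the skeleton uses (parse_args is a hypothetical helper; the skeleton does this work directly inside main, and this sketch omits -h handling):

```python
def parse_args(arguments):
    ''' Parse demographics.py style flags by consuming sys.argv-like
        arguments left to right (hypothetical helper). '''
    url, years = 'https://yld.me/raw/fIRK', None
    percent, gender, ethnic = False, True, True
    while arguments and arguments[0].startswith('-'):
        argument = arguments.pop(0)
        if argument == '-y':
            years = arguments.pop(0).split(',')
        elif argument == '-p':
            percent = True
        elif argument == '-G':
            gender = False
        elif argument == '-E':
            ethnic = False
        else:
            raise SystemExit(1)     # Unknown flag: bail out
    if arguments:                   # Optional trailing URL
        url = arguments.pop(0)
    return url, years, percent, gender, ethnic
```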
Remember that the goal at the end of each iteration is that you have a working program that successfully implements all of the features up to that point.
Focus on one thing at a time and feel free to write small test code to try things out.
test_demographics.sh
To aid you in testing the demographics.py
script, we have provided you with
test_demographics.sh
, which you can use as follows:
$ ./test_demographics.sh
Testing demographics.py ...
Doctests ... 1.00 / 1.00
Bad arguments ... Success
-h ... Success
No arguments ... Success
fIRK ... Success
fIRK -y 2013 ... Success
fIRK -y 2024 ... Success
fIRK -y 2017,2021,2019,2023 ... Success
fIRK -p ... Success
fIRK -G ... Success
fIRK -E ... Success
ilG ... Success
ilG -y 2016 -p ... Success
ilG -y 2016 -p -E ... Success
ilG -y 2016 -p -E -G ... Success
Score 6.00 / 6.00
Status Success
Once you have completed all the activities above, you are to complete the following reflection quiz:
As with [Reading 01], you will need to store your answers in a
homework05/answers.json
file. You can use the form above to generate the
contents of this file, or you can write the JSON by hand.
To test your quiz, you can use the check.py
script:
$ ../.scripts/check.py
Checking homework05 quiz ...
Q01 0.40
Q02 0.60
Score 1.00 / 1.00
Status Success
For extra credit, you are to visualize the CSE demographics data above as graphical plots using Python and matplotlib inside a Jupyter notebook. You should produce graphs like these:
To run Jupyter on the student machines, you can use the following command:
$ jupyter-notebook-3.9 --ip studentXX.cse.nd.edu --port 9000 --no-browser
Replace studentXX.cse.nd.edu with the name of the student machine you are on. Also, you may need to change the port to something else between 9000 - 9999.
Alternatively, you can use Google Colaboratory to run a Jupyter notebook from your Google Drive. This is a hassle-free way to explore developing with Python and matplotlib, while using Google Cloud resources (thus no need for the VPN or the student machines).
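As a starting point for the notebook, here is a minimal stacked bar chart sketch (this assumes matplotlib is installed; the counts are copied from the table above rather than loaded from the CSV, and the output filename is our own choice):

```python
import matplotlib
matplotlib.use('Agg')               # Render off-screen (works without a display)
import matplotlib.pyplot as plt

years  = ['2022', '2023', '2024']
male   = [124, 119, 115]            # Counts from the demographics table above
female = [41, 47, 58]

plt.figure()
plt.bar(years, male, label='M')
plt.bar(years, female, bottom=male, label='F')
plt.xlabel('Class Year')
plt.ylabel('Students')
plt.title('CSE Students by Gender')
plt.legend()
plt.savefig('demographics.png')     # In a notebook, use plt.show() instead
```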
To get credit for this Guru Point, show your Jupyter notebook with the graphical plots to the instructor or a TA to verify (or attach a video / screenshot to your Pull Request). You have up until a week after this assignment is due to verify your Guru Point.
Remember that you can always forgo this Guru Point for two extra days to do the homework. That is, if you need an extension, you can simply skip the Guru Point and you will automatically have until Monday to complete the assignment for full credit.
Just leave a note on your Pull Request stating your intentions.
To submit your assignment, please commit your work to the homework05
folder
of your homework05
branch in your assignments GitHub repository.
Your homework05 folder should only contain the following files:
Makefile
answers.json
demographics.py
reddit.py
Note: You do not need to commit the test scripts because the Makefile
automatically downloads them.
#--------------------------------------------------
# BE SURE TO DO THE PREPARATION STEPS IN ACTIVITY 0
#--------------------------------------------------
$ cd homework05 # Go to Homework 05 directory
...
$ $EDITOR reddit.py # Edit script
$ git add reddit.py # Mark changes for commit
$ git commit -m "homework05: activity 1" # Record changes
...
$ $EDITOR demographics.py # Edit script
$ git add demographics.py # Mark changes for commit
$ git commit -m "homework05: activity 2" # Record changes
...
$ $EDITOR answers.json # Edit quiz
$ git add answers.json # Mark changes for commit
$ git commit -m "homework05: activity 3" # Record changes
...
$ git push -u origin homework05 # Push branch to GitHub
Remember to create a Pull Request and assign the appropriate TA from the Reading 06 TA List.
DO NOT MERGE your own Pull Request. The TAs use open Pull Requests to keep track of which assignments to grade. Closing them yourself will cause a delay in grading and confuse the TAs.