The goal of this homework assignment is to allow you to practice using Python to interact with the web using the Requests package. In this assignment, you will write scripts that pull data from the Internet and manipulate it in some way.
For this assignment, record your scripts and any responses to the following activities in the homework04 folder of your assignments GitLab repository and push your work by 11:59 PM Friday, February 24, 2017.
Trevor1 loves having fun at the faculty's expense. When he is not using the write command to spam the instructor's terminal during class, he is making meme pics of the teacher and posting them on Facebook2:
Having learned about ImageMagick, Trevor decides to take his trolling game up a notch by creating a script to generate some amusing and possibly terrifying GIF animations3 such as the ones below:
To create these animations, Trevor uses the ImageMagick composite
tool to
generate blended image frames:
# Given input images source1 and source2, generate a blended output:
$ composite -blend $stepsize $source1 $source2 $output
This command takes the source1 and source2 images and blends them based on the percentage specified in stepsize, as shown in the following formula:
Output = Source1*(StepSize)/100 + Source2*(100 - StepSize)/100
For instance, if stepsize is 20, then the composite tool will take 20% of the pixel value from source1 and 80% from source2 to produce the blended image:
# Blend: 20% Ramzi, 80% Tijana
$ composite -blend 20 ramzi.jpg tijana.jpg 020-ramzi_tijana.gif
Conversely, if stepsize is 80, then the composite tool will take 80% of the pixel value from source1 and 20% from source2 to produce the blended image:
# Blend: 80% Ramzi, 20% Tijana
$ composite -blend 80 ramzi.jpg tijana.jpg 080-ramzi_tijana.gif
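The formula can be checked by hand on a single pixel value. The sketch below applies it to plain numbers; blend_value is a hypothetical helper used only for illustration and is not part of ImageMagick or the starter code.

```python
def blend_value(source1, source2, stepsize):
    """Blend two pixel values using the formula stated above.

    stepsize is the percentage (0-100) taken from source1; the remainder
    comes from source2.
    """
    return source1 * stepsize / 100.0 + source2 * (100 - stepsize) / 100.0


# A pixel whose value is 200 in source1 and 100 in source2:
print(blend_value(200, 100, 20))   # 20% of 200 plus 80% of 100
print(blend_value(200, 100, 80))   # 80% of 200 plus 20% of 100
```

A stepsize of 0 reproduces source2 exactly and a stepsize of 100 reproduces source1, which is why a sequence of frames from 000 to 100 morphs from one portrait into the other.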
To create the animation, Trevor generates a series of these blended composite images at regular intervals from 000 to 100:
# List all the generated blended composite images
$ ls
000-ramzi_tijana.gif
010-ramzi_tijana.gif
020-ramzi_tijana.gif
030-ramzi_tijana.gif
040-ramzi_tijana.gif
050-ramzi_tijana.gif
060-ramzi_tijana.gif
070-ramzi_tijana.gif
080-ramzi_tijana.gif
090-ramzi_tijana.gif
100-ramzi_tijana.gif
Once he has all the individual composite images, he can stitch them together to create a GIF animation by using the convert command:
# Stitch blended composite images into an animation
$ convert -loop 0 -delay 5 \
    000-ramzi_tijana.gif \
    010-ramzi_tijana.gif \
    020-ramzi_tijana.gif \
    030-ramzi_tijana.gif \
    040-ramzi_tijana.gif \
    050-ramzi_tijana.gif \
    060-ramzi_tijana.gif \
    070-ramzi_tijana.gif \
    080-ramzi_tijana.gif \
    090-ramzi_tijana.gif \
    100-ramzi_tijana.gif \
    ramzi_tijana.gif
The -loop 0 option makes the GIF loop forever, while -delay 5 waits 5 hundredths of a second before transitioning to the next image frame.
As can be seen, the convert tool is given a list of all the composite images followed by the final target file (i.e. ramzi_tijana.gif).
To have the animation blend forward and backwards, Trevor simply appends the list of composite images but in reverse:
# Stitch blended composite images into an animation that runs forwards and backwards
$ convert -loop 0 -delay 5 \
    000-ramzi_tijana.gif \
    010-ramzi_tijana.gif \
    020-ramzi_tijana.gif \
    030-ramzi_tijana.gif \
    040-ramzi_tijana.gif \
    050-ramzi_tijana.gif \
    060-ramzi_tijana.gif \
    070-ramzi_tijana.gif \
    080-ramzi_tijana.gif \
    090-ramzi_tijana.gif \
    100-ramzi_tijana.gif \
    100-ramzi_tijana.gif \
    090-ramzi_tijana.gif \
    080-ramzi_tijana.gif \
    070-ramzi_tijana.gif \
    060-ramzi_tijana.gif \
    050-ramzi_tijana.gif \
    040-ramzi_tijana.gif \
    030-ramzi_tijana.gif \
    020-ramzi_tijana.gif \
    010-ramzi_tijana.gif \
    000-ramzi_tijana.gif \
    ramzi_tijana.gif
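Typing these file lists by hand does not scale, so a script would normally build them. The following is a minimal sketch of one way to do that; frame_names and convert_command are hypothetical helper names, and the sketch assumes the frames follow the NNN-name pattern shown in the listings above.

```python
def frame_names(name, stepsize):
    """Return frame file names from 000 to 100 in increments of stepsize.

    stepsize must be at least 1, since range() rejects a step of 0.
    """
    return ['{:03d}-{}'.format(percent, name)
            for percent in range(0, 101, stepsize)]


def convert_command(frames, target, delay=5, reverse=False):
    """Build the convert command line for the given frames.

    With reverse=True the frames are appended a second time in reverse
    order so the animation plays forward and then backward.
    """
    if reverse:
        frames = frames + frames[::-1]
    return 'convert -loop 0 -delay {} {} {}'.format(
        delay, ' '.join(frames), target)


frames = frame_names('ramzi_tijana.gif', 10)
print(convert_command(frames, 'ramzi_tijana.gif', reverse=True))
```

Building the command as a single string matches the way the commands are written in this handout; the string could then be handed to whatever function the script uses to run shell commands.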
The default version of ImageMagick on the student machines is pretty old. To use a more recent version of ImageMagick, add the following directory to your PATH environment variable:
~ccl/software/external/imagemagick/bin
In csh
, you would do:
$ setenv PATH ~ccl/software/external/imagemagick/bin:$PATH
In bash
, you would do:
$ export PATH=~ccl/software/external/imagemagick/bin:$PATH
In Python, you would do:
import os
# Python does not expand ~ on its own, so expand it before updating PATH
os.environ['PATH'] = os.path.expanduser('~ccl/software/external/imagemagick/bin') + ':' + os.environ['PATH']
Once the PATH is updated, you should be able to run composite -version and see ImageMagick 6.6.4-2.
To get the original source images, Trevor has to download them from the faculty profiles found on the Computer Science and Engineering Directory. Given a netid, you can access the person's profile by going to:
https://engineering.nd.edu/profiles/$NETID
For instance, Ramzi's profile is located at:
https://engineering.nd.edu/profiles/rbualuan
Each person's profile contains information about the person such as name, phone number, office location, etc. and includes a portrait image. The location of each portrait looks something like this:
https://engineering.nd.edu/profiles/rbualuan/@@images/42093b02-060d-4436-91b6-2cf068d4f8b8.jpeg
Going to each faculty member's profile and manually extracting this portrait image is a bit tedious, so Trevor decides to write his script in Python and use the Requests package to help him extract these image portraits in an automated fashion. Once the images are downloaded, the script can call the ImageMagick commands previously described to generate the delightful GIF animations.
Unfortunately, although Trevor is a mastermind at soliciting lulz, he is
not as effective at executing his brilliant pranks. He needs your help in
completing the Python script: blend.py
.
blend.py
The blend.py script has the following usage message:
$ ./blend.py -h
Usage: blend.py [ -r -d DELAY -s STEPSIZE ] netid1 netid2 target

    -r          Blend forward and backward
    -d DELAY    GIF delay between frames (default: 20)
    -s STEPSIZE Blending percentage increment (default: 5)
As can be seen, the script takes three arguments: netid1
and netid2
correspond to the netids of the two people in the Computer Science and
Engineering Directory while target
is the name of the GIF animation file
to create.
In addition to these arguments, the script has three possible flags:
The -r flag means that the animation should run both forward (blend from netid1 to netid2) and then backward (blend from netid2 to netid1).
The -d flag allows the user to specify the animation DELAY in hundredths of a second (i.e. how long to wait before shifting to the next image in the animation).
The -s flag allows the user to specify the STEPSIZE, which determines how many frames there are in the animation (this number must be between 0 and 100).
To help you get started, Trevor has provided you with the following starter code:
# Download and display starter code
$ curl -sL https://www3.nd.edu/~pbui/teaching/cse.20289.sp17/static/py/blend.py
The starter code contains the following:
#!/usr/bin/env python2.7

import atexit
import os
import re
import shutil
import sys
import tempfile

import requests

# Global variables

REVERSE     = False
DELAY       = 20
STEPSIZE    = 5

# Functions

def usage(status=0):
    print '''Usage: {} [ -r -d DELAY -s STEPSIZE ] netid1 netid2 target

    -r          Blend forward and backward
    -d DELAY    GIF delay between frames (default: {})
    -s STEPSIZE Blending percentage increment (default: {})'''.format(
        os.path.basename(sys.argv[0]), DELAY, STEPSIZE
    )
    sys.exit(status)

# Parse command line options

args = sys.argv[1:]

while len(args) and args[0].startswith('-') and len(args[0]) > 1:
    arg = args.pop(0)
    # TODO: Parse command line arguments

if len(args) != 3:
    usage(1)

netid1 = args[0]
netid2 = args[1]
target = args[2]

# Main execution

# TODO: Create workspace

# TODO: Register cleanup

# TODO: Extract portrait URLs

# TODO: Download portraits

# TODO: Generate blended composite images

# TODO: Generate final animation
You are to complete the sections marked TODO
in order to complete the blend.py
script.
To parse the command line arguments, you will need to check arg
against
any possible flags. Any parameters to the flag can be accessed by
popping from the front of the args
list:
parameter = args.pop(0) # Remove the first item in the list
This is analogous to using shift
in a shell script to remove the
first item in the command line arguments.
Remember that arguments are strings by default. If you need a parameter to be another type, then you will have to explicitly cast it:
number = int(args.pop(0)) # Remove the first item in the list and convert to int
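Putting the two ideas together, the flag loop from the starter code might be filled in along the following lines. This is only a sketch: parse_flags is a hypothetical wrapper that makes the loop easy to try on a list, and raising ValueError for an unknown flag is a placeholder for whatever the real script does (for example, calling usage).

```python
def parse_flags(args, reverse=False, delay=20, stepsize=5):
    """Consume leading flags from args.

    Returns (reverse, delay, stepsize, remaining_args). The defaults mirror
    the REVERSE, DELAY, and STEPSIZE globals in the starter code.
    """
    args = list(args)  # work on a copy so the caller's list is untouched
    while len(args) and args[0].startswith('-') and len(args[0]) > 1:
        arg = args.pop(0)
        if arg == '-r':
            reverse = True
        elif arg == '-d':
            delay = int(args.pop(0))      # the parameter follows the flag
        elif arg == '-s':
            stepsize = int(args.pop(0))
        else:
            # Placeholder: the real script would likely print the usage message
            raise ValueError('Unknown flag: {}'.format(arg))
    return reverse, delay, stepsize, args


print(parse_flags(['-r', '-s', '10', 'rbualuan', 'tmilenkovic', 'ramzi_tijana.gif']))
```

Whatever is left in the list after the loop is the positional arguments, which the starter code then checks for a length of three.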
To create a workspace, you should use the tempfile.mkdtemp function which will create a temporary directory for you and return its location.
To register a cleanup function, you should use the atexit.register function to assign a function to run when the program exits. This cleanup function should remove the temporary directory you created by using the shutil.rmtree function.
This is somewhat analogous to creating a trap
in a shell script.
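The workspace and cleanup steps fit in a few lines. The sketch below is one possible arrangement; the cleanup function name and the 'blend' prefix are choices made here (the prefix simply mimics the /tmp/blend... paths shown in the sample output), not requirements.

```python
import atexit
import os
import shutil
import tempfile


def cleanup(path):
    """Remove the workspace directory and everything inside it."""
    print('Cleaning up workspace: {}'.format(path))
    # ignore_errors avoids a second failure if the directory is already gone
    shutil.rmtree(path, ignore_errors=True)


workspace = tempfile.mkdtemp(prefix='blend')
print('Using workspace: {}'.format(workspace))

# Extra arguments to atexit.register are passed to the function at exit,
# so this arranges for cleanup(workspace) to run when the program ends.
atexit.register(cleanup, workspace)
```

Because the handler is registered right after the directory is created, it also runs when the script later leaves through sys.exit(1).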
To extract portrait URLs, you will need to use the requests.get function to fetch the contents of a profile from the Computer Science and Engineering Directory and then the re.findall function to search and extract the URLs from the retrieved contents.
If retrieving the contents fails or the search for a portrait URL fails, then the program should exit with an error code: sys.exit(1).
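The search half of this step can be tried on a string without touching the network. In the sketch below, find_portrait_url and PORTRAIT_RX are hypothetical names, and the pattern is an assumption modelled on the single example portrait URL shown earlier; the real profile markup may need a different expression.

```python
import re

# Assumed pattern, based only on the example portrait URL in this handout.
# The {} placeholder is filled with the netid being searched for.
PORTRAIT_RX = r'https://engineering\.nd\.edu/profiles/{}/@@images/[^"\'\s]+'


def find_portrait_url(netid, html):
    """Return the first portrait URL for netid found in html, or None."""
    urls = re.findall(PORTRAIT_RX.format(re.escape(netid)), html)
    return urls[0] if urls else None


# Made-up page fragment standing in for the text of a fetched profile
sample = ('<img src="https://engineering.nd.edu/profiles/rbualuan/@@images/'
          '42093b02-060d-4436-91b6-2cf068d4f8b8.jpeg" alt="portrait">')

print(find_portrait_url('rbualuan', sample))
print(find_portrait_url('batman', 'Not Found'))
```

In the script itself, the html argument would be the text of the response returned by requests.get for the profile URL, and a None result is the case where the program should call sys.exit(1).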
To download portraits, you will need to use the requests.get method again.
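A download helper mostly needs to check the response status and write the body in binary mode. The sketch below assumes a hypothetical extra parameter, http, which stands for the requests module (or any object with a compatible get method); passing it in is a choice made here so the function can be exercised without a network connection, and it differs from the two-argument form suggested later in this handout.

```python
def download_file(http, url, path):
    """Fetch url and store the response body in the file at path.

    http is assumed to be the requests module or a look-alike whose get()
    returns an object with status_code and content attributes.
    Returns True on success and False otherwise.
    """
    response = http.get(url)
    if response.status_code != 200:
        return False

    # Portraits are images, so write the raw bytes rather than text
    with open(path, 'wb') as stream:
        stream.write(response.content)
    return True
```

In the real script the call might look like download_file(requests, url, os.path.join(workspace, os.path.basename(url))), with a False result treated as a reason to exit early.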
To generate blended composite images, you will need to use the os.system
function to execute the ImageMagick composite
tool.
To generate the final animation, you will need to use the os.system
function to execute the ImageMagick convert
tool.
Since many of the operations above happen multiple times, you should consider
organizing the common code into functions:
search_portrait(netid): Given a netid, this function returns the corresponding image portrait URL.
download_file(url, path): Given a url and path, this function downloads the data specified by url and stores it in the file specified by path.
run_command(command): Given a command, this function executes the command and checks its return status.
Organizing your code into smaller functions will not only make your program shorter and more concise, but also make it easier to debug and maintain.
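As an illustration of the last helper, here is a minimal sketch of run_command built on os.system, together with a hypothetical composite_command function that formats the blend command shown earlier. Returning True or False is a choice made here; the real function could just as well exit on failure.

```python
import os


def run_command(command):
    """Run command through the shell and report whether it succeeded.

    os.system returns 0 when the command exits successfully and a nonzero
    value otherwise.
    """
    return os.system(command) == 0


def composite_command(stepsize, source1, source2, output):
    """Format the ImageMagick composite command for one blended frame."""
    return 'composite -blend {} {} {} {}'.format(
        stepsize, source1, source2, output)


print(composite_command(20, 'ramzi.jpg', 'tijana.jpg', '020-ramzi_tijana.gif'))
print(run_command('exit 0'))   # a shell command that succeeds
print(run_command('exit 3'))   # a shell command that fails
```

A caller can then write something like `if not run_command(command): sys.exit(1)` after each composite or convert step.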
To verify the correctness of your blend.py
script, you should try to
reproduce the images above:
# Animate Ramzi and Tijana
$ ./blend.py -r -s 10 rbualuan tmilenkovic ramzi_tijana.gif
Using workspace: /tmp/blendiGT7lt
Searching portrait for rbualuan... https://engineering.nd.edu/profiles/rbualuan/@@images/42093b02-060d-4436-91b6-2cf068d4f8b8.jpeg
Searching portrait for tmilenkovic... https://engineering.nd.edu/profiles/tmilenkovic/@@images/2e3cada8-ee15-4dba-88c8-21c89b05466b.jpeg
Downloading https://engineering.nd.edu/profiles/rbualuan/@@images/42093b02-060d-4436-91b6-2cf068d4f8b8.jpeg to /tmp/blendiGT7lt/42093b02-060d-4436-91b6-2cf068d4f8b8.jpeg... Success!
Downloading https://engineering.nd.edu/profiles/tmilenkovic/@@images/2e3cada8-ee15-4dba-88c8-21c89b05466b.jpeg to /tmp/blendiGT7lt/2e3cada8-ee15-4dba-88c8-21c89b05466b.jpeg... Success!
Generating /tmp/blendiGT7lt/000-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/010-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/020-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/030-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/040-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/050-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/060-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/070-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/080-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/090-ramzi_tijana.gif ... Success!
Generating /tmp/blendiGT7lt/100-ramzi_tijana.gif ... Success!
Generating ramzi_tijana.gif ... Success!
Cleaning up workspace: /tmp/blendiGT7lt

# Animate Peter and David
$ ./blend.py -r pbui dchiang peter_david.gif
...

# Animate Shreya and Scott
$ ./blend.py -r skumar semrich shreya_scott.gif
...

# Animate Bowyer and Flynn
$ ./blend.py -r kbowyer pflynn bowyer_flynn.gif
...
Although it is not required, you should consider emitting diagnostic messages as shown above to inform the user of the progress of your script.
As noted above, if the script encounters an error while searching for a portrait, downloading a file, or executing a command, it should exit early and clean up the temporary workspace:
# Exit early on failure
$ ./blend.py batman superman fail.gif
Using workspace: /tmp/blendFuiGiK
Searching portrait for batman... Not Found!
Cleaning up workspace: /tmp/blendFuiGiK
README.md
In your README.md
, describe how you implemented the blend.py
script. In
particular, briefly discuss:
How you parsed command line arguments.
How you managed the temporary workspace.
How you extracted the portrait URLs.
How you downloaded the portrait images.
How you generated the blended composite images.
How you generated the final animation.
How you checked for failure of different operations and exited early.
Katie likes to sit in the back of the class. It has its perks:
She can beat the rush out the door when class ends.
She can see everyone browsing Facebook, playing video games, watching YouTube, or doing homework.
She feels safe from being called upon by the instructor... except when he does that strange thing where he goes around the class and tries to talk to people. Totally weird 4.
That said, sitting in the back has its downsides:
She can never see what the instructor is writing because he has terrible handwriting and always writes too small.
She is prone to falling asleep because the instructor is really boring and the class is not as interesting as Discrete Math was last semester.
To combat her boredom, Katie typically just browses Reddit. Her favorite subreddits are AdviceAnimals, aww, todayilearned, and of course UnixPorn. Katie is tired of having to go to each subreddit and browse for cool links, however, and decides she wants to create a script, reddit.py, which will allow her to quickly filter or grep a subreddit.
Fortunately for Katie, Reddit provides a JSON feed for every subreddit.
You simply need to append .json
to the end of each subreddit. For
instance, the JSON feed for todayilearned can be found here:
https://www.reddit.com/r/todayilearned/.json
To fetch that data, Katie uses the Requests package in Python to access the JSON data:
import requests

r = requests.get('https://www.reddit.com/r/todayilearned/.json')
print r.json()
Reddit tries to prevent bots from accessing its website too often. To work
around any 429: Too Many Requests errors, we can trick Reddit by
specifying our own user agent:
import os
import requests

headers  = {'user-agent': 'reddit-{}'.format(os.environ['USER'])}
response = requests.get('https://www.reddit.com/r/linux/.json', headers=headers)
This should allow you to make requests without getting the dreaded 429 error.
Katie's script would output something like the following:
{"kind": "Listing", "data": {"modhash": "g8n3uwtdj363d5abd2cbdf61ed1aef6e2825c29dae8c9fa113", "children": [{"kind": "t3", "data": ...
Looking through that stream of text, Katie sees that the JSON data is a
collection of structured or hierarchical dictionaries and lists. This
looks a bit complex to her, so she wants you to help her complete the
reddit.py
script which fetches the JSON data for a subreddit and allows
the user to filter articles by specified fields using a regular expression.
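Judging from the excerpt above, the articles sit in a list under data and then children, and each entry keeps its own fields under another data key. The sketch below walks that structure on a small hand-written stand-in; SAMPLE is trimmed to two fields per article (real listings carry many more), and the articles helper is a hypothetical name.

```python
import json

# Trimmed stand-in for a listing; titles and authors are taken from the
# sample output in this handout.
SAMPLE = json.loads('''
{"kind": "Listing",
 "data": {"children": [
   {"kind": "t3", "data": {"title": "The decline of GPL?",
                           "author": "speckz"}},
   {"kind": "t3", "data": {"title": "GPD Pocket Crowdfunder Passes $1 Million Mark",
                           "author": "raymii"}}
 ]}}
''')


def articles(listing):
    """Return the per-article dictionaries found under data -> children."""
    return [child['data'] for child in listing['data']['children']]


for index, article in enumerate(articles(SAMPLE), 1):
    print('{}. {} ({})'.format(index, article['title'], article['author']))
```

With a live response, the same function would be given the dictionary returned by the response's json() method instead of SAMPLE.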
reddit.py
The reddit.py script has the following usage message:
$ ./reddit.py -h
Usage: reddit.py [ -f FIELD -s SUBREDDIT ] regex

    -f FIELD        Which field to search (default: title)
    -n LIMIT        Limit number of articles to report (default: 10)
    -s SUBREDDIT    Which subreddit to search (default: linux)
As can be seen, the reddit.py
script takes three possible flags followed by
the regular expression to use in filtering the articles:
The -f
flag allows the user to specify which FIELD
to search when
filtering. By default this is the title
field, but it could be any field
in the JSON data corresponding to the article.
The -n
flag limits the number of articles to report or display. The
default is 10
articles.
The -s
flag allows the user to specify which SUBREDDIT
to search. By
default this is the linux
subreddit.
Here are some examples of reddit.py
in action:
# By default list 10 articles from the r/linux subreddit
$ ./reddit.py
1.  Title: Delta uses Linux for their in flight entertainment
    Author: SaberHamLincoln
    Link: https://www.reddit.com/r/linux/comments/5uv5uh/delta_uses_linux_for_their_in_flight_entertainment/
    Short: https://is.gd/EhxxUr

2.  Title: This year's Linux Sucks talk will be the last one ever, apparently.
    Author: deusmetallum
    Link: https://www.reddit.com/r/linux/comments/5uyiv9/this_years_linux_sucks_talk_will_be_the_last_one/
    Short: https://is.gd/k8VAT1

3.  Title: cron.weekly issue #68: Virtual Memory, Jenkins, Etckeeper, Tensorflow, PGP, Let's Encrypt & more
    Author: ilconcierge
    Link: https://www.reddit.com/r/linux/comments/5uyizs/cronweekly_issue_68_virtual_memory_jenkins/
    Short: https://is.gd/o5Jkk2

4.  Title: Rusty Builder rustup support in gnome builder
    Author: abdulkareemsn
    Link: https://www.reddit.com/r/linux/comments/5uybe6/rusty_builder_rustup_support_in_gnome_builder/
    Short: https://is.gd/nxGy7Q

5.  Title: The decline of GPL?
    Author: speckz
    Link: https://www.reddit.com/r/linux/comments/5uz3ut/the_decline_of_gpl/
    Short: https://is.gd/R6hZit

6.  Title: Linux Action Show's Review of the newest XPS 13 Developer Edition laptop
    Author: Khaotic_Kernel
    Link: https://www.reddit.com/r/linux/comments/5uxacn/linux_action_shows_review_of_the_newest_xps_13/
    Short: https://is.gd/LqSyAm

7.  Title: GPD Pocket Crowdfunder Passes $1 Million Mark
    Author: raymii
    Link: https://www.reddit.com/r/linux/comments/5uyp51/gpd_pocket_crowdfunder_passes_1_million_mark/
    Short: https://is.gd/jwKa1S

8.  Title: Linode offering $5 1 GB VPS now. Also upped storage for 2 GB and net out min for all plans to 1000 Mbits
    Author: upvotes2doge
    Link: https://www.reddit.com/r/linux/comments/5uxuun/linode_offering_5_1_gb_vps_now_also_upped_storage/
    Short: https://is.gd/6y0r4M

9.  Title: Self X-post from r/linuxmint: Workaround for backlight control issues in Linux Mint 18.1 Cinnamon with nVidia drivers
    Author: ProlificAlias
    Link: https://www.reddit.com/r/linux/comments/5uy86f/self_xpost_from_rlinuxmint_workaround_for/
    Short: https://is.gd/cTHvjQ

10. Title: Krita Update: Support for svg loading and improved vector tools is on it's way.
    Author: raghukamath
    Link: https://www.reddit.com/r/linux/comments/5usd62/krita_update_support_for_svg_loading_and_improved/
    Short: https://is.gd/iXF9eS

# List top article in r/technology
$ ./reddit.py -n 1 -s technology
1.  Title: Got a tech question or want to discuss tech? Weekly /r/Technology Tech Support / General Discussion Thread
    Author: AutoModerator
    Link: https://www.reddit.com/r/technology/comments/5upmjy/got_a_tech_question_or_want_to_discuss_tech/
    Short: https://is.gd/QjYHpY

# List top article in r/linux that contains the word linux in title
$ ./reddit.py -n 1 'Linux'
1.  Title: Delta uses Linux for their in flight entertainment
    Author: SaberHamLincoln
    Link: https://www.reddit.com/r/linux/comments/5uv5uh/delta_uses_linux_for_their_in_flight_entertainment/
    Short: https://is.gd/EhxxUr

# List top article in r/linux whose author has a number in it
$ ./reddit.py -n 1 -f author '[0-9]'
1.  Title: Linode offering $5 1 GB VPS now. Also upped storage for 2 GB and net out min for all plans to 1000 Mbits
    Author: upvotes2doge
    Link: https://www.reddit.com/r/linux/comments/5uxuun/linode_offering_5_1_gb_vps_now_also_upped_storage/
    Short: https://is.gd/6y0r4M

# Error out on invalid field
$ ./reddit.py -f fake
Invalid field: fake
Notice that in addition to displaying the Title, Author, and Link for each article, the script also lists a shortened form of the longer Link. To do this, the reddit.py script creates a URL redirect via the is.gd web service.
To fetch the JSON data, you should use the requests.get function on
the appropriate SUBREDDIT
URL.
To filter the articles, you will need to iterate through the appropriate JSON data structure corresponding to the articles.
For each article, you should check if the specified FIELD is valid. If not, you should report an error and exit the program with an error code. Otherwise, you should use the re.search function to check if the REGEX matches the FIELD for the current article.
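The filtering rule can be sketched as a small function that works on a list of article dictionaries. The name filter_articles, its parameters, and the sample data are illustrative assumptions; the error message follows the "Invalid field" output shown in the examples.

```python
import re
import sys


def filter_articles(articles, field='title', regex='', limit=10):
    """Return up to limit articles whose field matches regex.

    Exits with status 1 if an article does not have the requested field.
    """
    matches = []
    for article in articles:
        if len(matches) >= limit:
            break
        if field not in article:
            print('Invalid field: {}'.format(field))
            sys.exit(1)
        # str() guards against field values that are not strings
        if re.search(regex, str(article[field])):
            matches.append(article)
    return matches


# Hand-written stand-ins using authors from the sample output
sample = [
    {'title': 'The decline of GPL?', 'author': 'speckz'},
    {'title': 'Linode offering $5 1 GB VPS now.', 'author': 'upvotes2doge'},
]

for article in filter_articles(sample, field='author', regex='[0-9]', limit=1):
    print('{} by {}'.format(article['title'], article['author']))
```

An empty regex matches every article, which is one way to get the "list the first LIMIT articles" behaviour when no pattern is given.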
To generate a shortened URL, you will need to use the requests.get
function on 'http://is.gd/create.php' with
parameters format
set to json
and url
set to the URL you wish to
compress. This request will return a JSON object that contains the
shorturl
which you should display.
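A shortening helper along these lines only needs one request. In the sketch below, shorten_url takes a hypothetical http argument that stands for the requests module (or any object with a compatible get method), so the function can be tried with a stand-in object; the endpoint, parameters, and the shorturl key come from the description above.

```python
def shorten_url(http, url):
    """Ask is.gd for a short form of url and return it.

    http is assumed to be the requests module or a look-alike whose
    get(url, params=...) returns an object with a json() method.
    """
    response = http.get('http://is.gd/create.php',
                        params={'format': 'json', 'url': url})
    return response.json()['shorturl']
```

In reddit.py the call might look like shorten_url(requests, link), where link is the article URL printed on the Link line. Passing the values through params lets Requests take care of URL-encoding the long link.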
To verify the correctness of your reddit.py script, you should try the examples above. Because Reddit is an active and live website, we cannot provide an automated way to test the output of your script.
README.md
In your README.md
, describe how you implemented the reddit.py
script. In
particular, briefly discuss:
How you parsed command line arguments.
How you fetched the JSON data and iterated over the articles.
How you filtered each article based on the FIELD
and REGEX
.
How you generated the shortened URL.
For extra credit, you are to use NetFile to publish a website. Rather than manually writing HTML, you can use a static website generator such as:
Alternatively, you can cobble together your own website generator using scripts and something like Python-Markdown. For instance, the course website and the instructor's homepage are created using a Python script called yasb.py.
Once you have decided on the tool you wish to use and created some static HTML files, you will need to upload the files to NetFile using your favorite file transfer tool (e.g. sftp, scp, or rsync).
The actual content of the website is up to you, but I recommend that you take this opportunity to perhaps create an online portfolio that you can share with family, friends, and possible employers or schools.
Here are some examples:
If you have any questions, comments, or concerns regarding the course, please
provide your feedback at the end of your README.md
.
To submit your assignment, please commit your work to the homework04
folder
in your assignments GitLab repository. Your homework04
folder should
only contain the following files:
README.md
blend.py
reddit.py