Setup
Visit this GitHub Classroom link to have a Git repository created for you, and clone it to your computer. Initially, it contains the following files:
| file | description |
| --- | --- |
| gpt.py | wrapper around HuggingFace's implementation of GPT-2 |
| hw5-starter-code.py | starter code (also available on Kaggle) |
| storycloze-2018/short_context_data.txt | one-sentence story prompts |
| storycloze-2018/long_context_data.txt | four-sentence story prompts |
Part 1: Setup and Greedy Decoding (12 points)
- Write code to read data from either of the story prompt files. Remember to strip newlines. (1 point)
- Implement the greedy decoding method. (6 points)
- Write code to step through each example of short_context_data.txt and save the outputs to a file. Include this file with your submission. Your code should generate 40 new tokens per example. (3 points) (Running on long_context_data.txt is helpful but not required in this part.)
- What looks good about the outputs from greedy decoding, and why? Explicitly tie your explanation to how the greedy decoding method works. (1 point)
- What looks bad about the outputs from greedy decoding, and why? Explicitly tie your explanation to how the greedy decoding method works. (1 point)
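As a reference point, the greedy rule can be sketched independently of GPT-2: at every step, append the single most probable next token. The `next_token_probs` function and its toy bigram table below are hypothetical stand-ins for the model wrapper in gpt.py (whose actual interface may differ); only the argmax selection rule is the point of the sketch.

```python
# Toy stand-in for a language model's next-token distribution; in the
# assignment itself, the GPT-2 wrapper in gpt.py plays this role.
TOY_BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def next_token_probs(tokens):
    # Condition only on the last token (a bigram model); fall back to <eos>.
    return TOY_BIGRAMS.get(tokens[-1], {"<eos>": 1.0})

def greedy_decode(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        # Greedy rule: deterministically append the single most probable token.
        tokens.append(max(probs, key=probs.get))
    return tokens

print(greedy_decode(["the"], 3))  # → ['the', 'cat', 'sat', 'down']
```

Because the argmax is deterministic, running this twice on the same prompt always yields the same continuation; that determinism is one useful lens for the "what looks good/bad" questions above.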
Part 2: Ancestral Sampling (6 points)
- Implement ancestral sampling. (4 points)
- Save the outputs for short_context_data.txt as in Part 1. (1 point) (Running on long_context_data.txt is helpful but not required in this part.)
- Discuss how the outputs from sampling differ from those obtained from greedy decoding. Explicitly tie your explanation to how the sampling method works in contrast to greedy search. (1 point)
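The only change from greedy decoding is the selection rule: instead of taking the argmax, draw the next token at random according to the model's full distribution. A minimal sketch, again over a hypothetical `next_token_probs` callable standing in for the GPT-2 wrapper (not the actual gpt.py interface):

```python
import random

def ancestral_sample(prompt_tokens, max_new_tokens, next_token_probs, rng=random):
    """Sample each new token from the model's full next-token distribution."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        choices = list(probs)
        weights = [probs[t] for t in choices]
        # random.choices draws one token with probability proportional to its weight.
        tokens.append(rng.choices(choices, weights=weights)[0])
    return tokens
```

Because even low-probability tokens can be drawn, repeated runs on the same prompt diverge, in contrast to the single fixed continuation greedy decoding produces.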
Part 3: Top-$p$ Decoding (12 points)
- Implement top-$p$ decoding. (5 points)
- Discuss how the nature of the outputs differs as the value of $p$ is changed (try a range of values between 0 and 1 for $p$) for short_context_data.txt. (1 point) Save the outputs for short_context_data.txt for what you think the ideal value of $p$ is. (1 point) (Relative comparison is much more important than pinpointing a specific value for $p$.)
- Discuss how the nature of the outputs differs as the value of $p$ is changed (try a range of values between 0 and 1 for $p$) for long_context_data.txt. (1 point) Save the outputs for long_context_data.txt for what you think the ideal value of $p$ is. (1 point) (Relative comparison is much more important than pinpointing a specific value for $p$.) Is your chosen value for $p$ different for the long-context case? Why or why not? (1 point)
- Discuss how the outputs from top-$p$ decoding differ from those obtained from sampling. Explicitly tie your explanation to how top-$p$ works in contrast to sampling. (1 point)
- Do the outputs of top-$p$ ever resemble the outputs of greedy decoding? For which values of $p$? (1 point)
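Top-$p$ (nucleus) sampling sits between the two previous methods: truncate the distribution to the smallest set of most-probable tokens whose cumulative probability reaches $p$, renormalize over that set, and then sample as in ancestral sampling. A sketch of just the truncation step, over a plain token-to-probability dict rather than the actual gpt.py interface:

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability 'nucleus' with cumulative mass >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break  # the nucleus is complete
    # Renormalize so the kept probabilities sum to 1 before sampling.
    return {token: prob / total for token, prob in kept}

print(sorted(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.2}, 0.7)))
# → ['a', 'b']  ("c" falls outside the nucleus)
```

As $p$ approaches 0, only the single most probable token survives and the method degenerates to greedy decoding; as $p$ approaches 1, it recovers full ancestral sampling. That spectrum is the relationship the last two questions above are probing.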
Submission
Please read these submission instructions carefully.
- Add and commit your submission files to the repository you created in the beginning. The repository should contain:
- All of the code that you wrote.
- Your outputs from Parts 1–3.
- A README.md file with
- instructions on how to build/run your code;
- your responses to all of the instructions/questions in the assignment.
- To submit:
- Push your work to GitHub and create a release in GitHub by clicking on "Releases" on the right-hand side, then "Create a new release" or "Draft a new release". Fill in "Tag version" and "Release title" with the part number(s) you’re submitting and click "Publish Release".
- If you submit the same part more than once, the grader will grade the latest release for that part.
- For computing the late penalty, the submission time will be considered the commit time, not the release time.