Setup
Visit this GitHub Classroom link to have a Git repository created for you, and clone it to your computer. Initially, it contains the following files:
| file | description |
| --- | --- |
| gpt.py | wrapper around HuggingFace's implementation of GPT-2 |
| hw5-starter-code.py | starter code (also available on Kaggle) |
| storycloze-2018/short_context_data.txt | one-sentence story prompts |
| storycloze-2018/long_context_data.txt | four-sentence story prompts |
Part 1: Setup and Greedy Decoding (12 points)
- Write code to read data from either of the story prompt files. Remember to strip newlines. (1 point)
- Implement the greedy decoding method. (6 points)
- Write code to step through each example of short_context_data.txt and save the outputs to a file. Include this file with your submission. Your code should generate 40 new tokens per example. (3 points) (Running on long_context_data.txt is helpful but not required in this part.)
- What looks good about the outputs from greedy decoding, and why? Explicitly tie your explanation to how the greedy decoding method works. (1 point)
- What looks bad about the outputs from greedy decoding, and why? Explicitly tie your explanation to how the greedy decoding method works. (1 point)
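As a reference point, the greedy rule can be sketched independently of GPT-2: at every step, append the single most probable next token. The `next_token_probs` function and its toy bigram table below are hypothetical stand-ins for the model wrapper in gpt.py (whose actual interface may differ); only the argmax selection rule is the point of the sketch.

```python
# Toy stand-in for a language model's next-token distribution; in the
# assignment itself, the GPT-2 wrapper in gpt.py plays this role.
TOY_BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def next_token_probs(tokens):
    # Condition only on the last token (a bigram model); fall back to <eos>.
    return TOY_BIGRAMS.get(tokens[-1], {"<eos>": 1.0})

def greedy_decode(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        # Greedy rule: deterministically append the single most probable token.
        tokens.append(max(probs, key=probs.get))
    return tokens

print(greedy_decode(["the"], 3))  # → ['the', 'cat', 'sat', 'down']
```

Because the argmax is deterministic, running this twice on the same prompt always yields the same continuation; that determinism is one useful lens for the "what looks good/bad" questions above.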
Part 2: Ancestral Sampling (6 points)
- Implement ancestral sampling. (4 points)
- Save the outputs for short_context_data.txt as in Part 1. (1 point) (Running on long_context_data.txt is helpful but not required in this part.)
- Discuss how the outputs from sampling differ from those obtained from greedy decoding. Explicitly tie your explanation to how the sampling method works in contrast to greedy search. (1 point)
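The only change from greedy decoding is the selection rule: instead of taking the argmax, draw the next token at random according to the model's full distribution. A minimal sketch, again over a hypothetical `next_token_probs` callable standing in for the GPT-2 wrapper (not the actual gpt.py interface):

```python
import random

def ancestral_sample(prompt_tokens, max_new_tokens, next_token_probs, rng=random):
    """Sample each new token from the model's full next-token distribution."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        choices = list(probs)
        weights = [probs[t] for t in choices]
        # random.choices draws one token with probability proportional to its weight.
        tokens.append(rng.choices(choices, weights=weights)[0])
    return tokens
```

Because even low-probability tokens can be drawn, repeated runs on the same prompt diverge, in contrast to the single fixed continuation greedy decoding produces.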
Part 3: Top-$p$ Decoding (12 points)
- Implement top-$p$ decoding. (5 points)
- Discuss how the nature of the outputs differs as the value of $p$ is changed (try a range of values between 0 and 1 for $p$) for short_context_data.txt. (1 point) Save the outputs for short_context_data.txt for what you think the ideal value of $p$ is. (1 point) (Relative comparison is much more important than pinpointing a specific value for $p$.)
- Discuss how the nature of the outputs differs as the value of $p$ is changed (try a range of values between 0 and 1 for $p$) for long_context_data.txt. (1 point) Save the outputs for long_context_data.txt for what you think the ideal value of $p$ is. (1 point) (Relative comparison is much more important than pinpointing a specific value for $p$.) Is your chosen value for $p$ different for the long-context case? Why or why not? (1 point)
- Discuss how the outputs from top-$p$ decoding differ from those obtained from sampling. Explicitly tie your explanation to how top-$p$ works in contrast to sampling. (1 point)
- Do the outputs of top-$p$ ever resemble the outputs of greedy decoding? For which values of $p$? (1 point)
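Top-$p$ (nucleus) sampling sits between the two previous methods: truncate the distribution to the smallest set of most-probable tokens whose cumulative probability reaches $p$, renormalize over that set, and then sample as in ancestral sampling. A sketch of just the truncation step, over a plain token-to-probability dict rather than the actual gpt.py interface:

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability 'nucleus' with cumulative mass >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break  # the nucleus is complete
    # Renormalize so the kept probabilities sum to 1 before sampling.
    return {token: prob / total for token, prob in kept}

print(sorted(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.2}, 0.7)))
# → ['a', 'b']  ("c" falls outside the nucleus)
```

As $p$ approaches 0, only the single most probable token survives and the method degenerates to greedy decoding; as $p$ approaches 1, it recovers full ancestral sampling. That spectrum is the relationship the last two questions above are probing.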
Submission
Please read these submission instructions carefully.
- Add and commit your submission files to the repository you created in the beginning. The repository should contain:
- All of the code that you wrote.
- Your outputs from Parts 1–3.
- A README.md file with
- instructions on how to build/run your code;
- your responses to all of the instructions/questions in the assignment.
- To submit:
- Push your work to GitHub and create a release in GitHub by clicking on "Releases" on the right-hand side, then "Create a new release" or "Draft a new release". Fill in "Tag version" and "Release title" with the part number(s) you’re submitting and click "Publish Release".
- If you submit the same part more than once, the grader will grade the latest release for that part.
- For computing the late penalty, the submission time will be considered the commit time, not the release time.