CSE 40657/60657
Homework 2

2021/03/12 5pm

In 2005, a blog post went viral showing a bootleg copy of Star Wars: Episode III – Revenge of the Sith whose Chinese version had been translated (apparently by low-quality machine translation) back into English, yielding a movie called Star War: The Third Gathers – The Backstroke of the West. Can you do better?

Visit this GitHub Classroom link to create a Git repository for yourself, and clone it to your computer. Initially, it contains the following files:

data/train.*    training data
data/dev.*      development data
data/test.*     test data
bleu.py         evaluation script for translation
layers.py       some useful neural network layers
model2.py       direct IBM Model 2

The training data is Star Wars Episodes 1, 2, and 4–6. The dev and test data are Episode 3. For each dataset, there are three or four files: suffix zh-en means Chinese-English parallel text (tab-separated), zh means Chinese, en means (correct) English, and backstroke.en means the English of Backstroke of the West.

As distributed, model2.py implements the following model, which is a variant of direct IBM Model 2 with the $t$ and $a$ tables factored into smaller matrices:
\begin{align} P(\mathbf{e} \mid \mathbf{f}) &= \prod_{i=1}^m \sum_{j=1}^n a(j \mid i) \, t(e_i \mid f_j) \\ t(e_i \mid f_j) &= \left[\operatorname{softmax} \mathbf{U} \mathbf{V}_{f_j} \right]_{e_i} \\ a(j \mid i) &= \left[\operatorname{softmax} \mathbf{K} \mathbf{Q}_{i} \right]_j \\ \mathbf{K}_j &= \mathbf{\bar{K}}_j & j &= 1, \ldots, n \\ \mathbf{Q}_i &= \mathbf{\bar{Q}}_i & i &= 1, \ldots, m \end{align}
where $\mathbf{\bar{K}}_j$ and $\mathbf{\bar{Q}}_i$ (for $i,j = 1, \ldots, 100$) are learnable parameter vectors that can be thought of as embeddings of the numbers 1 to 100.
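
To make the equations above concrete, here is a minimal pure-Python sketch of how $\log P(\mathbf{e} \mid \mathbf{f})$ could be computed from the factored parameters. It is not the code in the repository: the function names (`softmax`, `matvec`, `log_prob`), the list-of-lists parameter layout, and the 0-based indexing are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def log_prob(e, f, U, V, K_bar, Q_bar):
    """log P(e | f) under the factored direct Model 2 (hypothetical layout).

    e, f:   lists of integer word ids (English and Chinese).
    U:      one row per English word, each of dimension d.
    V:      V[w] is the d-dimensional embedding of Chinese word w.
    K_bar:  position embeddings for source positions (0-based here).
    Q_bar:  position embeddings for target positions (0-based here).
    """
    n = len(f)
    K = K_bar[:n]  # K_j for the n source positions
    # t(. | f_j) for every source position j
    t = [softmax(matvec(U, V[fj])) for fj in f]
    total = 0.0
    for i, ei in enumerate(e):
        a = softmax(matvec(K, Q_bar[i]))  # a(. | i), a distribution over j
        total += math.log(sum(a[j] * t[j][ei] for j in range(n)))
    return total
```

Because both $t$ and $a$ come out of a softmax, the inner sum is a mixture of distributions over English words, so the probabilities of all possible $e_i$ at a given position sum to 1.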

In this assignment, you'll improve this model so that it translates better than Backstroke of the West. (Because the training data is so small, this is a challenging task, and your system probably won't be that much better.)

You may reuse any code you've used in previous homework assignments, or even the solution or another student's code, as long as you cite it properly.

1. These are your first steps

  1. Run python bleu.py data/test.backstroke.en data/test.en to compute a BLEU score for Backstroke of the West (higher is better, 1.0 is the maximum). (1 point)
  2. Run python model2.py --train data/train.zh-en --dev data/dev.zh-en data/test.zh -o test.model2.en, which trains direct IBM Model 2 and then translates the test set into the file test.model2.en. (1 point)
  3. Run python bleu.py test.model2.en data/test.en to compute a BLEU score for Model 2. Report the score. (1 point)
  4. After each epoch, the training prints out the translation of the first ten sentences of the dev set. Write down some observations about what's wrong with the translations. (3 points) (Note: The badness of these translations isn't the fault of Brown et al., who never intended Model 2 to be used directly for translation, as we are doing here.)
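
For intuition about what a BLEU score measures, here is a minimal sketch of standard corpus-level BLEU with a single reference per sentence, n-grams up to length 4, and a brevity penalty. It is not the provided evaluation script, whose tokenization and details may differ; the function names `ngrams` and `bleu` are made up for this illustration.

```python
import collections
import math

def ngrams(words, n):
    """Count the n-grams of length n in a list of tokens."""
    return collections.Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU. hyps and refs are parallel lists of token lists."""
    match = [0] * max_n   # clipped n-gram matches, per n-gram length
    total = [0] * max_n   # n-grams in the hypotheses, per n-gram length
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h = ngrams(hyp, n)
            r = ngrams(ref, n)
            # Counter intersection clips each count to the reference count
            match[n - 1] += sum((h & r).values())
            total[n - 1] += sum(h.values())
    if min(match) == 0:
        return 0.0  # some n-gram length has no matches at all
    # geometric mean of the n-gram precisions, in log space
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    # log brevity penalty: penalize hypotheses shorter than the references
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)
    return math.exp(log_bp + log_prec)
```

A hypothesis identical to its reference scores 1.0, and a hypothesis sharing no words with its reference scores 0.0.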

2. I've made a lot of special modifications myself

  1. Modify model2.Decoder.step() so that the weighted average is inside the softmax, as in equation (3.33). (5 points)
  2. Run the model and recompute the BLEU score. (1 point) The BLEU score probably didn't get any better, but write down any qualitative differences you see in the dev translations. (3 points)
  3. To get a sense of whether the modification is correct: the train perplexity should drop below 55 and the dev perplexity below 120. However, I neglected to make this a requirement. (0 points)
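
The difference between averaging outside and inside the softmax can be sketched in pure Python as follows. This is only an illustration of the idea, under the assumption that "inside" means taking the attention-weighted average of the source word vectors before applying $\mathbf{U}$ and the softmax; it does not reproduce the actual Decoder.step() interface, and `step_outside` and `step_inside` are hypothetical names.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def step_outside(a, U, Vf):
    """Original model: sum_j a_j * softmax(U V_{f_j}).

    a:  attention weights over source positions (sums to 1).
    U:  output matrix, one row per English word.
    Vf: list of source word vectors V_{f_j}, one per position.
    """
    dists = [softmax(matvec(U, v)) for v in Vf]
    return [sum(a_j * d[k] for a_j, d in zip(a, dists))
            for k in range(len(U))]

def step_inside(a, U, Vf):
    """Modified model (assumed form): softmax(U sum_j a_j V_{f_j})."""
    dim = len(Vf[0])
    # weighted average of the source vectors (the context vector)
    c = [sum(a_j * v[k] for a_j, v in zip(a, Vf)) for k in range(dim)]
    return softmax(matvec(U, c))
```

Both functions return a distribution over English words. They agree when the attention puts all its weight on one source position, and generally differ otherwise.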

3. Now witness the power of this fully operational translation system

  1. Modify the model further to improve its translations. (7 points) See Section 3.6 for some recipes for how to do this. Most of your modifications will be in the Encoder and Decoder classes. You may use any of the classes/functions provided in layers.py, many of which have tricks tailored for small training data.
  2. The Git repository has an updated version of layers.py, which makes the following changes:
    • Removes scaling from attention(), which was hurting rather than helping.
    • Adds a new ResidualTanhLayer module. If you're implementing a Transformer, I recommend using this for equations 3.72 and 3.75.
  3. Train the model. Record the trainer's output (1 point) and save the model using the --save option. (1 point)
  4. Run the model and report the BLEU score. To get full credit, your score must be better than Backstroke of the West's. (3 points) Write down some observations about what did or didn't improve. (3 points)


Please read these submission instructions carefully. We made some changes from HW1 that should streamline the process for both you and us.

  1. Add and commit your submission files to the repository you created in the beginning. The repository should contain:
    • All of the code that you wrote.
    • Your final saved model from Part 3.
    • A file with
      • Instructions on how to build/run your code.
      • Your responses to all of the instructions/questions in the assignment.
  2. After you complete each part, create a commit and tag it with git tag -a part1, git tag -a part2, etc. If you make the final submission late, we'll use these tags to compute the per-part late penalty. (You can also create the tags after the fact, with git tag -a part1 abc123, where abc123 is the commit's checksum.)
  3. Push your repository and its tags to GitHub (git push --tags origin HEAD).
  4. Submit your repository to Gradescope under assignment HW2. If you submit multiple times, the most recent submission will be graded. If you make changes to your repository after submission, you must resubmit if you want us to see and grade your changes.