In 2005, a blog post went viral showing a bootleg copy of Star Wars: Episode III – Revenge of the Sith whose Chinese version had been translated (apparently by low-quality machine translation) back into English as Star War: The Third Gathers – The Backstroke of the West. Can you do better?
Visit this GitHub Classroom link to create a Git repository for you, and clone it to your computer. Initially, it contains the following files:
| File | Description |
|------|-------------|
| `data/train.*` | training data |
| `data/dev.*` | development data |
| `data/test.*` | test data |
| `bleu.py` | evaluation script for translation |
| `layers.py` | some useful neural network layers |
| `model2.py` | direct IBM Model 2 |
The training data is Star Wars Episodes 1, 2, and 4–6. The dev and test data are Episode 3. For each dataset, there are three or four files, distinguished by suffix:

- `zh-en`: Chinese–English parallel text (tab-separated)
- `zh`: Chinese only
- `en`: (correct) English
- `backstroke.en`: the English of Backstroke of the West
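For reference, the tab-separated parallel files can be loaded with a few lines of Python. This helper is hypothetical (not part of the distributed code) and assumes one sentence pair per line, with a single tab separating the Chinese and English sides and tokens separated by spaces:

```python
def read_parallel(path):
    """Read a *.zh-en file into a list of (chinese_tokens, english_tokens) pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Each line is assumed to be: chinese<TAB>english
            zh, en = line.rstrip("\n").split("\t")
            pairs.append((zh.split(), en.split()))
    return pairs
```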
As distributed, `model2.py` implements the following model, which is a variant of direct IBM Model 2 with the $t$ and $a$ tables factored into smaller matrices:
\begin{align}
P(\mathbf{e} \mid \mathbf{f}) &= \prod_{i=1}^m \sum_{j=1}^n a(j \mid i) \, t(e_i \mid f_j) \\
t(e_i \mid f_j) &= \left[\operatorname{softmax} \mathbf{U} \mathbf{V}_{f_j} \right]_{e_i} \\
a(j \mid i) &= \left[\operatorname{softmax} \mathbf{K} \mathbf{Q}_{i} \right]_j \\
\mathbf{K}_j &= \mathbf{\bar{K}}_j & j &= 1, \ldots, n \\
\mathbf{Q}_i &= \mathbf{\bar{Q}}_i & i &= 1, \ldots, m
\end{align}
where $\mathbf{\bar{K}}_j$ and $\mathbf{\bar{Q}}_i$ (for $i,j = 1, \ldots, 100$) are learnable parameter vectors that can be thought of as embeddings of the numbers 1 to 100.
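To make the factorization concrete, here is a minimal NumPy sketch of the likelihood computation. The toy vocabulary sizes and randomly initialized matrices are purely illustrative assumptions; in `model2.py` these matrices are learned:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical toy sizes: Chinese vocab 7, English vocab 5, embedding dim 4.
Vf_size, Ve_size, d = 7, 5, 4
U = rng.normal(size=(Ve_size, d))   # English word embeddings (output side)
V = rng.normal(size=(Vf_size, d))   # Chinese word embeddings
Kbar = rng.normal(size=(100, d))    # position embeddings K̄_j, j = 1..100
Qbar = rng.normal(size=(100, d))    # position embeddings Q̄_i, i = 1..100

def log_prob(e_ids, f_ids):
    """log P(e | f) under the factored direct Model 2."""
    n, m = len(f_ids), len(e_ids)
    # t[e, j] = t(e | f_j): softmax over the English vocabulary (axis 0)
    t = softmax(U @ V[f_ids].T, axis=0)
    # a[i, j] = a(j | i): softmax over source positions j (axis 1)
    a = softmax(Qbar[:m] @ Kbar[:n].T, axis=1)
    # P(e_i | f) = sum_j a(j | i) t(e_i | f_j)
    p = (a * t[e_ids, :]).sum(axis=1)
    return np.log(p).sum()
```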
In this assignment, you'll improve this model so that it translates better than Backstroke of the West. (Although, because the training data is so small, this is a challenging task, and it probably won't be that much better.)
You may reuse any code you've used in previous homework assignments, or even the solution or another student's code, as long as you cite it properly.
1. Run `python bleu.py data/test.backstroke.en data/test.en` to compute a BLEU score for Backstroke of the West (higher is better; 1.0 is the maximum). (1 point)
2. Run `python model2.py --train data/train.zh-en --dev data/dev.zh-en data/test.zh -o test.model2.en`, which trains direct IBM Model 2 and then translates the test set into the file `test.model2.en`. (1 point)
3. Run `python bleu.py test.model2.en data/test.en` to compute a BLEU score for Model 2. Report the score. (1 point)
4. Modify `model2.Decoder.step()` so that the weighted average is inside the softmax, as in equation (3.33). (5 points)
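The change of moving the weighted average inside the softmax can be sketched as follows. The distributed model computes an attention-weighted average of per-source-word softmax distributions; the modification instead applies a single softmax to the attention-weighted average of the source embeddings, as in standard attention. (This is a NumPy illustration with hypothetical toy sizes; equation (3.33) is in the course notes and not reproduced here, so the exact form is an assumption.)

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Hypothetical toy sizes: English vocab Ve=5, embedding dim d=4, source length n=3.
rng = np.random.default_rng(1)
Ve, d, n = 5, 4, 3
U = rng.normal(size=(Ve, d))        # English word embeddings (output side)
Vf = rng.normal(size=(n, d))        # embeddings V_{f_j} of the source words
a = softmax(rng.normal(size=n))     # attention weights a(j | i) for one position i

# As distributed: weighted average of distributions (softmax outside the sum)
p_outside = sum(a[j] * softmax(U @ Vf[j]) for j in range(n))

# Modified: weighted average of embeddings, then a single softmax (softmax inside)
c = a @ Vf                          # context vector: sum_j a(j | i) V_{f_j}
p_inside = softmax(U @ c)

# Both are valid distributions over the English vocabulary, but they differ.
```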
5. Improve the `Encoder` and `Decoder` classes. You may use any of the classes/functions provided in `layers.py`, many of which have tricks tailored for small training data; one exception is `attention()`, which was hurting rather than helping. In particular, `layers.py` provides a `ResidualTanhLayer` module; if you're implementing a Transformer, I recommend using it for equations 3.72 and 3.75.
6. Save your trained model using the `--save` option. (1 point)

Please read these submission instructions carefully. We made some changes from HW1 that should streamline the process for both you and us.
Mark the commit for each part with `git tag -a part1`, `git tag -a part2`, etc. If you make the final submission late, we'll use these tags to compute the per-part late penalty. (You can also create the tags after the fact, with `git tag -a part1 abc123`, where `abc123` is the commit's checksum.) Be sure to push your tags along with your commits (`git push --tags origin HEAD`).
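As a concrete illustration of the tagging workflow, here is a walkthrough in a throwaway scratch repository (in your real assignment repo you would skip the `init`/`config` lines and tag your actual commits):

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "you@example.com"
git config user.name "You"
echo "work for part 1" > notes.txt
git add notes.txt && git commit -qm "finish part 1"
git tag -a part1 -m "part 1 submission"          # tag the current commit
sha=$(git rev-parse HEAD)
git tag -a part2 -m "part 2 submission" "$sha"   # tag a commit after the fact, by checksum
git tag                                          # lists both tags
# In your real repo, you would then run: git push --tags origin HEAD
```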