
CSE 40657/60657
Homework 5

Due
2017/12/06 11:55pm
Points
30

In 2005, a blog post went viral that showed a bootleg copy of Revenge of the Sith with its Chinese version translated (apparently by low-quality machine translation) back into English. Can you do better?

Clone the Homework 5 repository. It contains the following files:

train.zh-en    training data
train.align    alignments for training data
test.zh        test data, input sentences
test.en        test data, reference translations
align-f1.py    evaluation script for alignment
bleu.py        evaluation script for translation
fst.py         module for finite transducers (updated)
lm.py          module for language models

The training data is Star Wars Episodes 1, 2, and 4–6. The test data is Episode 3. The alignments in train.align are not perfect; they are computed by a higher-quality alignment model (Model 4), but you'll treat them as a "silver" standard to evaluate your Model 1 alignments.

You may write code in any language you choose. You may reuse any code from previous homework assignments, including the posted solutions or another student's code, as long as you cite it properly.

1. Word alignment

In this part, you'll implement IBM Model 1.

  1. The model awakens
    • Write code to read in train.zh-en. It contains one sentence pair per line, with the Chinese and English versions separated by a tab (\t). Both sides have been tokenized. Below, we assume that a fake word $\text{NULL}$ is prepended to every English sentence, so that $e_0 = \text{NULL}$.
    • Write code to create the data structure(s) to store the model parameters $t(f\mid e)$ and initialize them all to uniform.2 Important: You only need a $t(f \mid e)$ for every Chinese word $f$ and English word $e$ that occur in the same line. If $f$ and $e$ never occur in the same line, don't create $t(f \mid e)$. This will save you a lot of memory and time.
  2. Train the model you must
    • Write code to perform the E step.3 As described in the notes, for each sentence pair and for each Chinese position $j=1, \ldots, m$ and English position $i=0, \ldots, \ell$, do: $$c(f_j,e_i) \leftarrow c(f_j,e_i) + \frac{t(f_j \mid e_i)}{\sum_{i'=0}^\ell t(f_j \mid e_{i'})}$$
    • Write code to perform the M step.3 As described in the notes, for each Chinese word type $f$ and English word type $e$ (including NULL), do: $$t(f \mid e) \leftarrow \frac{c(f,e)}{\sum_{f'} c(f',e)}$$
    • Train the model on the training data. Report the total (natural) log-likelihood of the data after each pass through the data.1 $$ \begin{align*} \text{log-likelihood} &= \sum_{\text{$(\mathbf{f}, \mathbf{e})$ in data}} \log P(\mathbf{f} \mid \mathbf{e}) \\ P(\mathbf{f} \mid \mathbf{e}) &= \frac1{100} \times \prod_{j=1}^m \frac1{\ell+1} \left(\sum_{i=0}^\ell t(f_j \mid e_i)\right). \end{align*}$$ It should increase every time and eventually get better than -155000.3
    • After training, report $t(f \mid e)$ for the following (correct) word pairs:1
      English $e$ Chinese $f$
      jedi 绝地
      droid 机械人
      force 原力
      midi-chlorians 原虫
      yousa
  3. Test the model. Or test not. There is no try
    • Write code that, for each sentence pair of the training data, finds the best alignment according to the model, which is, for each $j$: $$\DeclareMathOperator*{\argmax}{arg~max} a_j = \argmax_{0 \leq i \leq \ell} t(f_j \mid e_i).$$ Print the (non-NULL) alignments in the same format as train.align.1
    • For example:
      0-1 2-0
      means that Chinese word 0 is aligned to English word 1, Chinese word 1 is unaligned (aligned to NULL), and Chinese word 2 is aligned to English word 0. (Note that in this format, positions are zero-based, unlike in the math notation elsewhere in this document.) Report your alignments for the first five lines of train.zh-en.1
    • Evaluate your alignments against train.align using the command
      python align-f1.py your_alignments train.align
      Report your F1 score. It should be at least 50%.3
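The steps above (reading the bitext, uniform initialization over co-occurring pairs, the E and M steps, and the argmax alignment) can be sketched as follows. This is one possible implementation, not the official solution; the helper names `read_bitext`, `train_model1`, and `best_alignment` are made up for illustration.

```python
from collections import defaultdict
import math

def read_bitext(lines):
    """Parse tab-separated Chinese/English pairs; prepend NULL to each English side."""
    data = []
    for line in lines:
        zh, en = line.rstrip("\n").split("\t")
        data.append((zh.split(), ["NULL"] + en.split()))
    return data

def train_model1(data, iters=10):
    """EM for IBM Model 1. t[e][f] exists only for co-occurring word pairs."""
    # Initialize t(f|e) uniformly over Chinese words co-occurring with e.
    cooc = defaultdict(set)
    for fs, es in data:
        for e in es:
            cooc[e].update(fs)
    t = {e: {f: 1.0 / len(fset) for f in fset} for e, fset in cooc.items()}
    for _ in range(iters):
        count = defaultdict(lambda: defaultdict(float))
        ll = 0.0
        for fs, es in data:
            ll -= math.log(100)                       # constant factor in P(f|e)
            for f in fs:
                z = sum(t[e][f] for e in es)          # E-step normalizer
                ll += math.log(z) - math.log(len(es)) # len(es) = l + 1 (with NULL)
                for e in es:
                    count[e][f] += t[e][f] / z        # expected counts
        for e, cf in count.items():                   # M step: renormalize
            total = sum(cf.values())
            t[e] = {f: c / total for f, c in cf.items()}
    return t

def best_alignment(fs, es, t):
    """a_j = argmax_i t(f_j | e_i); es includes NULL at position 0."""
    return [max(range(len(es)), key=lambda i: t[es[i]].get(fs[j], 0.0))
            for j in range(len(fs))]
```

Note that `ll` here is the log-likelihood under the parameters in effect at the start of the pass; remember to subtract 1 from each position of `best_alignment`'s output (and drop NULL alignments) before printing in the zero-based `train.align` format.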

2. Monotone decoding

True Model 1 decoding should search through all possible reorderings of the input sentence; since this is somewhat involved, we'll try something simpler, which is to translate Chinese sentences word-for-word without reordering.
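To see what "word-for-word without reordering" means in the simplest possible terms, here is a toy greedy translator that ignores the language model entirely and just picks each Chinese word's single best English translation under a learned table `t` (keyed as `t[e][f]`, a layout assumed for illustration). The FST pipeline described next improves on this by letting the language model choose among the top candidates.

```python
def word_for_word(fs, t):
    """Greedy baseline: replace each Chinese word with its most probable
    English translation under t(f|e); a best translation of NULL deletes it."""
    out = []
    for f in fs:
        best_e, best_p = None, 0.0
        for e, fd in t.items():
            p = fd.get(f, 0.0)
            if p > best_p:
                best_e, best_p = e, p
        if best_e is not None and best_e != "NULL":
            out.append(best_e)
    return " ".join(out)
```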

  1. These were your father's transducers. Elegant weapons for a more civilized age.
    • Use the provided function lm.make_kneserney() to build a FST for a smoothed n-gram language model.1 Call this $M_{\text{LM}}$. Start with 2-grams; you can try higher-order models later. The states of $M_{\text{LM}}$ are tuples of various lengths, and there are $\varepsilon$-transitions from longer tuples to shorter tuples. So, later on when you do the Viterbi algorithm, you'll need to sort the states from longest to shortest. (I apologize that this is somewhat obscure, but it makes the LM a lot more efficient.)
    • Write code to build a FST for a translation model using the probabilities learned in Part 1.3 Call this $M_{\text{TM}}$. There should be just two states (a start state q0 and an accept state q1), and transitions:
      • A transition from q0 to q1 on symbol $\texttt{</s>:</s>}$
      • For each observed Chinese word type $f$, for each of its top 10 English translations $e$, a transition from q0 to q0 on symbol $f:e$ with probability $t(f \mid e)$. If $e = \text{NULL}$, make the transition output $\varepsilon$ (empty string).
      • We will handle unknown words simply by deleting them. For each Chinese word type $f$ in the test data, add a transition from q0 to q0 on symbol $f:\varepsilon$ with very small probability (like $10^{-100}$).
    • Write code that, given a Chinese sentence $\mathbf{f}$, does the following:3
      • Build a FST that maps $\mathbf{f}$ to itself. Call this $M_{\mathbf{f}}$.
      • Compose $M_{\mathbf{f}}$ with $M_{\text{TM}}$.
      • Compose that with $M_{\text{LM}}$.
      • Use the Viterbi algorithm to find the best translation. The states of the composed FST are of the form $((q_{\mathbf{f}}, q_{\text{TM}}), q_{\text{LM}})$. So the Viterbi algorithm should sort the states by $q_{\mathbf{f}}$, and in the case of a tie, by $-|q_{\text{LM}}|$ (that is, from longest to shortest LM state).
  2. Now witness the power of this fully operational translation system
    • Translate the first ten sentences of Episode 3 (test.zh). In your report, show the generated translations.1
    • The translations unfortunately aren't going to be very good. In your report, comment on what problems you see and how they might be fixed.3
    • Translate the whole movie (this took us under 18 minutes; under 7 minutes using PyPy) and evaluate the accuracy of your translations using the BLEU metric:
      python bleu.py your_translations test.en
      Report your score, which is between 0 and 1 (higher is better). Using the settings above, you should be able to get at least 0.01.3 To get the final point, try adjusting settings (reporting what you tried) to get the score up to 0.02.1
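The two-state translation FST described above can be sketched as below. This does not use the provided fst.py interface (whose API may differ); it just represents the machine as a list of `(state, input, output, next_state, probability)` tuples, with `""` standing for ε, and the probability 1.0 on the `</s>:</s>` transition is an assumption.

```python
def make_tm(t, test_vocab, k=10, unk_prob=1e-100):
    """Build the two-state translation FST (q0 start, q1 accept) as a list of
    transitions (state, input_symbol, output_symbol, next_state, probability)."""
    trans = [("q0", "</s>", "</s>", "q1", 1.0)]
    # Invert t[e][f] so we can rank, for each Chinese word f, its translations.
    by_f = {}
    for e, fd in t.items():
        for f, p in fd.items():
            by_f.setdefault(f, []).append((p, e))
    # Keep each Chinese word's top-k English translations; NULL outputs epsilon.
    for f, pairs in by_f.items():
        for p, e in sorted(pairs, reverse=True)[:k]:
            out = "" if e == "NULL" else e
            trans.append(("q0", f, out, "q0", p))
    # Unknown test-set words are deleted with a tiny probability.
    for f in test_vocab:
        if f not in by_f:
            trans.append(("q0", f, "", "q0", unk_prob))
    return trans
```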

Submission

Please submit all of the following in a gzipped tar archive (.tar.gz or .tgz; not .zip or .rar) via Sakai: