CSE 40657/60657 Homework 4

Due: Fri 2021/04/23 5pm
Points: 30

In this assignment you will build a semantic parser that translates natural-language queries of a geography database into SQL queries.

Whenever the instructions below say to "report" something, it should be reported in the README.md file that you submit.

1. Baseline Model

1. Visit this GitHub Classroom link to create a Git repository for you, and clone it to your computer. Initially, it contains the following files:
   - `train.eng-sql`: Training data
   - `dev.eng-sql`: Development data
   - `dev.eng`: Development data (English side)
   - `dev.sql`: Development data (SQL side)
   - `test.eng-sql`: Test data
   - `test.eng`: Test data (English side)
   - `test.sql`: Test data (SQL side)
   - `accuracy.py`: Compute accuracy
2. Use the HW2 machine translation system (either your solution or one of the official solutions) to train on the training data, validate on the dev data, and translate the test data.
3. Run accuracy.py to measure the accuracy on the test set.
4. I saw a lot of variance across runs, so please repeat the above steps a total of ten times and report the mean and standard deviation (3 points). The mean should be at least 30% (1 point).
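The mean and standard deviation over the ten runs can be computed with a short script like the following; the accuracy values shown are placeholders, not real results.

```python
import statistics

# Accuracies from ten independent training runs (hypothetical values;
# substitute the numbers printed by accuracy.py).
accuracies = [0.31, 0.29, 0.33, 0.35, 0.30, 0.28, 0.34, 0.32, 0.31, 0.33]

mean = statistics.mean(accuracies)
stdev = statistics.stdev(accuracies)  # sample standard deviation
print(f"mean = {mean:.3f}, stdev = {stdev:.3f}")
```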

2. Modify the Model

In this part, we'll add a copy mechanism to the model, as described in the notes. (Edited on 4/14 to make the notation conform to the notes and to correct one mistake.)

1. Add a special symbol <COPY> to the target vocabulary (evocab) (3 points). When the model outputs <COPY>, it copies one of the source words instead of choosing a word from the target vocabulary.
2. Decoder.step currently returns two tensors, an output vector (o) and a hidden vector (h). Modify it to also return the tensor of (cross-)attention weights $\alpha$ (5 points): \begin{align} \mathbf{H} &\in \mathbb{R}^{n \times d} \\ \mathbf{g}^{(i)} &\in \mathbb{R}^d \\ \alpha &\in \mathbb{R}^n \\ \alpha &= \operatorname{softmax} (\mathbf{H} \, \mathbf{g}^{(i)}) \end{align} where $\mathbf{H}$ is the matrix of source encodings (called fencs in the HW2 solutions) and $\mathbf{g}^{(i)}$ is the current target encoding (called o in attention.py, lines 121–122, and a in transformer.py).
3. Currently, Model.logprob computes the log-probability of a word as $P(e) = \mathbf{p}^{(i)}_e$, which is equivalent to exp(o[enum]) in the HW2 solutions. (I really apologize for the inconsistent choice of variable names.) Modify it to (5 points): $P(e) = \mathbf{p}^{(i)}_e + \sum_{\substack{j=1, \ldots, n \\ f_j = e}} \mathbf{p}^{(i)}_{\texttt{<COPY>}} \alpha_j.$
4. Rerun the trainer and include the trainer output in your report. Do you see <COPY> in the sample translations it prints? (1 point)

3. Modify the Decoder

In this part, you'll modify the decoder in Model.translate. This is a greedy algorithm, meaning that at each step it chooses the best action.
1. In Model.translate, there is a line
for i in range(100):

which limits the output length to at most 100 words. Adjust this limit to something more appropriate for SQL queries (3 points) -- I used $5|\texttt{fwords}|+10$.
2. Modify Model.translate to handle <COPY> (5 points). If the chosen target word (eword) is <COPY>, then it should be changed to the source word $f_{j^*}$, where $j^* = \operatorname{argmax}_j \alpha_j$. Be sure to update enum as well, by numberizing eword.
3. Translate the test set and compute the accuracy. Repeat training and translation a total of ten times and report the mean and standard deviation (3 points); the mean should be at least 35% (1 point).
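The <COPY> handling in step 2 can be sketched as a helper applied to each greedy decoding step. This is a sketch under assumed names: alpha is the attention-weight sequence returned by the modified Decoder.step, fwords is the source sentence, and numberize stands in for however the solution maps a word to its vocabulary index.

```python
def resolve_copy(eword, alpha, fwords, numberize):
    """If the decoder chose <COPY>, replace it with the source word at the
    position with the highest attention weight (argmax_j alpha_j), then
    renumberize so the next decoder step conditions on the copied word.

    alpha can be any sequence of attention weights, one per source word.
    """
    if eword == '<COPY>':
        j = max(range(len(fwords)), key=lambda i: alpha[i])  # argmax_j alpha_j
        eword = fwords[j]
    enum = numberize(eword)
    return eword, enum
```

In the same loop, the length bound from step 1 would replace range(100) with something like range(5 * len(fwords) + 10).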