CSE 40657/60657
Homework 4

Due
Fri 2021/04/23 5pm
Points
30

In this assignment you will build a semantic parser that translates natural-language queries of a geography database into SQL queries.

Whenever the instructions below say to "report" something, report it in the README.md file that you submit.

1. Baseline Model

  1. Visit this GitHub Classroom link to create a Git repository for you, and clone it to your computer. Initially, it contains the following files:
    train.eng-sql Training data
    dev.eng-sql Development data
    dev.eng Development data (English side)
    dev.sql Development data (SQL side)
    test.eng-sql Test data
    test.eng Test data (English side)
    test.sql Test data (SQL side)
    accuracy.py Compute accuracy
  2. Use the HW2 machine translation system (either your solution or one of the official solutions) to train on the training data, validate on the dev data, and translate the test data.
  3. Run accuracy.py to measure the accuracy on the test set.
  4. I saw a lot of variance across runs; please repeat the above steps a total of ten times and report the mean and standard deviation (3 points). The mean should be at least 30% (1 point).
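
The mean and standard deviation in step 4 can be computed with Python's statistics module. This is a minimal sketch; the accuracy values below are placeholder data, not real results:

```python
import statistics

# Placeholder test-set accuracies from ten training runs (not real results)
accuracies = [0.31, 0.28, 0.35, 0.30, 0.33, 0.29, 0.32, 0.34, 0.27, 0.31]

mean = statistics.mean(accuracies)
stdev = statistics.stdev(accuracies)  # sample standard deviation (divides by n-1)
print(f"mean={mean:.3f} stdev={stdev:.3f}")
```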

2. Modify the Model

In this part, we'll add a copy mechanism to the model, as described in the notes. Edited on 4/14 to make notation conform to notes, and to correct one mistake.

  1. Add a special symbol <COPY> to the target vocabulary (evocab) (3 points). When the model outputs <COPY>, it copies one of the source words instead of choosing a word from the target vocabulary.
  2. Decoder.step currently returns two tensors, an output vector (o) and a hidden vector (h). Modify it to also return the tensor of (cross-)attention weights $\alpha$ (5 points): \begin{align} \mathbf{H} &\in \mathbb{R}^{n \times d} \\ \mathbf{g}^{(i)} &\in \mathbb{R}^d \\ \alpha &\in \mathbb{R}^n \\ \alpha &= \operatorname{softmax} (\mathbf{H} \, \mathbf{g}^{(i)}) \end{align} where $\mathbf{H}$ is the matrix of source encodings (called fencs in the HW2 solutions) and $\mathbf{g}^{(i)}$ is the current target encoding (called o in attention.py, lines 121–122, and a in transformer.py).
  3. Currently Model.logprob computes the log-probability of a word as \[ P(e) = \mathbf{p}^{(i)}_e \] which is equivalent to exp(o[enum]) in the HW2 solutions. (I really apologize for the inconsistent choice of variable names.) Modify it to (5 points): \[ P(e) = \mathbf{p}^{(i)}_e + \sum_{\substack{j=1, \ldots, n \\ f_j = e}} \mathbf{p}^{(i)}_{\texttt{<COPY>}} \alpha_j. \]
  4. Rerun the trainer and include the trainer output in your report. Do you see <COPY> in the sample translations it prints? (1 point)
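
The two formulas above can be sketched in plain PyTorch, independent of the HW2 code. The function and argument names here (attention_weights, copy_logprob, fnums, copy_num) are illustrative assumptions; adapt them to your own code:

```python
import torch

def attention_weights(H, g):
    """alpha = softmax(H g^(i)), with H in R^{n x d} and g in R^d."""
    return torch.softmax(H @ g, dim=-1)

def copy_logprob(o, alpha, fnums, enum, copy_num):
    """Log-probability of target word enum under the copy mechanism.

    o:        logits over the target vocab (before softmax)
    alpha:    attention weights over the n source positions
    fnums:    numberized source words, length n (assumed numberized
              with the target vocab so they can be compared to enum)
    enum:     index of the target word e
    copy_num: index of <COPY> in the target vocab
    """
    p = torch.softmax(o, dim=-1)        # p^(i) over the target vocab
    prob = p[enum]                      # p^(i)_e
    # Add p^(i)_<COPY> * alpha_j for every source position j with f_j = e
    mask = (fnums == enum)
    prob = prob + p[copy_num] * alpha[mask].sum()
    return torch.log(prob)
```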

3. Modify the Decoder

In this part, you'll modify the decoder in Model.translate. This is a greedy algorithm, meaning that at each step it chooses the best action.
  1. In Model.translate, there is a line
    for i in range(100):
    
    which limits the output length to at most 100 words. Adjust this to something more appropriate for SQL queries (3 points) -- I used $5|\texttt{fwords}|+10$.
  2. Modify Model.translate to handle <COPY> (5 points). If the chosen target word (eword) is <COPY>, then it should be changed to the source word $\operatorname{argmax}_j \alpha_j$. Be sure to update enum as well, by numberizing eword.
  3. Translate the test set and compute the accuracy. Repeat training and translation a total of ten times and report the mean and standard deviation (3 points); the mean should be at least 35% (1 point).
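
The greedy loop with <COPY> handling can be sketched as follows. This is a standalone sketch, not the HW2 code: the model_step callback and the evocab.numberize/denumberize interface are assumptions for illustration.

```python
import torch

def greedy_copy_decode(model_step, fwords, evocab, bos, eos, copy_tok="<COPY>"):
    """Greedy decoding with a copy mechanism (illustrative sketch).

    model_step(enum) is assumed to return (logprobs over evocab,
    attention weights alpha over the source positions).
    """
    ewords = []
    enum = bos
    for i in range(5 * len(fwords) + 10):   # length limit suited to SQL
        logprobs, alpha = model_step(enum)
        enum = int(torch.argmax(logprobs))  # greedy: best word at each step
        eword = evocab.denumberize(enum)
        if eword == copy_tok:
            j = int(torch.argmax(alpha))    # most-attended source position
            eword = fwords[j]               # copy that source word
            enum = evocab.numberize(eword)  # keep enum consistent with eword
        if enum == eos:
            break
        ewords.append(eword)
    return ewords
```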

Please read these submission instructions carefully.

  1. Add and commit your submission files to the repository you created in the beginning. The repository should contain:
    • All of the code that you wrote.
    • Your model and outputs from all three parts.
    • A README.md file with
      • instructions on how to build/run your code.
      • Your responses to all of the instructions/questions in the assignment.
  2. After you complete each part, create a commit and tag it with git tag -a part1, git tag -a part2, etc. If you make the final submission late, we'll use these tags to compute the per-part late penalty. (You can also create the tags after the fact, with git tag -a part1 abc123, where abc123 is the commit's checksum.)
  3. Push your repository and its tags to GitHub (git push --tags origin HEAD).
  4. Submit your repository to Gradescope under assignment HW4. If you submit multiple times, the most recent submission will be graded. If you make changes to your repository after submission, you must resubmit if you want us to see and grade your changes.