In this assignment you will build a semantic parser that translates natural-language queries of a geography database into SQL queries.
Whenever the instructions below say to "report" something, it should be reported in the README.md file that you submit.
train.eng-sql | Training data |
dev.eng-sql | Development data |
dev.eng | Development data (English side) |
dev.sql | Development data (SQL side) |
test.eng-sql | Test data |
test.eng | Test data (English side) |
test.sql | Test data (SQL side) |
accuracy.py | Compute accuracy |
accuracy.py
to measure the accuracy on the test set.In this part, we'll add a copy mechanism to the model, as described in the notes. Edited on 4/14 to make notation conform to notes, and to correct one mistake.
<COPY>
to the target vocabulary (evocab
).3 When the model outputs <COPY>
, it copies one of the source words instead of choosing a word from the target vocabulary.Decoder.step
currently returns two tensors, an output vector (o
) and a hidden vector (h
). Modify it to also return the tensor of (cross-)attention weights $\alpha$:5
\begin{align}
\mathbf{H} &\in \mathbb{R}^{n \times d} \\
\mathbf{g}^{(i)} &\in \mathbb{R}^d \\
\alpha &\in \mathbb{R}^n \\
\alpha &= \operatorname{softmax} (\mathbf{H} \, \mathbf{g}^{(i)})
\end{align}
where $\mathbf{H}$ is the matrix of source encodings (called fencs
in the HW2 solutions) and $\mathbf{g}^{(i)}$ is the current target encoding (called o
in attention.py, line 121–122, and a
in transformer.py).
Model.logprob
computes the log-probability of a word as
\[ P(e) = \mathbf{p}^{(i)}_e \]
which is equivalent to exp(o[enum])
in the HW2 solutions. (I really apologize for the inconsistent choice of variable names.)
Modify it to:5
\[ P(e) = \mathbf{p}^{(i)}_e + \sum_{\substack{j=1, \ldots, n \\ f_j = e}} \mathbf{p}^{(i)}_{\texttt{<COPY>}} \alpha_j. \]<COPY>
in the sample translations it prints?1Model.translate
. This is a greedy algorithm, meaning that at each step it chooses the best action.
Model.translate
, there is a line
for i in range(100):
which limits the output length to at most 100 words. Adjust this to something more appropriate for SQL queries3 -- I used $5|\texttt{fwords}|+10$.Model.translate
to handle <COPY>
.5 If the chosen target word (eword
) is <COPY>
, then it should be changed to the source word $\operatorname{argmax}_j \alpha_j$. Be sure to update enum
as well, by numberizing eword
.
Please read these submission instructions carefully.
git tag -a part1
, git tag -a part2
, etc. If you make the final submission late, we'll use these tags to compute the per-part late penalty. (You can also create the tags after the fact, with git tag -a part1 abc123
, where abc123
is the commit's checksum.)git push --tags origin HEAD
).