In this assignment you will build a simple system for answering questions based on passages from Wikipedia.
Whenever the instructions below say to "report" something, it should be reported in the README.md file that you submit.
| File | Description |
|------|-------------|
| `minitrain.txt` | Small training data |
| `train.txt` | Training data |
| `dev.txt` | Development data |
| `eval.py` | Evaluation script |
Each line of the data files has three fields separated by tabs (`\t`): a context, a question, and one or more answers. For example:
```
architec@@ turally , the school has a catholic character . at@@ op the
main building 's gold d@@ ome is a golden statue of the virgin mary
. immediately in front of the main building and facing it , is a
copper statue of christ with arms up@@ raised with the legend " ven@@
ite ad me om@@ nes " . next to the main building is the basilica of
the sacred heart . immediately behind the basilica is the gro@@ t@@ to
, a mar@@ ian place of prayer and refl@@ ection . it is a repl@@ ica
of the gro@@ t@@ to at l@@ our@@ des , france where the virgin mary
repu@@ tedly appeared to saint ber@@ na@@ de@@ tte sou@@ bi@@ rou@@ s
in 185@@ 8 . at the end of the main drive ( and in a direct line that
connects through 3 stat@@ ues and the gold d@@ ome ) , is a simple ,
modern stone statue of mary . \t to whom did the virgin mary alle@@ ge@@
dly appear in 185@@ 8 in l@@ our@@ des france ? \t 118-127
```
The context is a passage from a Wikipedia article, and the question is guaranteed to have an answer that can be found in the passage. The answers are space-separated ranges of integers. Here, there is just one answer, `118-127`, which means that the answer starts at token 118 (using zero-based indexing) and ends before token 127. (In other words, it works just like the Python slice `118:127`.)
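To make the indexing concrete, here is a minimal sketch that parses one data line and recovers the answer tokens by slicing. The toy line is made up for illustration; real lines look like the Notre Dame example above.

```python
# One line of data: context \t question \t space-separated answer ranges.
# (This toy line is invented for illustration.)
line = "the cat sat on the mat\twhere did the cat sit ?\t4-6"

context, question, answers = line.rstrip("\n").split("\t")
tokens = context.split()

# Each range "i-j" works like the Python slice tokens[i:j].
answer_spans = [tokens[int(r.split("-")[0]):int(r.split("-")[1])]
                for r in answers.split()]
print(answer_spans)  # [['the', 'mat']]
```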
For each line of the data, construct a "source" string that joins the question and the context:

```
<BOS> to whom did the virgin mary ... <SEP> architec@@ turally , the school has a catholic character ... <EOS>
```
And construct a set of "target" strings, one for each answer:
```
<BOS> saint ber@@ na@@ de@@ tte sou@@ bi@@ rou@@ s <EOS>
```
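One way such pairs could be assembled is sketched below. The helper name `make_pairs` is hypothetical (not part of the assignment's code); the special tokens are the ones shown above.

```python
def make_pairs(context, question, answers):
    """Build one (source, target) string pair per answer range.

    context:  space-separated (sub)tokens of the passage
    question: the question string
    answers:  space-separated "i-j" ranges into the context tokens
    """
    tokens = context.split()
    source = f"<BOS> {question} <SEP> {context} <EOS>"
    pairs = []
    for rng in answers.split():
        i, j = map(int, rng.split("-"))
        target = "<BOS> " + " ".join(tokens[i:j]) + " <EOS>"
        pairs.append((source, target))
    return pairs
```

Note that a line with several answer ranges yields several pairs sharing the same source string.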
Using `transformer.py` from the HW2 solution, write code to train a model to "translate" questions+contexts into answers (2 points). If there is more than one answer, repeat the question+context for each answer.

Train on `minitrain.txt`, using it as both the training and development set; the model ought to be able to learn it (near-)perfectly. After about 50 epochs, the perplexity (ppl) on the development set should be less than 1.2 (1 point). Use your model to answer the questions in `minitrain.txt` (1 point), then use `eval.py` to measure the F1 score of your answers and report the score, which should be at least 95% (1 point).

Next, replace the `Model.translate` function with a new function that guesses the $i$ and $j$ (such that $i \leq j$) that minimize $L$ above (2 points), and again answer the questions in `minitrain.txt` (1 point). Finally, train on the full training data (`train.txt`) and answer the questions in the development data (`dev.txt`), which is just the first 1000 lines of the full development data. Report your F1 score on `dev.txt` (1 point).
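The search over spans can be done exhaustively. Here is a minimal sketch, under the assumption (ours, not the assignment's required interface) that the loss $L$ for a span decomposes into a per-position start term plus a per-position end term:

```python
def best_span(start_scores, end_scores, max_len=None):
    """Return (i, j) with i <= j minimizing start_scores[i] + end_scores[j].

    start_scores[i] -- hypothetical loss contribution for starting at token i
    end_scores[j]   -- hypothetical loss contribution for ending at token j
    max_len         -- optional cap on span length to shrink the search
    """
    n = len(start_scores)
    best, best_loss = None, float("inf")
    for i in range(n):
        j_hi = n if max_len is None else min(n, i + max_len)
        for j in range(i, j_hi):  # enforce i <= j
            loss = start_scores[i] + end_scores[j]
            if loss < best_loss:
                best_loss, best = loss, (i, j)
    return best
```

The brute-force search is O(n^2) in the passage length, which is cheap at these sizes; a length cap can speed it up further.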
Please read these submission instructions carefully.
Tag the final commit for each part with `git tag -a part1`, `git tag -a part2`, etc. If you make the final submission late, we'll use these tags to compute the per-part late penalty. (You can also create the tags after the fact, with `git tag -a part1 abc123`, where `abc123` is the commit's checksum.) Be sure to push your tags along with your commits (`git push --tags origin HEAD`).