# CSE 40657/60657 Homework 4

**Due:** Mon 2021/11/22 11:59pm
**Points:** 30

In this assignment, you’ll implement a sequence labeling model and train it to do slot filling for a personal assistant that finds and books restaurants, hotels, and trains, among other things.

Whenever the instructions below say to "report" something, it should be reported in the README.md file that you submit.

## 1. Setup

1. Visit this GitHub Classroom link to create a Git repository for you, and clone it to your computer. Initially, it contains the following files:

   | file | description |
   |---|---|
   | data/train | training data |
   | data/dev | development/validation data |
   | data/dev.words | development/validation data (words only) |
   | data/test | test data |
   | data/test.words | test data (words only) |
   | layers.py | possibly useful neural network layers |
   | seqlabel.py | computes the F1 score of sequence labels |

   The data files have one sentence per line. In train, dev, and test, each space-separated token is of the form word:label.
2. Write code to read in the training data. Please report how many unique labels are in the data (not including any special labels that you add for BOS or EOS). (2 points)
3. Write code to replace all words seen only once in the training data with UNK. (1 point)
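These two steps might be sketched as follows (the helper names and the `UNK` string are ours, not required by the assignment; we split each token on its *last* colon in case a word itself contains one):

```python
from collections import Counter

def read_data(lines):
    """Parse lines of space-separated word:label tokens into
    lists of (word, label) pairs, one list per sentence."""
    sentences = []
    for line in lines:
        pairs = []
        for token in line.split():
            word, _, label = token.rpartition(":")
            pairs.append((word, label))
        sentences.append(pairs)
    return sentences

def replace_singletons(sentences, unk="UNK"):
    """Replace words seen only once in the training data with unk."""
    counts = Counter(w for sent in sentences for w, _ in sent)
    return [[(w if counts[w] > 1 else unk, l) for w, l in sent]
            for sent in sentences]
```

The number of unique labels to report is then just `len({l for sent in sentences for _, l in sent})`.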

## 2. RNN

In this part, you’ll implement a sequence labeler that is just an RNN. Since there is no CRF layer on top, it predicts labels independently of one another. If we have a single sentence $w = w_1 \cdots w_n$ with correct labels $X = X_1 \cdots X_n$, we compute:

$$\begin{align}
\mathbf{v}^{(t)} &= \text{Embedding}^{\fbox{1}}(w_t) & t &= 1, \ldots, n \\
\mathbf{G} &= \text{RNN}^{\fbox{2}}([\mathbf{v}^{(1)} \cdots \mathbf{v}^{(n)}]^\top) \\
\mathbf{H} &= \text{RNN}^{\fbox{3}}(\mathbf{G}) \\
\mathbf{y}^{(t)} &= \text{SoftmaxLayer}^{\fbox{4}}(\mathbf{H}_{t}) & t &= 1, \ldots, n
\end{align}$$

Then, for this single sentence, the loss function to be minimized is

$$-\sum_{t=1}^{n} \left[\log \mathbf{y}^{(t)}\right]_{X_t}.$$
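To make the loss concrete, here is a minimal sketch of $-\sum_{t} [\log \mathbf{y}^{(t)}]_{X_t}$ for one sentence, assuming (our convention, not the assignment's) that each softmax output is stored as a dict from labels to probabilities; in practice you would compute this directly from log-probability vectors:

```python
import math

def sentence_nll(y, gold):
    """Negative log-likelihood of one sentence.
    y[t]    : predicted distribution over labels at position t (dict label -> prob)
    gold[t] : the correct label X_t
    Returns -sum_t log y^(t)_{X_t}."""
    return -sum(math.log(y_t[x_t]) for y_t, x_t in zip(y, gold))
```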

1. Implement an RNN encoder only (no CRF). (5 points) For each word, use a softmax layer (a linear transformation followed by a softmax) to predict a label for that word. (1 point) For us, the following configuration worked well: a 2-layer RNN with 200 dimensions per layer, using the Adam optimizer with a learning rate of 0.001.
2. Implement a labeler that just guesses, for each word, the label with the highest probability for that word. (2 points)
3. Train this model on the training data. After each epoch, label the dev data, report the F1 score (using the function seqlabel.compute_f1), and save your model. (1 point) Label the test data and report your test F1. For full credit, your test F1 should be at least 82%. (2 points)
4. Examine the dev outputs and write down any observations you have (e.g., common types of failure cases, conjectures on why some sentences are easier to label than others). (2 points)
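The independent labeler of step 2 reduces to an argmax at each position. A toy sketch, using the same dict-shaped softmax outputs as above (a real implementation would take an argmax over a score matrix):

```python
def greedy_label(y):
    """Independently pick, at each position t, the label with the
    highest probability under the softmax output y[t] (dict label -> prob)."""
    return [max(y_t, key=y_t.get) for y_t in y]
```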

## 3. RNN+CRF

In this part, you’ll add a CRF to your model, which enables it to model dependencies between the labels.
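The heart of CRF training is the forward algorithm, which computes the log-partition function $\log Z(w)$ appearing in the loss. A deliberately unvectorized, dict-based sketch (our toy conventions; your real implementation should follow the vectorized version in the notes):

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_partition(emit, trans):
    """Forward algorithm in log space for a linear-chain CRF.
    emit[t][y]   : emission score of label y at position t (from the RNN)
    trans[y][y2] : transition score from label y to label y2
    Returns log Z(w) = log of the sum, over all label sequences, of
    exp(sum_t emit[t][y_t] + sum_t trans[y_{t-1}][y_t])."""
    labels = list(emit[0])
    # alpha[y] = log-sum of scores of all prefixes ending in label y
    alpha = dict(emit[0])
    for t in range(1, len(emit)):
        alpha = {y2: emit[t][y2]
                     + logsumexp([alpha[y] + trans[y][y2] for y in labels])
                 for y2 in labels}
    return logsumexp(list(alpha.values()))
```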

1. Add a CRF after your RNN encoder, as described in the notes (taking out the softmax layer). (6 points) We recommend implementing all the speedups described in the notes, especially under "Vectorization."
2. Implement a labeler that predicts the highest-scoring label sequence. (3 points)
3. Train this model on the training data. Budget time for training: depending on the implementation, it could take about 15 minutes per epoch, and you will probably need to train for 5 or more epochs. After each epoch, label the dev data, report the F1 score, and save your model. (1 point) Label the test data and report your test F1. For full credit, your test F1 should be at least 84%. (2 points)
4. Examine the dev outputs and write down any observations you have (e.g., how did adding the CRF help? Have the common types of failure cases changed from Part 2?). (2 points)
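The highest-scoring-sequence labeler of step 2 is Viterbi decoding. A toy, dict-based sketch using the same emission/transition score conventions as the forward algorithm (the vectorized version in the notes will be much faster):

```python
def viterbi(emit, trans):
    """Find the highest-scoring label sequence.
    emit[t][y]   : emission score of label y at position t
    trans[y][y2] : transition score from label y to label y2
    Score of a sequence = sum_t emit[t][y_t] + sum_t trans[y_{t-1}][y_t]."""
    labels = list(emit[0])
    # best[y] = score of the best prefix ending in y; back[t][y] = its predecessor
    best = dict(emit[0])
    back = []
    for t in range(1, len(emit)):
        new, ptr = {}, {}
        for y2 in labels:
            prev = max(labels, key=lambda y: best[y] + trans[y][y2])
            new[y2] = best[prev] + trans[prev][y2] + emit[t][y2]
            ptr[y2] = prev
        best = new
        back.append(ptr)
    # Follow back-pointers from the best final label
    y = max(best, key=best.get)
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]
```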