CSE 40657/60657: Natural Language Processing

Whenever the instructions below say to "report" something, it should be reported in the README.md file that you submit.

Setup

Visit this GitHub Classroom link to create a Git repository for yourself, then clone it to your computer. Initially, it contains the following files:

| File | Description |
| --- | --- |
| data.part{1,2,3}/train | training data |
| data.part{1,2,3}/dev | development/validation data |
| data.part{1,2,3}/test | test data |
| layers.py | possibly useful neural network layers |
| frames.py | possibly useful code for frames |
| score_frames.py | compute various accuracies for frames |
| score_types.py | compute various accuracies for frame types |
| bleu.py | compute BLEU scores |
| backend.py | database backend |
| tokenizer.py | very simple tokenizer |

The datasets come in several flavors:

.words: just words
.bio: BIO tagged (same format as HW4)
.types: just frame types
.frames: just frames
.delex: delexicalized

Part 1 (10 points)

Implement a dialogue act classifier. (5 points) The input is a sequence of words and the output is a dialogue act. You can choose any kind of classifier you want. Please report what kind of model you used. (1 point)
The format of the files train, dev, and test in data.part1 is:
```
I'm going from Cambridge to the Stansted Airport . \t find_train
```
where \t is a tab. In this example, "I'm going from Cambridge to the Stansted Airport ." is the sequence of input words and find_train is the correct output dialogue act. Train the classifier on the training data. Report your accuracy (at each epoch, if there are epochs) on the dev set, which should reach at least 80%. (1 point) You can use score_types.py to compute accuracy, or you can just do it yourself.
When you've finalized your model, run it on the test set and report the accuracy, which should be at least 80%. (3 points)
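To illustrate the data format and the accuracy computation, here is a minimal sketch using a multinomial naive Bayes classifier with add-one smoothing. This is only one possible choice of model, and nothing here comes from the provided code: the names `read_examples`, `NaiveBayes`, and `accuracy` are hypothetical, and the example sentences are made up rather than taken from the real data.

```python
import collections
import math

def read_examples(lines):
    """Parse lines of the form 'word word ... <TAB> dialogue_act'."""
    examples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        text, label = line.split("\t")
        examples.append((text.split(), label.strip()))
    return examples

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (one possible baseline)."""

    def fit(self, examples):
        self.label_counts = collections.Counter()
        self.word_counts = collections.defaultdict(collections.Counter)
        self.vocab = set()
        for words, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # The extra +1 reserves probability mass for unseen words.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            score = math.log(count / total)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(predictions, golds):
    """Fraction of predictions that equal the gold label."""
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)

# Tiny made-up examples in the data.part1 format (not from the real data).
train = read_examples([
    "I need a train to Cambridge .\tfind_train\n",
    "Book me a train on sunday .\tfind_train\n",
    "Find a cheap hotel in the centre .\tfind_hotel\n",
    "I want an expensive hotel .\tfind_hotel\n",
])
dev = read_examples([
    "Is there a train on monday ?\tfind_train\n",
    "Any hotel in the east ?\tfind_hotel\n",
])
model = NaiveBayes().fit(train)
preds = [model.predict(words) for words, _ in dev]
dev_accuracy = accuracy(preds, [label for _, label in dev])
```

For a model trained in epochs, the same `accuracy` call could be made on the dev set after each epoch to produce the numbers to report.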

Part 2 (10 points)

Integrate your dialogue act classifier with your slot filler from HW4 (or you can use the official HW4 solution if you prefer). (3 points) When you call your slot filler, be sure to prepend the predicted dialogue act, as the slot filler expects. The input is a sequence of words and the output is a frame. For example, if the input is
```
I'm going from Cambridge to the Stansted Airport .
```
then the correct output would be
```
find_train ( train-departure = Cambridge ; train-destination = Stansted Airport )
```
The spaces are required (they'll make your life easier in Part 3). You can use the frames.Frame class, which knows how to print itself in the right format.
The format of the files train, dev, and test in data.part2 is
```
I'm going from Cambridge to the Stansted Airport . \t find_train ( train-departure = Cambridge ; train-destination = Stansted Airport )
```
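The provided frames.Frame class is the intended way to handle this format. Purely to make the format concrete, the following standalone sketch splits a data.part2 line and converts between the frame string and a (type, slots) pair. The helpers `parse_frame` and `format_frame` are hypothetical, they assume slot values never contain `;` or `=`, and the rendering of a frame with no arguments is a guess.

```python
def parse_frame(text):
    """Split 'type ( slot = value ; ... )' into (type, [(slot, value), ...])."""
    head, _, rest = text.strip().partition("(")
    body = rest.strip()
    if body.endswith(")"):
        body = body[:-1]
    slots = []
    for arg in body.split(";"):
        arg = arg.strip()
        if not arg:
            continue
        slot, _, value = arg.partition("=")
        slots.append((slot.strip(), value.strip()))
    return head.strip(), slots

def format_frame(frame_type, slots):
    """Render a frame with the required spaces around '(', '=', ';' and ')'."""
    args = " ; ".join(f"{slot} = {value}" for slot, value in slots)
    # Assumption: a frame without arguments is written as 'type ( )'.
    return f"{frame_type} ( {args} )" if args else f"{frame_type} ( )"

line = ("I'm going from Cambridge to the Stansted Airport . \t "
        "find_train ( train-departure = Cambridge ; "
        "train-destination = Stansted Airport )")
words_text, frame_text = [part.strip() for part in line.split("\t")]
frame_type, slots = parse_frame(frame_text)
round_trip = format_frame(frame_type, slots)
```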
Run your combined natural language "understanding" system on the dev and test data and report your score on both. To compute an accuracy score, run
```
python3 score_frames.py <your-dev-output> part2/dev.frames
python3 score_frames.py <your-test-output> part2/test.frames
```
The frame type accuracy score corresponds to the classification accuracy from Part 1. The argument F1 is analogous to the F1 score from HW4. The exact match score is simply the percentage of frames that are exactly correct. The exact match score should be at least 68% on dev (1 point) and 70% on test (3 points).
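The three scores can be sketched as follows. score_frames.py remains the authoritative implementation; this version is only an approximation under stated assumptions: frames are represented as (type, set of (slot, value) pairs), and argument F1 is micro-averaged over all (slot, value) pairs regardless of whether the frame type is correct. The function name `score` is hypothetical.

```python
def score(predicted, gold):
    """predicted, gold: parallel lists of (frame_type, set of (slot, value))."""
    type_correct = exact = matched = n_pred = n_gold = 0
    for (ptype, pargs), (gtype, gargs) in zip(predicted, gold):
        type_correct += ptype == gtype
        exact += ptype == gtype and pargs == gargs
        matched += len(pargs & gargs)   # arguments with correct slot and value
        n_pred += len(pargs)
        n_gold += len(gargs)
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    n = len(gold)
    return {
        "type_accuracy": type_correct / n,
        "argument_f1": f1,
        "exact_match": exact / n,
    }

gold = [
    ("find_train", {("train-departure", "Cambridge"),
                    ("train-destination", "Stansted Airport")}),
    ("find_hotel", {("hotel-area", "centre")}),
]
predicted = [
    ("find_train", {("train-departure", "Cambridge")}),  # misses one argument
    ("find_hotel", {("hotel-area", "centre")}),
]
scores = score(predicted, gold)
```

In this toy case both frame types are right, but only one of the two frames matches exactly, which shows why exact match is the strictest of the three scores.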
The module backend.py exposes a function backend.backend(q). The argument q is a Frame object representing a user query, and the return value is a list of Frame objects representing the computer's response. You can run backend.py from the command line to understand how it works. Here are some queries you can try:
```
python backend.py "find_restaurant ( restaurant-food = italian ; restaurant-pricerange = cheap )"
python backend.py "find_attraction ( attraction-pricerange = free ; attraction-type = museum ; attraction-area = east )"
python backend.py "find_hotel ( hotel-area = centre ; hotel-pricerange = expensive )"
python backend.py "find_train ( train-departure = Cambridge ; train-destination = Stansted Airport ; train-day = sunday ; train-arriveby = 11:00 )"
```
Integrate your system with this backend to make an interactive system. (3 points) Use input() to read a line of text from the user, tokenizer.tokenize to tokenize it, your system to convert it to a frame, and backend.backend to generate results.
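One way to structure the interactive loop is sketched below. The components are passed in as functions so that the loop can be exercised without the provided modules. `run_dialogue` and `my_nlu` are hypothetical names, ending the session on an empty line is an assumption, and the stand-in components return plain strings where the real system would use Frame objects.

```python
def run_dialogue(read_line, tokenize, understand, query_backend, write):
    """Read user turns until EOF or an empty line; show the frame and results."""
    while True:
        try:
            line = read_line()
        except EOFError:
            break
        if not line.strip():  # assumption: an empty line ends the session
            break
        words = tokenize(line)
        frame = understand(words)
        write(f"frame: {frame}")
        for result in query_backend(frame):
            write(f"result: {result}")

# With the provided modules, the call would presumably look like this
# (my_nlu is a hypothetical name for your Part 2 system):
#   run_dialogue(lambda: input("> "), tokenizer.tokenize, my_nlu,
#                backend.backend, print)

# Stand-in components so the sketch runs on its own.
scripted_input = iter(["Do you know of a cheap italian food restaurant ?", ""])
transcript = []
run_dialogue(
    read_line=lambda: next(scripted_input),
    tokenize=str.split,
    understand=lambda words: (
        "find_restaurant ( restaurant-food = italian ; "
        "restaurant-pricerange = cheap )"
    ),
    query_backend=lambda frame: ["<stub result>"],
    write=transcript.append,
)
```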

Part 3 (10 points)

The file part3/train.delex contains delexicalized frames and delexicalized strings for the kinds of responses we want the computer to be able to generate. Train your machine translation system from HW2 (or the official HW2 solution) on this data to translate delexicalized frames to delexicalized strings. (1 point)
Write code to delexicalize a frame, run your MT system, and relexicalize the response. (3 points) Run it on part3/dev.frames and part3/test.frames. Compute the BLEU score against part3/dev.words and part3/test.words, respectively. The BLEU score should be at least 17% on both dev (1 point) and test (3 points).
Integrate your Part 2 and your MT system to form a complete interactive dialogue system! When the backend returns multiple results, just choose one randomly. There's no evaluation for this step. Please try typing in the following queries and report your system outputs. (2 points)
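The delexicalize/relexicalize round trip might look like the sketch below. The placeholder convention is an assumption: here each slot value is replaced by the slot name in square brackets, which may not match what part3/train.delex actually uses, so check the data before adopting it. The helper names and the stand-in MT output are hypothetical.

```python
def delexicalize(slots):
    """Replace each slot value with a placeholder; also return the mapping back."""
    mapping = {}
    delex_slots = []
    for slot, value in slots:
        placeholder = f"[{slot}]"  # assumed placeholder format
        mapping[placeholder] = value
        delex_slots.append((slot, placeholder))
    return delex_slots, mapping

def relexicalize(words, mapping):
    """Substitute the original values for any placeholder tokens."""
    return [mapping.get(word, word) for word in words]

slots = [("train-departure", "Cambridge"),
         ("train-destination", "Stansted Airport")]
delex_slots, mapping = delexicalize(slots)

# Stand-in for the MT system's output on the delexicalized frame.
mt_output = "There is a train from [train-departure] to [train-destination] ."
response = " ".join(relexicalize(mt_output.split(), mapping))
```

Keeping the mapping alongside the delexicalized frame means the MT system never sees the actual values, which is what lets it generalize to names that were not in the training data.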
```
Could you recommend me an expensive hotel in the centre?
I would like to know of a museum that is free to attend.
Do you know of a cheap italian food restaurant?
I need a train from Cambridge to Stansted Airport arriving by 11:00 on Sunday.
```
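Choosing one backend result at random and turning it into a response can be sketched as follows. The function `respond` is a hypothetical name, `generate` stands for the delexicalize, translate, relexicalize step, and the fallback message for an empty result list is an assumption, since the behavior in that case is not specified above.

```python
import random

def respond(query_frame, query_backend, generate, rng=random):
    """Query the backend, pick one result at random, and generate a response."""
    results = query_backend(query_frame)
    if not results:
        # Assumption: what to say when nothing matches is up to you.
        return "Sorry, I could not find anything matching your request."
    chosen = rng.choice(results)
    return generate(chosen)

# Stand-in backend and generator so the sketch runs on its own.
reply = respond(
    "find_hotel ( hotel-area = centre ; hotel-pricerange = expensive )",
    query_backend=lambda frame: ["result A", "result B"],
    generate=lambda result: f"How about {result} ?",
    rng=random.Random(0),
)
```

Passing a seeded `random.Random` instance, as in the example, makes the choice reproducible while debugging.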

Submission

Please read these submission instructions carefully.

Add and commit your submission files to the repository you created in the beginning. The repository should contain:
- All of the code that you wrote.
- Your final model and outputs from Parts 1 and 3.
- A README.md file with:
  - Instructions on how to build/run your code.
  - Your responses to all of the instructions/questions in the assignment.
To submit:
- Push your work to GitHub and create a release in GitHub by clicking on "Releases" on the right-hand side, then "Create a new release" or "Draft a new release". Fill in "Tag version" and "Release title" with the part number(s) you’re submitting and click "Publish Release".
- If you submit the same part more than once, the grader will grade the latest release for that part.
- For computing the late penalty, the submission time will be considered the commit time, not the release time.

CSE 40657/60657 Homework 5
