CSE 40657/60657
Homework 5

Mon 2021/12/06 5pm

In this assignment, you'll implement a dialogue act classifier and combine it with the labeling model from HW4 (for slot filling) and the machine translation system from HW2 (for generation) to make a complete dialogue system.

Whenever the instructions below say to "report" something, it should be reported in the file that you submit.


Visit this GitHub Classroom link to create a Git repository for you, and clone it to your computer. Initially, it contains the following files:

file                     description
data.part{1,2,3}/train   training data
data.part{1,2,3}/dev     development/validation data
data.part{1,2,3}/test    test data
…                        possibly useful neural network layers
…                        possibly useful code for frames
…                        compute various accuracies for frames
…                        compute various accuracies for frame types
…                        compute BLEU scores
…                        database backend
…                        very simple tokenizer
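The datasets come in three flavors, one for each part of the assignment; the format of each is described in the corresponding part below.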

Part 1 (10 points)

  1. Implement a dialogue act classifier (5 points). The input is a sequence of words and the output is a dialogue act. You can choose any kind of classifier you want. Please report what kind of model you used (1 point).
  2. The format of the files train, dev, and test in data.part1 is:
    I'm going from Cambridge to the Stansted Airport . \t find_train
    where \t is a tab. In this example, "I'm going from Cambridge to the Stansted Airport ." is the input word sequence and find_train is the correct output dialogue act. Train the classifier on the training data. Report your accuracy on the dev set (at each epoch, if there are epochs), which should reach at least 80% (1 point). You can use the provided accuracy script to compute accuracy, or you can just compute it yourself.
  3. When you've finalized your model, run it on the test set and report the accuracy, which should be at least 80% (3 points).
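As one possible starting point, here is a minimal sketch of a Part 1 classifier that assumes only the tab-separated format shown above. It uses multinomial Naive Bayes over a bag of words with add-one smoothing, which is just one of many acceptable choices (a neural classifier built from the provided layers would also work). The names read_part1, NaiveBayesActClassifier, and accuracy are hypothetical, not part of the starter code.

```python
import math
from collections import Counter, defaultdict


def read_part1(lines):
    """Parse 'w1 w2 ... wn<TAB>act' lines into (list_of_words, act) pairs."""
    data = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        text, act = line.split("\t")
        data.append((text.split(), act.strip()))
    return data


class NaiveBayesActClassifier:
    """Multinomial Naive Bayes over a bag of words, with add-one smoothing."""

    def fit(self, data):
        self.act_counts = Counter(act for _, act in data)
        self.word_counts = defaultdict(Counter)  # act -> word -> count
        self.vocab = set()
        for words, act in data:
            self.word_counts[act].update(words)
            self.vocab.update(words)
        self.total_words = {act: sum(c.values()) for act, c in self.word_counts.items()}
        return self

    def predict(self, words):
        n = sum(self.act_counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        best_act, best_score = None, -math.inf
        for act, count in self.act_counts.items():
            score = math.log(count / n)  # log prior P(act)
            for w in words:
                # smoothed log likelihood log P(w | act)
                score += math.log((self.word_counts[act][w] + 1) / (self.total_words[act] + v))
            if score > best_score:
                best_act, best_score = act, score
        return best_act


def accuracy(classifier, data):
    """Fraction of examples whose predicted act matches the gold act."""
    return sum(classifier.predict(words) == act for words, act in data) / len(data)


if __name__ == "__main__":
    train = read_part1([
        "i need a train to ely\tfind_train",
        "book a cheap hotel\tfind_hotel",
        "any train on sunday\tfind_train",
        "a hotel in the centre\tfind_hotel",
    ])
    clf = NaiveBayesActClassifier().fit(train)
    print(clf.predict("a train to cambridge".split()))  # find_train
```

On the real data you would pass an open file instead, e.g. read_part1(open("data.part1/train")). A bag-of-words model ignores word order; if it falls short of the 80% target, a model that looks at word order may do better.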

Part 2 (10 points)

  1. Integrate your dialogue act classifier with your slot filler from HW4, or with the official HW4 solution if you prefer (3 points). When you call the slot filler, be sure to prepend the predicted dialogue act to its input, as the slot filler expects. The input to the combined system is a sequence of words and the output is a frame. For example, if the input is
    I'm going from Cambridge to the Stansted Airport .
    then the correct output would be
    find_train ( train-departure = Cambridge ; train-destination = Stansted Airport )
    The spaces are required (they'll make your life easier in Part 3). You can use the frames.Frame class, which knows how to print itself in the right format.
  2. The format of the files train, dev, and test in data.part2 is
    I'm going from Cambridge to the Stansted Airport . \t find_train ( train-departure = Cambridge ; train-destination = Stansted Airport )
    Run your combined natural language "understanding" system on the dev and test data and report your score on both. To compute accuracy scores, run the provided frame-accuracy script on your output and the reference frames:
    python3 <frame-accuracy-script> <your-dev-output> part2/dev.frames
    python3 <frame-accuracy-script> <your-test-output> part2/test.frames
    The frame type accuracy score corresponds to the classification accuracy from Part 1. The argument F1 is analogous to the F1 score from HW4. The exact match score is simply the percentage of frames that are exactly correct. This score should be at least 68% on dev (1 point) and 70% on test (3 points).
  3. The backend module exposes a function backend.backend(q). The argument q is a Frame object representing a user query, and the return value is a list of Frame objects representing the computer's response. You can run the backend from the command line to understand how it works. Here are some queries you can try:
    python <backend-script> "find_restaurant ( restaurant-food = italian ; restaurant-pricerange = cheap )"
    python <backend-script> "find_attraction ( attraction-pricerange = free ; attraction-type = museum ; attraction-area = east )"
    python <backend-script> "find_hotel ( hotel-area = centre ; hotel-pricerange = expensive )"
    python <backend-script> "find_train ( train-departure = Cambridge ; train-destination = Stansted Airport ; train-day = sunday ; train-arriveby = 11:00 )"
    Integrate your system with this backend to make an interactive system (3 points). Use input() to read a line of text from the user, tokenizer.tokenize to tokenize it, your system to convert it to a frame, and backend.backend to generate results.
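To make the frame format and the three Part 2 scores concrete, here is a minimal sketch that parses frame strings and computes frame-type accuracy, argument F1, and exact match. It assumes that argument F1 is micro-averaged over (slot, value) pairs and that argument order does not matter; the provided frame-accuracy script is authoritative. parse_frame and score_frames are hypothetical helpers, not part of frames.Frame.

```python
def parse_frame(s):
    """Parse 'act ( slot = value ; slot = value )' into (act, [(slot, value), ...]).

    Relies on the space-separated format: '(', ')', '=', and ';' must each be
    separate tokens, which is why the spaces are required.
    """
    tokens = s.split()
    act = tokens[0]
    if len(tokens) == 1:  # assumption: a bare act with no argument list
        return act, []
    if tokens[1] != "(" or tokens[-1] != ")":
        raise ValueError(f"malformed frame: {s!r}")
    args, current = [], []
    for tok in tokens[2:-1] + [";"]:  # sentinel ';' flushes the last argument
        if tok == ";":
            if current:
                eq = current.index("=")  # split 'slot = multi word value'
                args.append((" ".join(current[:eq]), " ".join(current[eq + 1:])))
                current = []
        else:
            current.append(tok)
    return act, args


def score_frames(predicted, gold):
    """Return frame-type accuracy, micro-averaged argument F1, and exact match."""
    assert len(predicted) == len(gold)
    type_correct = exact = 0
    n_pred = n_gold = n_match = 0
    for p, g in zip(predicted, gold):
        p_act, p_args = parse_frame(p)
        g_act, g_args = parse_frame(g)
        p_set, g_set = set(p_args), set(g_args)
        type_correct += p_act == g_act
        exact += p_act == g_act and p_set == g_set
        n_pred += len(p_set)
        n_gold += len(g_set)
        n_match += len(p_set & g_set)
    precision = n_match / n_pred if n_pred else 0.0
    recall = n_match / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n = len(gold)
    return {"type_acc": type_correct / n, "arg_f1": f1, "exact": exact / n}
```

For example, if one of two predicted frames gets the act right but one slot value wrong, frame-type accuracy stays at 100% while exact match drops to 50%.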
Part 3 (10 points)

  1. The file part3/train.delex contains delexicalized frames and delexicalized strings for the kinds of responses we want the computer to be able to generate. Train your machine translation system from HW2 (or the official HW2 solution) on this data to translate delexicalized frames into delexicalized strings (1 point).
  2. Write code to delexicalize a frame, run your MT system on it, and relexicalize the response (3 points). Run it on part3/dev.frames and part3/test.frames, and compute the BLEU score against part3/dev.words and part3/test.words, respectively. The BLEU score should be at least 17% on both dev (1 point) and test (3 points).
  3. Integrate your Part 2 system and your MT system to form a complete interactive dialogue system! When the backend returns multiple results, just choose one at random. There's no evaluation for this step. Please type in the following queries and report your system's outputs (2 points).
    Could you recommend me an expensive hotel in the centre?
    I would like to know of a museum that is free to attend.
    Do you know of a cheap italian food restaurant?
    I need a train from Cambridge to Stansted Airport arriving by 11:00 on Sunday.
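The delexicalize, translate, relexicalize steps and the random choice among backend results can be sketched as follows. This is a minimal sketch under loud assumptions: frames are plain (act, [(slot, value), ...]) tuples rather than frames.Frame objects, the placeholder format [slot] is hypothetical (check part3/train.delex for the real convention), and each component is passed in as a function so the sketch runs on its own. In your system those would be tokenizer.tokenize, your Part 2 system, backend.backend, and your HW2 translator; the act name inform_train in the demo is also hypothetical.

```python
import random


def placeholder(slot):
    # HYPOTHETICAL placeholder convention; check part3/train.delex for the real one.
    return f"[{slot}]"


def delexicalize(act, args):
    """Replace each slot value with a placeholder.

    Returns (delexicalized frame string, placeholder -> original value map).
    A slot appearing twice would need numbered placeholders; not handled here.
    """
    mapping = {}
    parts = []
    for slot, value in args:
        mapping[placeholder(slot)] = value
        parts.append(f"{slot} = {placeholder(slot)}")
    return f"{act} ( {' ; '.join(parts)} )", mapping


def relexicalize(delex_words, mapping):
    """Substitute original values back in for placeholder tokens in the MT output."""
    return " ".join(mapping.get(tok, tok) for tok in delex_words.split())


def respond(line, tokenize, understand, query_backend, translate, rng=random):
    """One dialogue turn: user text -> query frame -> backend result -> response text."""
    query = understand(tokenize(line))   # Part 2 system: words -> frame
    results = query_backend(query)       # list of (act, args) frames
    act, args = rng.choice(results)      # pick one at random; IndexError if empty
    delex_frame, mapping = delexicalize(act, args)
    return relexicalize(translate(delex_frame), mapping)  # HW2 MT in the middle


if __name__ == "__main__":
    result = ("inform_train", [("train-departure", "Cambridge"),
                               ("train-destination", "Stansted Airport")])
    print(respond(
        "I need a train to Stansted Airport",
        tokenize=str.split,
        understand=lambda words: ("find_train", [("train-destination", "Stansted Airport")]),
        query_backend=lambda q: [result],
        translate=lambda f: "there is a train from [train-departure] to [train-destination] .",
    ))
    # there is a train from Cambridge to Stansted Airport .
```

An interactive loop would then call respond(input(), ...) repeatedly. Note that relexicalization only works if your MT system emits each placeholder as a single token, exactly as it appears in the delexicalized training data.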


Please read these submission instructions carefully.

  1. Add and commit your submission files to the repository you created in the beginning. The repository should contain:
    • All of the code that you wrote.
    • Your final model and outputs from Parts 1 and 3.
    • A file with:
      • Instructions on how to build/run your code.
      • Your responses to all of the instructions/questions in the assignment.
  2. To submit:
    • Push your work to GitHub and create a release in GitHub by clicking on "Releases" on the right-hand side, then "Create a new release" or "Draft a new release". Fill in "Tag version" and "Release title" with the part number(s) you're submitting and click "Publish Release".
    • If you submit the same part more than once, the grader will grade the latest release for that part.
    • For computing the late penalty, the submission time will be considered the commit time, not the release time.