CSE 40/60657: Natural Language Processing

Term: Spring 2015
Lectures: MW 2–3:15pm, 125 DeBartolo Hall
Instructor: Prof. David Chiang
Office hours: TTh 1:30–3:30pm, 326D Cushing Hall
Teaching assistant: Kenton Murray
Office hours: Tue 5–7pm, Crossings (Law School cafe)

Computers process massive amounts of information every day in the form of human language. Although they do not understand it, they can learn how to do things like answer questions about it, or translate it into other languages. This course is a systematic introduction to the ideas that form the foundation of current language technologies and research into future language technologies.

Upon successful completion of this course, you will be able to implement solutions to various NLP problems:

More generally, you will be able to:

Preparation

Before taking this course, you should ideally have taken the following courses or their equivalents. These are not hard prerequisites. If you are taking any of these courses concurrently, or if you can do Homework 0 easily, then you should have no problem.

There is no required text for this course; readings and notes will be given to you. However, if you like having a book for reference, an optional text is: Daniel Jurafsky and James H. Martin, Speech and Language Processing, 2nd edition, Prentice Hall, 2008. Note: a 3rd edition is currently being prepared.

Schedule

Week Day Before class Class After class
1 1/14 Introduction. Formal frameworks for natural language: bag of words, finite-state automata, context-free grammars. Start thinking about your project; start Homework 1
2 1/19 Read Chapter 1 Probabilities and parameter estimation. Naïve Bayes for text classification.
1/21 Read Chapter 2 Discriminative training with logistic regression and the perceptron algorithm, for better text classification.
3 1/26 Read Chapter 3 and Chapter 4 Unsupervised training with expectation-maximization, for document clustering and topic modeling.
1/28 Submit Homework 1 (night before)
4 2/2 Read Chapter 5 Quiz 1. Weighted finite-state automata and smoothing for language modeling.
2/4 Weighted finite-state automata, continued.
5 2/9 Weighted finite-state transducers. Part of speech tagging, entity detection, and other sequence labeling problems. Start Homework 2
2/11 Submit project proposal (night before); read Chapter 6
6 2/16 Viterbi algorithm. Intersection and composition of finite-state machines.
2/18
7 2/23 Present project proposals
2/25 Read Chapter 7 Submit Homework 2
8 3/2 Unsupervised training with the forward-backward algorithm, for transliteration, grapheme-to-phoneme conversion, morphological processing, and other sequence alignment problems. Start Homework 3
3/4 Class cancelled Submit project data/baseline description
Spring break
9 3/16 Read Chapter 8 Discriminative training with conditional random fields, for better sequence labeling.
3/18 Read Chapter 9 Linear regression for predicting numerical values from texts.
10 3/23 Read Chapter 10 Extensions for speech and translation.
3/25 Submit Homework 3 on Thursday at 11:55pm
11 3/30 Read Chapter 11 Quiz 2. Context-free grammars, for syntactic parsing and translation.
4/1 Read Chapter 12 Context-free grammars, continued. Weighted context-free grammars.
12 4/6 Easter Monday
4/8 Read Chapter 13 Parsing with the CKY algorithm. Submit project progress report on Friday at 11:55pm
13 4/13 Practical statistical parsing.
4/15
14 4/20 Read Chapter 14 (not covered on exam) Unsupervised training with the inside-outside algorithm.
4/22 Read Chapter 15 (not covered on exam) Synchronous CFGs for translation. Submit Homework 4 (night before)
15 4/27 Present project reports
4/29
Finals 5/5 4:15pm Exam. Submit project report on 5/5 at 11:55pm

Requirements

Homework 0 is meant to help you decide whether to take the class; it does not count as part of your grade. After that, there are four programming assignments (Homework 1–4).

There will be two quizzes, after units one and two, and an exam at the end of the semester.

Throughout the course, you will work on a research project. There will be three milestones during the semester and a report at the end of the semester.

Summary of requirements and point values:

assignment points
Homeworks 4 × 30
Project 60
Quizzes 2 × 30
Exam 60
total 300

Conversion to letter grades:

letter grade points
A 280–300
A− 270–279
B+ 260–269
B 250–259
B− 240–249
C+ 230–239
C 220–229
C− 210–219
D 180–209
F 0–179
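
The conversion table above can be read as a lookup on the lowest score of each band. The following is a minimal sketch in Python, assuming point totals are whole numbers between 0 and 300. The names GRADE_FLOORS and letter_grade are illustrative only and are not part of any course tooling; an ASCII hyphen stands in for the minus sign.

```python
# Lowest point total for each letter grade, taken from the table above.
# Names here are hypothetical; "-" is used in place of the minus sign.
GRADE_FLOORS = [
    (280, "A"),
    (270, "A-"),
    (260, "B+"),
    (250, "B"),
    (240, "B-"),
    (230, "C+"),
    (220, "C"),
    (210, "C-"),
    (180, "D"),
    (0, "F"),
]


def letter_grade(points: int) -> str:
    """Return the letter grade for a whole-number point total (0 to 300)."""
    if not 0 <= points <= 300:
        raise ValueError("points must be between 0 and 300")
    # Bands are listed from highest to lowest, so the first floor
    # that the total meets or exceeds determines the grade.
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter
    return "F"  # unreachable given the range check, kept for clarity
```

For example, letter_grade(265) falls in the 260–269 band and returns "B+".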

Policies

Honor Code

All work that you submit must be your own. You may discuss assignments with other students or refer to books or websites as long as you cite your sources. You may not write solutions or code with other students or anyone else, nor may you copy solutions or code from any source.

Late Submissions

In the case of a serious illness or other excused absence, as defined by university policies, coursework submissions will be accepted late by the same number of days as the excused absence.

Otherwise, submit what you can on time, indicating clearly which parts you are not submitting. You can submit the rest later, up to the final project deadline, but the score of the late portion will decay exponentially with a half-life of one week, rounded to the nearest point.
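
As a rough illustration of the decay rule above: if a late portion would have earned s points and arrives d days late, one reading is that it receives s × 2^(−d/7) points. The sketch below assumes lateness is measured in days (fractions allowed) and that exact halves round up; the policy does not state either detail, and late_score is a hypothetical helper, not a course tool.

```python
def late_score(earned_points: float, days_late: float,
               half_life_days: float = 7.0) -> int:
    """Points credited for a late portion under exponential decay.

    Assumptions (not stated in the policy): lateness is counted in
    days, and a result ending in exactly .5 rounds up.
    """
    days_late = max(0.0, days_late)  # on-time work is not decayed
    decayed = earned_points * 0.5 ** (days_late / half_life_days)
    # Round to the nearest point; valid for non-negative scores.
    return int(decayed + 0.5)
```

Under these assumptions, a portion worth 30 points is credited 15 points if it arrives one week late and 8 points (7.5 rounded up) if it arrives two weeks late.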

Students with Disabilities

Any student who has a documented disability and is registered with Disability Services should speak with the professor as soon as possible regarding accommodations. Students who are not registered should contact the Office of Disability Services.

Lecture Capture Notification

Notre Dame is testing a lecture capture system. This system allows us to record and distribute lectures and other audio and video recordings to you in a secure environment. Because we will be recording in the classroom, your questions or comments may be recorded. Video recordings will typically only capture the front of the classroom. If you have any concerns about your voice or image being recorded, please speak to me to determine an alternative means of participating. Without your express permission, no material will be shared with anyone other than your class and the faculty and staff who require access for support or specific academic purposes.

You may watch recordings on your computer, tablet or smartphone. These recordings are jointly copyrighted by the University of Notre Dame and your instructor. Posting them to another website, including YouTube, Facebook, Vimeo, or any other site without express, written permission may result in disciplinary action and possible civil prosecution.