CSE 40/60657: Natural Language Processing
- Term: Spring 2015
- Lectures: MW 2–3:15 pm, 125 DeBartolo Hall
- Instructor: Prof. David Chiang
- Office hours: TTh 1:30–3:30 pm, 326D Cushing Hall
- Teaching assistant: Kenton Murray
- Office hours: Tue 5–7 pm, Crossings (Law School cafe)
Computers process massive amounts of information every day in the form of human language. Although they do not understand it, they can learn how to do things like answer questions about it, or translate it into other languages. This course is a systematic introduction to the ideas that form the foundation of current language technologies and research into future language technologies.
Upon successful completion of this course, you will be able to:
- Implement solutions to various NLP problems:
  - Text classification (e.g., predict whether a text is spam or not spam)
  - Language modeling (decide whether a sentence is fluent English)
  - Sequence labeling (e.g., identify all the place names in a text)
  - Sequence transduction (e.g., convert an English word to its correct pronunciation)
  - Syntactic parsing (determine the grammatical structure of a sentence)
- Formulate NLP tasks as instances of various models: bag of words, finite-state automata, and context-free grammars.
- Apply machine learning methods to the above models: supervised and unsupervised, generative and discriminative.
- Design efficient algorithms to do the above.
Preparation
Before taking this course, you should ideally have taken the following courses or their equivalents. These are not hard prerequisites. If you are taking any of these courses concurrently, or if you can do Homework 0 easily, then you should have no problem.
- Programming, preferably in a high-level language like Python (CSE 30332)
- Automata theory (CSE 30151)
- Basic probability theory (ACMS 30440)
- Basic multivariable calculus (MATH 20550)
There is no required text for this course; readings and notes will be given to you. However, if you like having a book for reference, an optional text is: Daniel Jurafsky and James H. Martin, Speech and Language Processing, 2nd edition, Prentice Hall, 2008. Note: a 3rd edition is currently being prepared.
Schedule
Week | Day | Before class | Class | After class |
---|---|---|---|---|
1 | 1/14 | | Introduction. Formal frameworks for natural language: bag of words, finite-state automata, context-free grammars. | Start thinking about your project. Start Homework 1. |
2 | 1/19 | Read Chapter 1 | Probabilities and parameter estimation. Naïve Bayes for text classification. | |
2 | 1/21 | Read Chapter 2 | Discriminative training with logistic regression and the perceptron algorithm, for better text classification. | |
3 | 1/26 | Read Chapter 3 and Chapter 4 | Unsupervised training with expectation-maximization, for document clustering and topic modeling. | |
3 | 1/28 | Submit Homework 1 (night before) | | |
4 | 2/2 | Read Chapter 5 | Quiz 1. Weighted finite-state automata and smoothing for language modeling. | |
4 | 2/4 | | Weighted finite-state automata, continued. | |
5 | 2/9 | | Weighted finite-state transducers. Part of speech tagging, entity detection, and other sequence labeling problems. | Start Homework 2 |
5 | 2/11 | Submit project proposal (night before). Read Chapter 6. | | |
6 | 2/16 | | Viterbi algorithm. Intersection and composition of finite-state machines. | |
6 | 2/18 | | | |
7 | 2/23 | | Present project proposals | |
7 | 2/25 | Read Chapter 7 | | Submit Homework 2 |
8 | 3/2 | | Unsupervised training with the forward-backward algorithm, for transliteration, grapheme-to-phoneme conversion, morphological processing, and other sequence alignment problems. | Start Homework 3 |
8 | 3/4 | | Class cancelled | Submit project data/baseline description |
Spring break | | | | |
9 | 3/16 | Read Chapter 8 | Discriminative training with conditional random fields, for better sequence labeling. | |
9 | 3/18 | Read Chapter 9 | Linear regression for predicting numerical values from texts. | |
10 | 3/23 | Read Chapter 10 | Extensions for speech and translation. | |
10 | 3/25 | | | Submit Homework 3 on Thursday at 11:55pm |
11 | 3/30 | Read Chapter 11 | Quiz 2. Context-free grammars, for syntactic parsing and translation. | |
11 | 4/1 | Read Chapter 12 | Context-free grammars, continued. Weighted context-free grammars. | |
12 | 4/6 | | Easter Monday | |
12 | 4/8 | Read Chapter 13 | Parsing with the CKY algorithm. | Submit project progress report on Friday at 11:55pm |
13 | 4/13 | | Practical statistical parsing. | |
13 | 4/15 | | | |
14 | 4/20 | Read Chapter 14 (not covered on exam) | Unsupervised training with the inside-outside algorithm. | |
14 | 4/22 | Read Chapter 15 (not covered on exam) | Synchronous CFGs for translation. | Submit Homework 4 (night before) |
15 | 4/27 | | Present project reports | |
15 | 4/29 | | | |
Finals | 5/5 4:15pm | | Exam | Submit project report on 5/5 at 11:55pm |
Requirements
Homework 0 is meant to help you decide whether to take the class; it does not count as part of your grade. After that, there are four programming assignments (Homework 1–4).
There will be two quizzes, after units one and two, and an exam at the end of the semester.
Throughout the course, you will work on a research project. There will be three milestones during the semester and a report at the end of the semester.
Summary of requirements and point values:
assignment | points |
---|---|
Homeworks | 4 × 30 |
Project | 60 |
Quizzes | 2 × 30 |
Exam | 60 |
total | 300 |
Conversion to letter grades:
letter grade | points |
---|---|
A | 280–300 |
A− | 270–279 |
B+ | 260–269 |
B | 250–259 |
B− | 240–249 |
C+ | 230–239 |
C | 220–229 |
C− | 210–219 |
D | 180–209 |
F | 0–179 |
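As a rough illustration, the two tables above can be read as a sum followed by a lookup. The sketch below is not an official grading tool: the names `POINT_VALUES`, `CUTOFFS`, and `letter_grade` are hypothetical, and it assumes point totals are whole numbers, since the tables do not say how fractional totals would be handled.

```python
# Point values copied from the requirements table.
POINT_VALUES = {
    "Homeworks": 4 * 30,
    "Project": 60,
    "Quizzes": 2 * 30,
    "Exam": 60,
}
TOTAL_POINTS = sum(POINT_VALUES.values())

# (minimum points, letter) pairs from the conversion table, highest first.
CUTOFFS = [
    (280, "A"),
    (270, "A−"),
    (260, "B+"),
    (250, "B"),
    (240, "B−"),
    (230, "C+"),
    (220, "C"),
    (210, "C−"),
    (180, "D"),
]


def letter_grade(points: int) -> str:
    """Map a whole-number point total to a letter grade (assumed helper)."""
    if not 0 <= points <= TOTAL_POINTS:
        raise ValueError(f"points must be between 0 and {TOTAL_POINTS}")
    for minimum, letter in CUTOFFS:
        if points >= minimum:
            return letter
    # Anything below the lowest cutoff falls in the 0–179 row.
    return "F"
```

For example, under these assumptions `letter_grade(255)` falls in the 250–259 row and returns `"B"`.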
Policies
Honor Code
All work that you submit must be your own. You may discuss assignments with other students or refer to books or websites as long as you cite your sources. You may not write solutions or code with other students or anyone else, nor may you copy solutions or code from any source.
Late Submissions
In the case of a serious illness or other excused absence, as defined by university policies, coursework submissions will be accepted late by the same number of days as the excused absence.
Otherwise, submit what you can on time, indicating clearly which parts you are not submitting. You may submit the rest later, up to the final project deadline, but the score of the late portion will decay exponentially with a half-life of one week, rounded to the nearest point.
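To make the decay rule concrete, here is a minimal sketch of one way to compute it. It rests on assumptions the policy does not spell out: lateness is measured in (possibly fractional) days, the decay is applied continuously rather than in whole-week steps, and exact halves round up. The function name `late_score` and its parameters are hypothetical.

```python
import math


def late_score(points: float, days_late: float, half_life_days: float = 7.0) -> int:
    """Score kept by a late portion that would have earned `points` on time.

    Assumes continuous exponential decay: the score halves every
    `half_life_days` days, then is rounded to the nearest point
    (exact halves round up, which is an assumption).
    """
    if days_late <= 0:
        decayed = points  # on time: no penalty
    else:
        decayed = points * 0.5 ** (days_late / half_life_days)
    return math.floor(decayed + 0.5)
```

Under these assumptions, a portion worth 30 points keeps 15 points if submitted one week late (`late_score(30, 7)`) and 18 of 20 points if submitted one day late (`late_score(20, 1)`).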
Students with Disabilities
Any student who has a documented disability and is registered with Disability Services should speak with the professor as soon as possible regarding accommodations. Students who are not registered should contact the Office of Disability Services.
Lecture Capture Notification
Notre Dame is testing a lecture capture system. This system allows us to record and distribute lectures and other audio and video recordings to you in a secure environment. Because we will be recording in the classroom, your questions or comments may be recorded. Video recordings will typically only capture the front of the classroom. If you have any concerns about your voice or image being recorded, please speak to me to determine an alternative means of participating. Without your express permission, no material will be shared with anyone other than members of your class and the faculty and staff who require access for support or specific academic purposes.
You may watch recordings on your computer, tablet or smartphone. These recordings are jointly copyrighted by the University of Notre Dame and your instructor. Posting them to another website, including YouTube, Facebook, Vimeo, or any other site without express, written permission may result in disciplinary action and possible civil prosecution.