Computers process massive amounts of information every day in the form of human language. Although they do not understand it, they can learn how to do things like answer questions about it, or translate it into other languages. This course is a systematic introduction to the ideas that form the foundation of current language technologies and research into future language technologies.
The official prerequisites are CSE 20312 or CDT 30020. Students should be experienced with writing substantial programs in Python. The course also makes use of finite automata, context-free grammars, basic linear algebra, multivariable differential calculus, and probability theory. Ideally, students should have taken CSE 30151, Math 10560, and ACMS 30440, but please contact the instructor if you have questions about the necessary background.
Announcements will be made on Campuswire, and the best way to contact the teaching staff is also on Campuswire.
The readings for each week should be done before that week's lectures; homework assignments and project milestones are due by 5pm on the Friday of that week.
| Unit | Week | Assignment | Mon | Wed | Fri |
|---|---|---|---|---|---|
| | 1 | Chapter 1, Chapter 2 | | 08/28 Language | 08/30 Probability |
| Getting Language | 2 | Chapter 3 (v2), Chapter 4 (v3); Project idea | 09/02 Language models | 09/04 Smoothing | 09/06 Finite automata |
| | 3 | Chapter 5 (v2), Chapter 6 (v3) | 09/09 Recurrent neural networks | 09/11 Text input Finite transducers and composition | 09/13 Viterbi algorithm |
| | 4 | HW1: Text input | 09/16 Text normalization Unsupervised training | 09/18 continued | 09/20 continued |
| | 5 | Chapter 7, Chapter 8 | 09/23 Phonetics and phonology | 09/25 Speech recognition | 09/27 Character and handwriting recognition |
| Analyzing Language | 6 | Chapter 9, Chapter 10; HW2: Text correction | 09/30 Syntax | 10/02 Morphology | 10/04 Part-of-speech tagging |
| | 7 | Chapter 11, Chapter 12 | 10/07 Context-free grammars | 10/09 CKY algorithm | 10/11 continued |
| | 8 | Chapter 13; Project baseline | 10/14 Adding annotations | 10/16 Unsupervised annotation | 10/18 Beyond CFGs |
| Fall break | | | | | |
| Understanding Language | 9 | Chapter 14 | 10/28 Text classification Bag of words | 10/30 Topic modeling Word embeddings | 11/01 Projects |
| | 10 | Chapter 15; HW3: Parsing | 11/04 Entity recognition | 11/06 Graph semantics and graph grammars | 11/08 Projects |
| | 11 | | 11/11 Logical semantics | 11/13 continued | 11/15 Projects |
| Generating Language | 12 | Chapter 16; HW4: Entity recognition | 11/18 Word-based machine translation | 11/20 continued | 11/22 Projects |
| | 13 | | 11/25 continued | Thanksgiving | Thanksgiving |
| | 14 | | 12/02 Neural machine translation Synchronous CFGs | 12/04 continued | 12/06 Projects |
| | 15 | Chapter 17; HW5: Machine translation | 12/09 Syntax-based machine translation | 12/11 Conclusion | |
| | Final | Project report | | | |
Your work in this course consists of five homework assignments, a research project, and participation (whether you come to class, whether you appear to be paying attention, and whether you appear to have done the readings).
All written work should be submitted through Sakai.
requirement | points |
---|---|
homeworks | 5 × 30 |
project | 4 × 30 |
participation | 30 |
total | 300 |

letter grade | points |
---|---|
A | 280–300 |
A− | 270–279 |
B+ | 260–269 |
B | 250–259 |
B− | 240–249 |
C+ | 230–239 |
C | 220–229 |
C− | 210–219 |
D | 180–209 |
F | 0–179 |
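As a rough illustration of how the two tables above fit together, the following Python sketch adds up the available points and looks up a letter grade. The names `GRADE_CUTOFFS` and `letter_grade` are hypothetical, and the sketch assumes point totals are whole numbers, as the ranges in the table suggest; it is not an official grade calculator.

```python
# Point values copied from the requirements table.
HOMEWORK_POINTS = 5 * 30
PROJECT_POINTS = 4 * 30
PARTICIPATION_POINTS = 30
TOTAL_POINTS = HOMEWORK_POINTS + PROJECT_POINTS + PARTICIPATION_POINTS  # 300

# (minimum points, letter grade) pairs from the letter-grade table,
# ordered from the highest cutoff to the lowest.
GRADE_CUTOFFS = [
    (280, "A"), (270, "A−"), (260, "B+"), (250, "B"), (240, "B−"),
    (230, "C+"), (220, "C"), (210, "C−"), (180, "D"), (0, "F"),
]


def letter_grade(points: int) -> str:
    """Return the letter grade for a whole-number point total (assumed 0..300)."""
    if not 0 <= points <= TOTAL_POINTS:
        raise ValueError(f"points must be between 0 and {TOTAL_POINTS}")
    # The first cutoff that the total meets or exceeds determines the grade.
    for minimum, letter in GRADE_CUTOFFS:
        if points >= minimum:
            return letter
    raise AssertionError("unreachable: the last cutoff is 0")
```

For example, a total of 265 points falls in the 260–269 range, so `letter_grade(265)` returns `"B+"`.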
Students in this course are expected to abide by the Academic Code of Honor Pledge: “As a member of the Notre Dame community, I will not participate in or tolerate academic dishonesty.”
The following table summarizes how you may work with other students and use print/online sources:
| | Resources | Solutions |
---|---|---|
Consulting | allowed | not allowed |
Copying | cite | not allowed |
If an instructor sees behavior that is, in the instructor's judgment, academically dishonest, the instructor is required to file either an Honor Code Violation Report or a formal report to the College of Engineering Honesty Committee.
In the case of a serious illness or other excused absence, as defined by university policies, coursework submissions will be accepted late by the same number of days as the excused absence. Otherwise, you may submit part of an assignment on time for full credit and part of the assignment late with a penalty of 30% per week (that is, your score for that part will be $\lfloor 0.7^t s\rfloor$, where $s$ is your raw score and $t$ is the possibly fractional number of weeks late). No part of the assignment may be submitted more than once. No work may be submitted after the final project due date.
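To make the late-penalty formula concrete, here is a minimal Python sketch of $\lfloor 0.7^t s\rfloor$. The function names `late_score` and `assignment_score` are hypothetical, and the sketch uses ordinary floating-point arithmetic, so a result that lands extremely close to a whole number could in principle round differently from the exact formula.

```python
import math


def late_score(raw_score: float, weeks_late: float) -> int:
    """Score for a late part: floor(0.7 ** t * s), with t in (possibly fractional) weeks."""
    if weeks_late < 0:
        raise ValueError("weeks_late must be non-negative")
    return math.floor(0.7 ** weeks_late * raw_score)


def assignment_score(on_time_raw: float, late_raw: float, weeks_late: float) -> float:
    """Full credit for the on-time part plus the penalized score for the late part."""
    return on_time_raw + late_score(late_raw, weeks_late)
```

For instance, a part with a raw score of 25 submitted one week late is worth `late_score(25, 1)`, which is 17 because $0.7 \times 25 = 17.5$ is rounded down. The penalty compounds, so the same part two weeks late would be scaled by $0.7^2 = 0.49$ rather than by 0.4.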
Any student who has a documented disability and is registered with Disability Services should speak with the professor as soon as possible regarding accommodations. Students who are not registered should contact the Office of Disability Services.