CSE 40/60657: Natural Language Processing
Project
- Due: 2014/05/04 11:55pm
- Points: 60
Throughout the course, you will develop an application of NLP techniques to a topic of your choosing. It may relate to a personal hobby (e.g., comic books, opera), your area of research (e.g., medical informatics, English literature), or just something you find interesting.
Some important general guidelines:
- You may work alone or in pairs. If you work in a pair, clearly define which person did which parts of the work. Each person should contribute equally. (If an undergraduate and a graduate student work together, the graduate student should contribute roughly 2/3 of the work.)
- You will need to try at least two of the formal frameworks (bag of words, finite-state automata, context-free grammars).
- It’s not acceptable to do the same work for this project and another class’s project, but it’s acceptable (and encouraged) for this project to relate to another project as long as the boundary is clearly defined.
The finished product should have (roughly) five sections:
- Goal
  - What are you trying to do?
  - If successful, what difference would it make in people's lives?
- Methods
  - What methods do you use?
  - What are the existing approaches? Graduate students: this should be a reasonably comprehensive review.
- Experiments
  - What data do you use?
  - What metric(s) do you use to measure success?
  - What baseline method do you compare against?
  - How well do your methods perform compared with the baseline, and why?
- Conclusions: What do you learn from your experiments?
- Roles: Who did which part?
Milestone 1: Proposal
For the first milestone, you will choose a topic (and collaborator if any). It is okay to change later, but you must propose something at the first milestone. You will present your proposal both in writing and in class.
Written part
Length guideline: 1–2 pages for each undergraduate student, 1.5–3 pages for each graduate student. It should have the following sections:
- Goal
  - What are you trying to do? Describe your goal without using any jargon. What are the inputs and what are the outputs?
  - If successful, what difference would it make in people's lives?
- Methods
  - You're not expected to have ideas yet about how to achieve your goal, but if you have any, feel free to describe them.
  - What are the existing approaches? Graduate students: this should be a reasonably comprehensive review.
- Experiments
  - What data will you use? Include URLs or LDC catalog numbers.
  - What metric(s) will you use to measure success?
  - What baseline method will you compare against? This should be something that you can implement in about one hour.
- Roles: Who will do which part?
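As a rough illustration of the scale intended for a one-hour baseline, here is a minimal Python sketch. It assumes a classification task scored by accuracy, which may not match your project. The function names and the toy labels are invented for illustration and are not a required design.

```python
from collections import Counter


def train_majority_baseline(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy labels, invented for illustration only.
train_labels = ["pos", "neg", "pos", "pos"]
test_labels = ["pos", "neg", "neg", "pos"]

# The baseline ignores the input and always predicts the majority label.
majority = train_majority_baseline(train_labels)
predictions = [majority] * len(test_labels)
baseline_accuracy = accuracy(predictions, test_labels)
print(majority, baseline_accuracy)
```

A baseline this simple gives your later methods a concrete number to beat. For other kinds of tasks (e.g., tagging or generation), a different trivial method and metric would be more appropriate.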
In-class part
You will also give a very short (5-minute) presentation of your initial idea to the class. You'll only have time to talk about the first section (Goal). This will be a good chance to get feedback from others.
Your proposal will be graded according to the rubric at the bottom of this page, as if you had completed the project with the best possible outcome. If you propose something interesting and realistic, you will get a good grade.
Milestone 2: Groundwork
By the second milestone, you should commit to a topic. You must also complete all of your data preparation and your baseline method.
Submit a revised version of your written proposal, with an expanded Experiments section.
- Goal: as above.
- Experiments
  - What data will you use? Include URLs or LDC catalog numbers. Describe whatever preprocessing steps you used on the data (e.g., cleaning, tokenization).
  - What metric(s) will you use to measure success?
  - What baseline method will you compare against? This should be something that you can implement in about one hour.
  - How well does your baseline method perform (using your evaluation metric)?
- Roles: Who did which part?
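To give a sense of what a preprocessing description might cover, here is a minimal Python sketch of cleaning and tokenization that also builds bag-of-words counts (one of the formal frameworks listed in the guidelines). The regular expression, function names, and sample sentence are assumptions made for illustration. Real data will usually need task-specific choices about casing, punctuation, and non-English text.

```python
import re
from collections import Counter

# Assumed token pattern: runs of lowercase letters or digits, optionally
# followed by an apostrophe suffix (so "didn't" stays one token).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


def bag_of_words(text):
    """Map each token to the number of times it occurs in the text."""
    return Counter(tokenize(text))


# Sample sentence, invented for illustration only.
sample = "The cat sat. The cat didn't move!"
tokens = tokenize(sample)
counts = bag_of_words(sample)
print(tokens)
print(counts["the"], counts["cat"])
```

Whatever steps you actually use, report them in enough detail that someone else could reproduce your preprocessed data from the raw source.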
Your revised proposal will be graded according to the rubric at the bottom of this page, as if you had completed the proposed project with the best possible outcome.
Milestone 3: Progress Report
By the third milestone, you should have completed at least one method and evaluated it against the baseline. Submit a report on how your project is going. Length guideline: 1–2 pages of new material for each undergraduate student, 1.5–3 pages of new material for each graduate student.
- Goal: as above.
- Methods
  - What method(s) did you try? Describe each in enough detail that one of your classmates could reimplement it.
- Experiments: as above, plus
  - How well do your methods perform compared with the baseline, and why? Graduate students: Pay particular attention to the “why”, including any follow-up experiments or analysis to support your answer.
- Roles: Who did which part?
Your progress report will be graded according to the rubric at the bottom of this page, as if you had completed the project with the best possible outcome.
Final Report
You will present a final report on your project both in writing and in class.
In-class part
Your presentation should be no more than 10 minutes per person, plus 5 minutes total for questions. Although this presentation is part of your “final” report, it is okay if some of the work presented isn’t finished yet. You should present:
- Goal: as above.
- Methods: just a very brief description.
- Experiments: just a very brief description.
- Conclusions: What did you learn from your experiments?
Written part
Length guideline: 2–4 pages per undergraduate student, 3–6 pages per graduate student. It should have all the sections listed at the top of this page.
Data and code
Submit all the code that you wrote and all the data that you used (or links to them).
Grading
Point values
assignment | points |
---|---|
Milestone 1 | 10 |
Milestone 2 | 10 |
Milestone 3 | 10 |
Final report (in-class) | 10 |
Final report (written) | 20 |
total | 60 |
Rubric
Criterion | Exceptional | Good | Acceptable | Unacceptable |
---|---|---|---|---|
Substance | An impressive amount of work, or dramatically improves performance relative to baseline. | A substantial amount of work. Substantially improves performance relative to baseline. | A nontrivial amount of work. Improves performance relative to baseline. | Unambitious or incorrectly implemented. Fails to improve performance over baseline. |
Creativity | A novel idea, or application to a novel area. | Reimplementation of an interesting idea from the literature, or an interesting application of an idea from class. | An idea recycled from class. | |
Clarity | Presents goal clearly with strong motivation, and presents methods clearly enough to be reimplemented. | Has some problems getting either goal or methods across. | Totally unclear. |