Policies
Description
This course will explore current statistical techniques for the automatic analysis of natural (human) language data. The dominant modeling paradigm is corpus-driven statistical learning. This term, we are introducing a few new projects to give increased hands-on experience with a greater variety of NLP tasks and commonly used techniques.
This course assumes a good background in basic machine learning and a strong ability to program in Python. Prior experience with linguistics or natural languages is helpful, but not required. There will be a lot of statistics, algorithms, and coding in this class. The recommended background is CS 188 (or CS 281A) and CS 170 (or CS 270). An A in CS 188 (or CS 281A) is required. This course will be more work-intensive than most graduate or undergraduate courses.
Readings
The primary recommended texts for this course are:
- Jurafsky and Martin, Speech and Language Processing, 3rd edition.
- Eisenstein, Introduction to Natural Language Processing.
Both texts are currently free online.
Office Hours
Professor office hours: TBA
GSI office hours: TBA
Late Work Policy
Everyone will have 7 slip days they may use on any project throughout the semester, but you may only use up to 2 slip days on an individual project without prior consent from the instructors.
If you need to use more than 2 slip days on a project, please make a private post on Piazza before the project deadline to let us know.
Slip days are counted by rounding up to the next whole day. For example, if a project is due at 11:59pm on Tuesday and you submit at 12:00am on Wednesday, this will use up one of your slip days. Weekend days count towards slip days as well, so there is no notion of business days here.
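The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the course's grading code: the function names `slip_days_used` and `needs_prior_consent` are hypothetical, and it assumes the deadline is recorded as a single timestamp with any later submission counted as late.

```python
from datetime import datetime, timedelta

TOTAL_SLIP_DAYS = 7        # per-semester budget stated in the policy
MAX_WITHOUT_CONSENT = 2    # per-project limit without instructor consent


def slip_days_used(deadline: datetime, submitted: datetime) -> int:
    """Return the slip days consumed, rounding any partial day up.

    Assumption: a submission at or before `deadline` is on time.
    Weekends are not skipped, matching the policy text.
    """
    if submitted <= deadline:
        return 0
    # divmod on timedeltas gives whole days plus the leftover time.
    whole_days, remainder = divmod(submitted - deadline, timedelta(days=1))
    return whole_days + (1 if remainder else 0)


def needs_prior_consent(days: int) -> bool:
    """True if a project would use more slip days than allowed by default."""
    return days > MAX_WITHOUT_CONSENT


# Example from the policy: due 11:59pm, submitted one minute later.
due = datetime(2024, 1, 2, 23, 59)
print(slip_days_used(due, datetime(2024, 1, 3, 0, 0)))
```

Using `divmod` on `timedelta` values keeps the arithmetic exact, so a submission exactly 24 hours late counts as one day and one minute more counts as two.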