This course will explore current statistical techniques for the automatic analysis of natural (human) language data. The dominant modeling paradigm is corpus-driven statistical learning. This term, we are introducing a few new projects to give increased hands-on experience with a greater variety of NLP tasks and commonly used techniques.
This course assumes a good background in basic machine learning and a strong ability to program in Python. Prior experience with linguistics or natural languages is helpful, but not required. There will be a lot of statistics, algorithms, and coding in this class. The recommended background is CS 188 (or CS 281A) and CS 170 (or CS 270). An A in CS 188 (or CS 281A) is required. This course will be more work-intensive than most graduate or undergraduate courses.
The primary recommended texts for this course are:
- Jurafsky and Martin, Speech and Language Processing, 3rd edition.
- Eisenstein, Introduction to Natural Language Processing.
Both texts are currently free online.
Professor office hours: TBA
GSI office hours: TBA
While we will be as flexible as possible as circumstances change, it is hard to be certain how the semester will play out in terms of the pandemic. Per campus guidelines, we will start with fully remote instruction for the first two weeks. Beyond that, we will transition aspects of the course if and as appropriate. For example, if our enrollment stays sufficiently high, fully remote lectures may turn out to be both safer and more effective for much or all of the semester. Regardless of the mix of instruction modes in use at any given time, we anticipate that students and staff alike may experience unexpected disruptions this semester. As a result, no aspect of the course will require in-person attendance, and we will have an increased number of late days (14). Please keep the majority of these late days available for any unforeseen disruptions.