Course Info

This course provides a graduate-level introduction to Natural Language Processing (NLP), covering techniques from foundational methods to modern approaches. We begin with core concepts such as word representations and neural network–based NLP models, including recurrent networks and attention mechanisms. We then study modern Transformer-based models, focusing on pre-training, fine-tuning, prompting, scaling laws, and post-training. The course concludes with recent advances in NLP, including retrieval-augmented models, reasoning models, and multimodal systems involving vision and speech.

Grading

Quizzes/Participation: 5%
Assignments
- Assignment 1: 15%
- Assignment 2: 20%
- Assignment 3: 20%
Final Project: 40%

All team members will receive the same score for team assignments and projects.
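
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes every component is scored on a 0-100 scale and ignores any curving; the function name `weighted_total` and the example scores are hypothetical, not part of the course infrastructure.

```python
# Weights taken from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "quizzes_participation": 5,
    "assignment_1": 15,
    "assignment_2": 20,
    "assignment_3": 20,
    "final_project": 40,
}


def weighted_total(scores):
    """Combine per-component scores (assumed 0-100 each) into a 0-100 total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Each component contributes (weight / 100) * score to the total.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "quizzes_participation": 100,
    "assignment_1": 90,
    "assignment_2": 80,
    "assignment_3": 85,
    "final_project": 88,
}
example_total = weighted_total(example_scores)
print(f"Uncurved total: {example_total:.1f}")
```

Because team members share a score on team work, teammates would enter the same value for that component.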

Late Days

You have a total of 6 late days to use on assignments during the semester. These cover all unanticipated delays, including illness, unexpected travel, or emergencies. If you need an exception, you must arrange it via DSP. Otherwise, additional late days are not allowed under any circumstances.

You may not use more than 3 late days per assignment. Gradescope will be closed after 48 hours (if it happens to remain open after that, you can still submit). Once Gradescope closes, any missing submission will receive a zero. If you submit multiple times, only the latest submission will be graded.

Late days cannot be used for project-related deadlines.

We do not keep track of partial late days. One late day is spent for each 24-hour window after the deadline in which a submission falls, even if the submission occurs at the very beginning of that window.

You are responsible for keeping your own record of the late days you have used; do not rely on us to report your usage.
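
For personal bookkeeping, the counting rule above can be sketched as follows. This is an illustration under stated assumptions, not the staff's actual accounting: it assumes a submission exactly at the deadline is on time, and the function names `late_days_used` and `remaining_budget` are hypothetical.

```python
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 6      # semester budget stated in the policy
MAX_PER_ASSIGNMENT = 3   # per-assignment cap stated in the policy


def late_days_used(deadline, submitted):
    """Late days charged for one submission; partial days are not tracked."""
    if submitted <= deadline:  # assumption: exactly at the deadline is on time
        return 0
    # Every 24-hour window that has started costs one full late day, so a
    # submission at the very beginning of a window is charged for that window.
    return (submitted - deadline) // timedelta(hours=24) + 1


def remaining_budget(usage):
    """Late days left, given the late days charged on each assignment."""
    for days in usage:
        if days > MAX_PER_ASSIGNMENT:
            raise ValueError("more than 3 late days used on one assignment")
    remaining = TOTAL_LATE_DAYS - sum(usage)
    if remaining < 0:
        raise ValueError("semester late-day budget exceeded")
    return remaining


# Hypothetical deadline and submission time, for illustration only.
deadline = datetime(2000, 1, 1, 23, 59)
submitted = deadline + timedelta(hours=5)
print(late_days_used(deadline, submitted))
```

Under this sketch, submitting five hours late costs one full late day, the same as submitting 23 hours late.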

Prerequisite

CS 288 assumes prior experience in machine learning. Students should be familiar with neural networks and proficient in PyTorch and NumPy; no introductory tutorials will be provided.

Undergraduate and master's students: prior completion of CS 182, 188, 189, or 183/283A is strongly encouraged. This will be reviewed in the enrollment form.

In-person attendance

Lectures will be live and in person. All lectures will be recorded and made available to enrolled students and Cal-affiliated auditors. However, we make no guarantees about how quickly recordings and lecture slides will be posted online. Recordings are intended for later reference, not as a replacement for attending lecture. We therefore strongly encourage in-person attendance, although we will not take attendance.

In-person attendance is required for the project presentations in Classes 27 and 28, except where otherwise arranged via DSP.

Disability Support Services

If you need disability-related accommodations in this class, contact the UC Berkeley Disabled Students Program ((510) 642-0518 / website). DSP services include accommodation letters, assistive technology, and access services. An accommodation letter is required for the instructor to grant an accommodation (e.g., extended exam time). Students must be assessed every semester to receive an accommodation letter.

Collaboration Policy

We follow the EECS Academic Misconduct policy: using work or resources that are not your own or that are not permitted by the course may result in disciplinary actions, including a failing grade in the course. We use plagiarism and similarity detection on all submissions.

Working with others on assignments:

You are encouraged to discuss concepts and approaches with classmates, TAs, and instructors.
You must write all code and all text yourself, in your own words. Do not share or receive code, write-ups, screenshots, or autograder outputs. Discuss ideas only verbally or at a whiteboard.
Never post solutions publicly (including GitHub). You may not store or disseminate solutions after the course ends.

GenAI policy

We recognize GenAI tools are ubiquitous and can help you learn, but your submission must be your own original work. In this course, GenAI is a consulting tool, not an author.

Allowed (with citation)

Asking for clarification of error messages, APIs, or unfamiliar library behavior.
Asking for conceptual explanations (“What does cross-entropy measure?”), or workflow planning (“What steps to check for data leakage?”).
Tab-based code completion, as long as you understand the process and could replicate the answer independently. (For your learning purposes, we recommend trying to re-implement any significant tab completions.)

When you use GenAI in allowed ways, include an Acknowledgement (see template below) describing what tool you used and for what purpose.

Not allowed

Pasting any assignment text, dataset identifiers, or prompts directly into GenAI.
Vibe coding: blindly pasting GenAI output or relying on agents to implement your assignments.
Using GenAI (ChatGPT, Copilot, etc.) to summarize or interpret your own analysis for inclusion in your submission.
Requesting solutions, proofs, derivations, or code for assignment parts without understanding the concepts.

If in doubt, please reach out to staff by posting a question on Edstem!

Required acknowledgement format: Place this near the top of your submission if you used GenAI in any allowed way:

I acknowledge the use of [tool + link] to [specific purpose]. I used the outputs to [e.g., understand an error message / clarify pandas behavior]. I have the ability to explain and even independently replicate the work done in this document if asked by an instructor.

Laptop policy

Please feel free to bring laptops and other electronics to class and use them for lecture-related purposes, e.g., note-taking. We discourage using them for anything unrelated to the class.

What is the formula for curving the course?

Curving will be based on an affine transformation of scores, at the discretion of the instructors.
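
For readers unfamiliar with the term, an affine transformation maps each raw score x to a * x + b. The sketch below only illustrates that shape. The coefficients `a` and `b` are hypothetical placeholders, since the actual values are chosen by the instructors and are not stated here, and the function name `affine_curve` is likewise an assumption.

```python
def affine_curve(raw_scores, a, b):
    """Map each raw score x to a * x + b (an affine transformation)."""
    if a <= 0:
        # Assumption for this sketch: a positive slope, so the curve never
        # reorders students relative to one another.
        raise ValueError("slope a must be positive")
    return [a * x + b for x in raw_scores]


# Hypothetical raw scores and coefficients, for illustration only.
raw = [60, 75, 90]
curved = affine_curve(raw, a=0.5, b=50)
print(curved)
```

With a positive slope, the same transformation applies to everyone and the ranking of raw scores is preserved.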