9.520/6.7910: Statistical Learning Theory and Applications, Fall 2025

Course description

Understanding human intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and computational implementations, is at the very core of intelligence. During the last two decades, for the first time, artificial intelligence systems have been developed that solve complex tasks such as computer vision, speech recognition or natural language understanding, which until recently were the exclusive domain of biological organisms: cameras recognize faces, smartphones understand voice commands, smart chatbots/assistants answer questions and cars can see and avoid obstacles. The machine learning algorithms at the root of these success stories are trained with examples rather than programmed to solve a task. This has been a once-in-a-lifetime paradigm shift for Computer Science: from a core emphasis on programming to training from examples. This course, the oldest on ML at MIT, has been pushing for this shift from its inception around 1990. However, a comprehensive theory of learning is still missing, as shown by the several puzzles of deep learning. A theory of learning that explains why deep networks work, and what may lie beyond them, is becoming an urgent need. It may enable the development of more powerful learning approaches and perhaps inform our understanding of human intelligence before machines become smarter than us.

In this spirit, the course covers foundations and recent advances in statistical machine learning theory, with the dual goal of a) providing students with the theoretical knowledge and intuitions needed to use effective machine learning solutions and b) preparing more advanced students to contribute to progress in the field. This year the emphasis is again on b).

The course is organized around a specific parametric approach to supervised learning in which a powerful class of parametric approximating functions, such as deep neural networks, is trained, that is, optimized on a training set. This approach to learning is, in a sense, the ultimate inverse problem, in which sparse compositionality and stability play a key role in ensuring good generalization performance.

The content is roughly divided into three parts. The first part is about:

classical regularization and regularized least squares
kernel machines, SVM
logistic regression, square and exponential loss
large margin (minimum norm)
stochastic gradient methods
overparametrization, implicit regularization
approximation/estimation errors
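
As a small illustration of the first topic in this list, the following is a minimal sketch of regularized least squares for a one-dimensional linear model, written in plain Python. It is not taken from the course material: the function name `ridge_1d`, the toy data, and the choice of scaling the penalty by the number of examples are all assumptions made for this example. It shows only how the regularization parameter shrinks the fitted weight relative to the unregularized solution.

```python
def ridge_1d(xs, ys, lam):
    """Fit y ~ w * x by minimizing (1/n) * sum((w*x - y)^2) + lam * w^2.

    Setting the derivative with respect to w to zero gives the closed form
    w = sum(x*y) / (sum(x*x) + n * lam).
    """
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


# Hypothetical toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_ols = ridge_1d(xs, ys, 0.0)  # lam = 0: ordinary least squares
w_reg = ridge_1d(xs, ys, 1.0)  # lam > 0: the weight is shrunk toward zero

print(w_ols, w_reg)
```

With `lam = 0` the function reduces to ordinary least squares; increasing `lam` trades a larger training error for a smaller-norm solution.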

The second part is about deep networks and, in particular, about:

approximation theory: why deep networks avoid the curse of dimensionality and are universal parametric approximators
optimization theory: how weights and activations evolve in time and across layers during training with SGD
learning theory: how generalization in deep networks can be explained in terms of the complexity control implicit in regularized (or unregularized) SGD and in terms of the details of the sparse compositional architecture itself

The third part is about a few topics of current research, starting with the connections between learning theory and the brain, which was the original inspiration for modern networks and may provide ideas for future developments and breakthroughs in the theory and the algorithms of learning. Throughout the course, and especially in the final classes, there will be talks by leading researchers on selected advanced research topics.

Apart from the first part on regularization, which is an essential part of any introduction to the field of machine learning, this year's course is designed for students with a good background in ML.

Prerequisites

We will make extensive use of basic notions of calculus, linear algebra and probability. The essentials are covered in class and in the math camp material. We will introduce a few concepts in functional/convex analysis and optimization. Note that this is an advanced graduate course and some exposure to introductory machine learning concepts or courses is expected: for Course 6 students, the prerequisites are 6.041 and 18.06 and (6.036 or 6.401 or 6.867).

Grading

Grading is based on lecture attendance and participation (10%), three problem sets (45%) and a final project (45%). Use of LLMs, such as ChatGPT, on problem sets and the final project is allowed, but it must be disclosed.

Classes are conducted in-person.

Problem Sets

Problem Set 1, out: Tue. Sept. 17, due: Thu. Oct. 03
Problem Set 2, out: Thu. Oct. 03, due: Thu. Oct. 17
Problem Set 3, out: Thu. Oct. 17, due: Thu. Oct. 31

Submission instructions: Follow the instructions included with the problem set. Submit your report online through Canvas by the due date/time.

Projects

Guidelines and key dates. Online form for project proposal (complete by Mon. Oct. 14).

Final project reports (5 pages for individuals, 8 pages for teams, NeurIPS style) are due on Tue. Dec. 10??.

Projects archive

List of Wikipedia entries, created or edited as part of projects during previous course offerings.

Navigating Student Resources at MIT

This document has more information about navigating student resources at MIT.

Units:
3-0-9 H,G

Class Times:
Tuesday and Thursday: 11:00 am – 12:30 pm

Location:
46-3002 (Singleton Auditorium)

Instructors:
Tomaso Poggio (TP), Lorenzo Rosasco (LR), Michael Lee (ML), Akshay Rangamani (AR), Pierfrancesco Beneventano (PB), Andrea Pinto (AP), Eran Malach (EM)

TAs:
Michael Lee, Dan Mitropolski, Liu Ziyin

Office Hours:
Wednesdays 1PM – 2PM every Friday (46-5193)

Email Contact:
9.520@mit.edu

Previous Class:
FALL 2024, FALL 2023, FALL 2022, FALL 2021, FALL 2020, FALL 2019

Canvas page:
https://canvas.mit.edu/courses/28351
