LH - Course - Residential: Statistical Learning Theory and Applications (G)

Statistical Learning Theory and Applications

Instructors: Tomaso Poggio, Lorenzo Rosasco

Course Numbers: 9.520J, 6.860J

Course Level: Graduate

Prerequisites: (MIT) 6.867, 6.041, 18.06, or permission of instructor

Course Website: https://cbmm.mit.edu/lh-9-520

Course Description: 

The class covers foundations and recent advances of Machine Learning from the point of view of Statistical Learning Theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, with its principles and computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that can solve complex tasks previously considered out of reach. ATMs read checks, cameras recognize faces, smartphones understand your voice, and cars can see and avoid obstacles.

The machine learning algorithms at the root of these success stories are trained with labeled examples rather than programmed to solve a task. Among the approaches in modern machine learning, the course focuses on regularization techniques, which provide a theoretical foundation for high-dimensional supervised learning. Besides classic approaches such as Support Vector Machines, the course covers state-of-the-art techniques exploiting data geometry (aka manifold learning) and sparsity, as well as a variety of algorithms for supervised learning (batch and online), feature selection, structured prediction and multitask learning. Concepts from optimization theory useful for machine learning are covered in some detail (first-order methods, proximal/splitting techniques…).
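
As an informal illustration of two ideas named above (regularization and proximal methods), the following is a minimal sketch, not course material. It assumes NumPy, a squared loss, and synthetic data; the function names `ridge`, `soft_threshold` and `ista` and all parameter values are hypothetical choices made for this example. `ridge` solves Tikhonov-regularized least squares in closed form, and `ista` runs proximal gradient descent on an l1-penalized least squares problem, which tends to produce sparse solutions.

```python
import numpy as np


def ridge(X, y, lam):
    """Tikhonov-regularized least squares.

    Minimizes (1/n) * ||X w - y||^2 + lam * ||w||^2 in closed form.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for (1/(2n)) * ||X w - y||^2 + lam * ||w||_1."""
    n, d = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n is the Lipschitz
    # constant of the gradient of the smooth (least squares) term.
    step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n                      # gradient step ...
        w = soft_threshold(w - step * grad, step * lam)   # ... then prox step
    return w


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): sparse true weights.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))
    w_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0])
    y = X @ w_true + 0.1 * rng.standard_normal(50)

    print("ridge:", ridge(X, y, lam=0.1))
    print("ista: ", ista(X, y, lam=0.1))
```

In both functions the parameter `lam` trades data fit against the size of the solution: larger values shrink the weights more, and in the l1 case drive more of them to exactly zero.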

The final part of the course focuses on deep learning networks. It introduces a theoretical framework connecting the computations within the layers of deep learning networks to kernel machines. It studies an extension of the convolutional layers in order to deal with more general invariance properties and to learn them from implicitly supervised data. This theory of hierarchical architectures may explain how the visual cortex learns, in an implicitly supervised way, data representations that can lower the sample complexity of a final supervised learning stage.

The goal of this class is to provide students with the theoretical knowledge and the basic intuitions needed to use and develop effective machine learning solutions to challenging problems.