Class 24: Deep Learning Theory: Optimization

Instructor: Tomaso Poggio

Description

A mathematical theory of deep networks, and of why they work as well as they do, is now emerging. I will review some recent theoretical results on the approximation power of deep networks, including conditions under which they can be exponentially better than shallow networks. A class of deep convolutional networks represents an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. I will also discuss another puzzle around deep networks: what guarantees that they generalize and do not overfit, even though the number of weights is larger than the number of training data points and the optimization uses no explicit regularization?
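
To make the "exponentially better" claim concrete, the bounds below are a sketch summarized from the review listed under Class Reference Material, not from the slides themselves. The notation is an assumption introduced here: n is the input dimension, m the smoothness of the target function (number of bounded derivatives), \epsilon the target approximation accuracy, and N the number of units needed. The deep-network bound assumes the target is a compositional function with a binary-tree structure, whose constituent functions each take two variables, and a deep network whose architecture matches that structure.

```latex
% Number of units needed to reach accuracy \epsilon (constants omitted).
% Shallow network, generic n-variable function of smoothness m:
\[
  N_{\text{shallow}} = O\!\left(\epsilon^{-n/m}\right)
\]
% Deep network, compositional function with binary-tree structure
% (constituent functions of 2 variables, smoothness m):
\[
  N_{\text{deep}} = O\!\left((n-1)\,\epsilon^{-2/m}\right)
\]
```

As a rough example, ignoring constants, with n = 8, m = 2 and \epsilon = 0.1, the shallow bound scales as 0.1^{-4} = 10^4 units, while the deep bound scales as 7 \cdot 0.1^{-1} = 70 units. The shallow bound is exponential in n, whereas the deep bound is only linear in n.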

Class Reference Material

Slides: PDF

T. Poggio, Why and When Can Deep-but Not Shallow-networks Avoid the Curse of Dimensionality: A Review, DOI: 10.1007/s11633-017-1054-2

The Center for Brains, Minds & Machines

Statistical Learning Theory and Applications

MIT

9.520/6.860, Class 24
