CBMM Special Seminar: Beyond Empirical Risk Minimization: the lessons of deep learning

Mikhail Belkin
October 28, 2019 - 4:00 pm to 5:00 pm

Title: Beyond Empirical Risk Minimization: the lessons of deep learning

Abstract: "A model with zero training error is overfit to the training data and will typically generalize poorly," goes statistical textbook wisdom. Yet, in modern practice, over-parametrized deep networks with near-perfect fit on training data still show excellent test performance. This apparent contradiction points to troubling cracks in the conceptual foundations of machine learning. While classical analyses of Empirical Risk Minimization rely on balancing the complexity of predictors with training error, modern models are best described by interpolation. In that paradigm, a predictor is chosen by minimizing (explicitly or implicitly) a norm corresponding to a certain inductive bias over a space of functions that fit the training data exactly. I will discuss the nature of the challenge to our understanding of machine learning and point the way forward to first analyses that account for the empirically observed phenomena. Furthermore, I will show how classical and modern models can be unified within a single "double descent" risk curve, which subsumes the classical U-shaped bias-variance trade-off.

Finally, as an example of a particularly interesting inductive bias, I will show evidence that deep over-parametrized autoencoder networks, trained with SGD, implement a form of associative memory with training examples as attractor states.
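
As a rough illustration of the interpolation paradigm mentioned in the abstract, the sketch below picks, among all linear predictors that fit the training data exactly, the one with the smallest Euclidean norm. It is not taken from the talk: the linear model, the random Gaussian data, the problem sizes, and the helper name min_norm_interpolator are all assumptions chosen to keep the example small.

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Hypothetical helper: minimum L2-norm w with X @ w = y.

    Assumes X has more columns than rows and full row rank, so that
    exact interpolation is possible. Uses the Moore-Penrose pseudoinverse.
    """
    return np.linalg.pinv(X) @ y

# Assumed toy setting: 20 training points, 100 parameters (over-parametrized).
rng = np.random.default_rng(0)
n_samples, n_features = 20, 100
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

w_min = min_norm_interpolator(X, y)

# Build a second interpolating solution by adding a null-space direction of X.
v = rng.standard_normal(n_features)
null_component = v - np.linalg.pinv(X) @ (X @ v)  # projection of v onto null(X)
w_other = w_min + null_component

# Both predictors fit the training data (near-)exactly ...
train_residual = np.linalg.norm(X @ w_min - y)
other_residual = np.linalg.norm(X @ w_other - y)

# ... but the pseudoinverse solution has the smaller norm.
norm_min = np.linalg.norm(w_min)
norm_other = np.linalg.norm(w_other)
print(train_residual, other_residual, norm_min, norm_other)
```

Both weight vectors have zero training error up to floating-point precision, so training error alone cannot distinguish them; the norm acts as the inductive bias that selects one. Here the norm is minimized explicitly, whereas the abstract also allows for implicit minimization.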


MIT Building 46
Singleton Auditorium

43 Vassar Street, Cambridge MA 02139