Title: Theory III: Dynamics and Generalization in Deep Networks
Publication Type: CBMM Memos
Year of Publication: 2018
Authors: Poggio, T, Liao, Q, Miranda, B, Banburski, A, Boix, X, Hidary, J
The general features of the optimization problem for the case of overparametrized nonlinear networks have been clear for a while: SGD selects global minima over local minima with high probability. In the overparametrized case, the key question is not optimization of the empirical risk but optimization with a generalization guarantee. In fact, a main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is superficially surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Several recent efforts, including our previous versions of this technical report, strongly suggest that the good test performance of deep networks depends on constraining the norm of their weights. Here we prove that
1 This replaces previous versions of Theory IIIa and Theory IIIb, updating several vague or incorrect statements.
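As context for the norm-constraint claim in the abstract, the sketch below writes out the generic form that norm-based generalization bounds take (Rademacher-complexity style). It is an illustrative assumption, not a result stated in this memo; the symbols n, L, W_k, c, and delta are introduced here for illustration only.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% A minimal sketch, assuming:
%   n      -- number of training samples
%   W_k    -- weight matrix of layer k, for L layers total
%   c      -- a constant absorbing Lipschitz and depth factors
%   \delta -- confidence parameter
% With probability at least 1 - \delta over the training sample,
\[
  \mathbb{E}_{(x,y)}\!\left[\ell\bigl(f_W(x),y\bigr)\right]
  \;\le\;
  \underbrace{\frac{1}{n}\sum_{i=1}^{n}\ell\bigl(f_W(x_i),y_i\bigr)}_{\text{empirical risk}}
  \;+\;
  c\,\frac{\prod_{k=1}^{L}\lVert W_k\rVert}{\sqrt{n}}
  \;+\;
  \sqrt{\frac{\log(1/\delta)}{2n}}
\]
% The capacity term scales with the product of layer norms, so the
% bound is vacuous if the weight norms grow unchecked.
\end{document}

Read this way, the puzzle described in the abstract would dissolve if gradient descent implicitly kept the product of weight norms small even as the number of parameters grows, which is the direction the abstract's final claim points toward.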