Theory III: Dynamics and Generalization in Deep Networks

Publication Type: CBMM Memos
Year of Publication: 2018
Authors: Banburski, A, Liao, Q, Miranda, B, Poggio, T, Rosasco, L, Hidary, J
Date Published: 06/2018
Abstract

Classical generalization bounds for classification suggest maximizing the margin of a deep network under the constraint of unit Frobenius norm of the weight matrix at each layer. We show that this goal can be achieved by gradient algorithms enforcing a unit-norm constraint. We describe three algorithms of this kind and their relation to existing weight normalization and batch normalization algorithms, thus explaining their effectiveness. We also show that continuous standard gradient descent with normalization at the end is equivalent to gradient descent with a norm constraint. We conjecture that this surprising property corresponds to the elusive implicit regularization of gradient descent in deep networks, responsible for generalization despite overparametrization.
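The equivalence claimed in the abstract can be illustrated numerically. The following is a minimal sketch, not taken from the memo: it compares gradient descent under a unit-norm constraint (renormalizing the weight vector after every step) against plain gradient descent normalized only at the end, on a toy linear classifier with logistic loss. The data, learning rate, and step count are all illustrative assumptions.

```python
import numpy as np

# Toy linearly separable data (illustrative assumption, not from the memo).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def grad(w):
    """Gradient of the mean logistic loss for a linear classifier."""
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(margins))   # sigmoid(-margin)
    return -(s * y) @ X / len(y)

lr, steps = 0.1, 5000
w0 = np.array([1.0, 0.0])

# (a) Gradient descent with a unit-norm constraint:
#     project back to the unit sphere after each step.
w_proj = w0.copy()
for _ in range(steps):
    w_proj -= lr * grad(w_proj)
    w_proj /= np.linalg.norm(w_proj)

# (b) Plain gradient descent, normalized only once at the end.
w_plain = w0.copy()
for _ in range(steps):
    w_plain -= lr * grad(w_plain)
w_plain /= np.linalg.norm(w_plain)

# Cosine similarity of the two normalized solutions.
cosine = float(w_proj @ w_plain)
print(cosine)
```

On this toy problem both procedures converge to (nearly) the same normalized direction, consistent with the continuous-time equivalence the memo describes; the sketch does not reproduce the memo's actual algorithms for deep networks.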

[1] This replaces previous versions of Theory IIIa and Theory IIIb.

DSpace@MIT: http://hdl.handle.net/1721.1/116692

Download (PDF): TheoryIII_ver2, TheoryIII_ver11, TheoryIII_ver12, TheoryIII_ver13, TheoryIII_ver14, TheoryIII_ver15

CBMM Memo No: 090

CBMM Relationship: 

  • CBMM Funded