Theory III: Dynamics and Generalization in Deep Networks

Publication Type: CBMM Memos
Year of Publication: 2018
Authors: Banburski, A, Liao, Q, Miranda, B, Poggio, T, Rosasco, L, Hidary, J, De La Torre, F
Date Published: 06/2018

The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We will show that a classical form of norm control -- though a hidden one -- is responsible for the good expected performance of deep networks trained with gradient descent techniques on exponential-type losses. In particular, gradient descent induces a dynamics of the normalized weights which converges for $t \to \infty$ to an equilibrium corresponding to a minimum norm (or maximum margin) solution. For sufficiently large but finite $\rho$ -- and thus finite $t$ -- the dynamics converges to one of several hyperbolic minima, each corresponding to a regularized, constrained minimizer -- the network with normalized weights -- which is stable and generalizes. In the limit, generalization is lost, but the minimum norm property of the solution provides, we conjecture, good expected performance. Our approach extends some of the results of Srebro from linear networks to deep networks and provides a new perspective on the implicit bias of gradient descent. The elusive complexity control we describe is responsible, at least in part, for the puzzling empirical finding of good predictive performance by deep networks, despite the absence of explicit regularization.
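The phenomenon the abstract describes -- unnormalized weights diverge under gradient descent on an exponential-type loss while the normalized weight direction stabilizes toward a maximum-margin solution -- can be illustrated in the simplest (linear) case. The following is a minimal sketch, not code from the memo: the toy data, step size, and iteration counts are illustrative assumptions, and the linear model stands in for the deep-network setting analyzed in the paper.

```python
import numpy as np

# Two linearly separable Gaussian blobs with labels +1 / -1
# (illustrative data, not from the memo).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.3, (50, 2)),
               rng.normal([-2.0, -2.0], 0.3, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

w = rng.normal(size=2) * 0.01   # small random initialization
lr = 0.01                       # illustrative step size
dirs = []                       # snapshots of the normalized weights w/||w||

for t in range(20000):
    margins = y * (X @ w)
    # Gradient of the mean exponential loss  L(w) = mean(exp(-y * x.w)).
    grad = -(y[:, None] * X * np.exp(-margins)[:, None]).mean(axis=0)
    w -= lr * grad
    if t % 5000 == 0 or t == 19999:
        dirs.append(w / np.linalg.norm(w))

# The norm ||w|| keeps growing (no finite minimizer of the exponential
# loss on separable data), but successive normalized directions align:
print(np.linalg.norm(w))
print(dirs[-1] @ dirs[-2])  # cosine between late snapshots, close to 1
```

The point of the sketch is the contrast between the two printed quantities: the raw norm never settles, while the cosine between late normalized snapshots approaches 1, matching the claim that the dynamics of the *normalized* weights converges to an equilibrium.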



CBMM Memo No:  090

CBMM Relationship: 

  • CBMM Funded