The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We show that a classical form of norm control -- albeit a hidden one -- is responsible for the good expected performance of deep networks trained with gradient descent techniques on exponential-type losses. In particular, gradient descent induces a dynamics of the normalized weights that converges, for $t \to \infty$, to an equilibrium corresponding to a minimum norm (or maximum margin) solution. For sufficiently large but finite $\rho$ -- and thus finite $t$ -- the dynamics converges to one of several hyperbolic minima, each corresponding to a regularized, constrained minimizer -- the network with normalized weights -- which is stable and generalizes. In the limit, generalization is lost, but the minimum norm property of the solution provides, we conjecture, good expected performance. Our approach extends some of the results of Srebro from linear networks to deep networks and provides a new perspective on the implicit bias of gradient descent. The elusive complexity control we describe is responsible, at least in part, for the puzzling empirical finding of good predictive performance by deep networks despite overparametrization.
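The implicit bias described above can be illustrated in the simplest setting it covers: gradient descent on an exponential loss for a linear classifier on separable data. The sketch below is only illustrative and is not the paper's construction; the toy dataset, learning rate, and step count are arbitrary choices. The unnormalized weights grow without bound, but the normalized weights $w/\|w\|$ approach the maximum-margin direction.

```python
import numpy as np

# Toy separable dataset, symmetric about the line x1 = x2, so the
# maximum-margin direction is (1, 1)/sqrt(2).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Gradient descent on the exponential loss L(w) = sum_i exp(-y_i w.x_i).
w = np.zeros(2)
lr = 0.1
for _ in range(20000):
    margins = y * (X @ w)
    grad = -(X.T * y) @ np.exp(-margins)  # dL/dw = -sum_i y_i x_i exp(-y_i w.x_i)
    w -= lr * grad

# The norm of w keeps growing, but the normalized weights stabilize
# at the maximum-margin direction.
w_hat = w / np.linalg.norm(w)
```

By symmetry of this dataset, `w_hat` lands on $(1,1)/\sqrt{2}$; on generic separable data the normalized iterates drift toward the max-margin direction only logarithmically in $t$, which is consistent with the slow convergence to equilibrium discussed in the abstract.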

06/2018