Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss

Publication Type: CBMM Memos
Year of Publication: 2021
Authors: Rangamani, A, Xu, M, Banburski, A, Liao, Q, Poggio, T

Here we consider a simplified model of the dynamics of gradient flow under the square loss in ReLU networks. We show that convergence to a solution with the absolute minimum "norm" -- defined as the product of the Frobenius norms of each layer weight matrix -- is expected when normalization by a Lagrange multiplier (LN) is used together with Weight Decay (WD). In the absence of LN+WD, good solutions for classification may still be achieved because of the implicit bias towards small-norm solutions in the trajectory dynamics of gradient descent, introduced by close-to-zero initial conditions on the norms of the weights. The main property of the minimizers that bounds their expected binary classification error is the norm: we prove that among all the close-to-interpolating solutions, the ones associated with smaller norm have better margin and better bounds on the expected classification error. We also prove that quasi-interpolating solutions obtained by gradient descent in the presence of WD show the recently discovered behavior of Neural Collapse, and we describe related predictions. Our analysis supports the idea that the advantage of deep networks relative to other standard classifiers is restricted to specific deep architectures such as CNNs, and is due to their good approximation properties for target functions that are locally compositional.
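As a rough illustration (not taken from the memo itself), the training setup the abstract describes -- full-batch gradient descent on the square loss for a ReLU network with weight decay, with the network "norm" measured as the product of the per-layer Frobenius norms -- can be sketched in NumPy. All architecture choices, data, and hyperparameters below are illustrative assumptions, not the memo's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable binary data with +/-1 targets (square-loss setting).
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# One-hidden-layer ReLU network with a small initialization,
# mimicking the "close-to-zero initial norms" regime.
h = 16
W1 = 0.1 * rng.normal(size=(2, h))
W2 = 0.1 * rng.normal(size=(h, 1))

def forward(X, W1, W2):
    Z = X @ W1             # pre-activations
    A = np.maximum(Z, 0)   # ReLU
    return Z, A, (A @ W2).ravel()

def product_norm(W1, W2):
    # The "norm" of the network in the memo's sense:
    # product of the Frobenius norms of the layer weight matrices.
    return np.linalg.norm(W1) * np.linalg.norm(W2)

lr, wd = 0.05, 1e-3  # learning rate and weight-decay coefficient (assumed values)
for step in range(2000):
    Z, A, out = forward(X, W1, W2)
    err = out - y                                   # square-loss residual
    # Gradients of the mean square loss plus weight decay.
    gW2 = A.T @ err[:, None] / len(X) + wd * W2
    dA = err[:, None] @ W2.T
    gW1 = X.T @ (dA * (Z > 0)) / len(X) + wd * W1
    W2 -= lr * gW2
    W1 -= lr * gW1

_, _, out = forward(X, W1, W2)
loss = np.mean((out - y) ** 2)
rho = product_norm(W1, W2)
print(f"final square loss {loss:.4f}, product norm {rho:.3f}")
```

With weight decay active, the trained network is quasi-interpolating (the loss plateaus near, but not at, zero), and `rho` is the quantity the memo argues controls the margin and the expected classification error.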

CBMM Memo No:  117

CBMM Relationship: 

  • CBMM Funded