Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss

Title: Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss
Publication Type: CBMM Memos
Year of Publication: 2021
Authors: Xu, M., Rangamani, A., Banburski, A., Liao, Q., Galanti, T., Poggio, T.
Abstract

Here we consider a model of the dynamics of gradient flow under the square loss in overparametrized ReLU networks. We show that convergence to a solution with maximum margin, which is the inverse of the product of the Frobenius norms of each layer's weight matrix, is expected when normalization by a Lagrange multiplier (LM) is used together with Weight Decay (WD). We prove that SGD converges to solutions that are biased towards (1) large margin and (2) low rank of the weight matrices. In addition, the solutions are predicted to show Neural Collapse. Almost-non-vacuous bounds on the expected error are suggested from estimates of the empirical margin.
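The setting the abstract describes — square-loss (MSE) training of an overparametrized ReLU network with weight decay, followed by a check of Neural Collapse via within-class feature variability — can be sketched as follows. This is a minimal illustration, not the memo's experimental setup: the toy data, two-layer architecture, hyperparameters, and the within/between variability ratio used as an NC-style diagnostic are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: class determined by the sign of the first coordinate
n, d, k, h = 100, 10, 2, 64
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(int)
Y = np.eye(k)[y]  # one-hot targets, as used with the square loss

# Two-layer ReLU network: f(x) = relu(x W1) W2
W1 = rng.normal(scale=0.1, size=(d, h))
W2 = rng.normal(scale=0.1, size=(h, k))

lr, wd = 0.05, 1e-3  # step size and weight-decay coefficient (illustrative)
loss0 = loss = None
for step in range(2000):
    H = np.maximum(X @ W1, 0.0)          # hidden-layer features
    P = H @ W2                           # network outputs
    loss = 0.5 * np.mean((P - Y) ** 2)   # square loss
    if step == 0:
        loss0 = loss
    G = (P - Y) / n                      # gradient of the loss w.r.t. outputs
    gW2 = H.T @ G + wd * W2              # weight decay adds wd * W to each gradient
    gW1 = X.T @ ((G @ W2.T) * (H > 0)) + wd * W1
    W1 -= lr * gW1
    W2 -= lr * gW2

# NC-style diagnostic: within-class vs. between-class variability of the
# last-layer features (Neural Collapse predicts the ratio shrinks).
H = np.maximum(X @ W1, 0.0)
mu = np.array([H[y == c].mean(axis=0) for c in range(k)])
within = np.mean([np.sum((H[y == c] - mu[c]) ** 2) for c in range(k)])
between = np.sum((mu - H.mean(axis=0)) ** 2)
print("final loss:", loss, "within/between variability:", within / between)
```

Plain gradient descent stands in for SGD here; the Lagrange-multiplier normalization analyzed in the memo is not implemented in this sketch.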

CBMM Memo No.: 117

CBMM Relationship: CBMM Funded