Title: Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss
Publication Type: CBMM Memos
Year of Publication: 2021
Authors: Xu, M., Rangamani, A., Banburski, A., Liao, Q., Galanti, T., Poggio, T.
Here we consider a model of the dynamics of gradient flow under the square loss in overparametrized ReLU networks. We show that convergence to a maximum-margin solution, where the margin is the inverse of the product of the Frobenius norms of each layer's weight matrix, is expected when normalization by a Lagrange multiplier (LM) is used together with weight decay (WD). We prove that SGD converges to solutions that are biased towards 1) large margin and 2) low rank of the weight matrices. In addition, the solutions are predicted to exhibit Neural Collapse. Almost-non-vacuous bounds on the expected error are suggested from estimates of the empirical margin.
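To make these quantities concrete, here is an illustrative NumPy sketch (not code from the memo): one function computes the normalized margin described above, i.e. the minimum margin divided by the product of the layers' Frobenius norms, and another measures how far globally centered, normalized class-mean features are from the simplex equiangular tight frame (ETF) that characterizes Neural Collapse. The function names, inputs, and the Frobenius-norm gap metric are assumptions for illustration.

```python
import numpy as np

def normalized_margin(weights, outputs, labels):
    """Minimum margin y_n * f(x_n), normalized by the product of the
    Frobenius norms of the weight matrices (binary-classification sketch).

    weights: list of 2D weight matrices, one per layer
    outputs: array of scalar network outputs f(x_n)
    labels:  array of labels in {-1, +1}
    """
    # np.linalg.norm on a 2D array defaults to the Frobenius norm
    rho = np.prod([np.linalg.norm(W) for W in weights])
    margins = labels * outputs
    return margins.min() / rho

def simplex_etf_gap(class_means):
    """Distance of the Gram matrix of centered, normalized class means
    from the ideal simplex-ETF Gram matrix (zero under exact collapse).

    class_means: (K, d) array, one feature-mean row per class
    """
    K = class_means.shape[0]
    M = class_means - class_means.mean(axis=0)       # global centering
    M = M / np.linalg.norm(M, axis=1, keepdims=True) # unit-norm rows
    G = M @ M.T
    # Simplex ETF: unit diagonal, off-diagonal cosines equal to -1/(K-1)
    target = (K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
    return np.linalg.norm(G - target)
```

For example, the rows of `sqrt(K/(K-1)) * (I - 11^T/K)` form an exact simplex ETF, so `simplex_etf_gap` returns (numerically) zero on them, while generic random class means give a strictly positive gap.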
- CBMM Funded