Title: Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss
Publication Type: CBMM Memos
Year of Publication: 2021
Authors: Xu, M., Rangamani, A., Banburski, A., Liao, Q., Galanti, T., Poggio, T.
Here we consider a model of the dynamics of gradient flow under the square loss in overparametrized ReLU networks. We show that convergence to a maximum-margin solution, where the margin is the inverse of the product of the Frobenius norms of each layer's weight matrix, is expected when normalization by a Lagrange multiplier (LM) is used together with weight decay (WD). We prove that SGD converges to solutions that are biased towards 1) large margin and 2) low rank of the weight matrices. In addition, the solutions are predicted to exhibit Neural Collapse. Almost-non-vacuous bounds on the expected error are suggested from estimates of the empirical margin.
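To make these quantities concrete, here is an illustrative NumPy sketch (not code from the memo): one function computes the normalized margin described above, i.e. the minimum margin divided by the product of the layers' Frobenius norms, and another measures how far globally centered, normalized class-mean features are from the simplex equiangular tight frame (ETF) that characterizes Neural Collapse. The function names, inputs, and the Frobenius-norm gap metric are assumptions for illustration.

```python
import numpy as np

def normalized_margin(weights, outputs, labels):
    """Minimum margin y_n * f(x_n), normalized by the product of the
    Frobenius norms of the weight matrices (binary-classification sketch).

    weights: list of 2D weight matrices, one per layer
    outputs: array of scalar network outputs f(x_n)
    labels:  array of labels in {-1, +1}
    """
    # np.linalg.norm on a 2D array defaults to the Frobenius norm
    rho = np.prod([np.linalg.norm(W) for W in weights])
    margins = labels * outputs
    return margins.min() / rho

def simplex_etf_gap(class_means):
    """Distance of the Gram matrix of centered, normalized class means
    from the ideal simplex-ETF Gram matrix (zero under exact collapse).

    class_means: (K, d) array, one feature-mean row per class
    """
    K = class_means.shape[0]
    M = class_means - class_means.mean(axis=0)       # global centering
    M = M / np.linalg.norm(M, axis=1, keepdims=True) # unit-norm rows
    G = M @ M.T
    # Simplex ETF: unit diagonal, off-diagonal cosines equal to -1/(K-1)
    target = (K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
    return np.linalg.norm(G - target)
```

For example, the rows of `sqrt(K/(K-1)) * (I - 11^T/K)` form an exact simplex ETF, so `simplex_etf_gap` returns (numerically) zero on them, while generic random class means give a strictly positive gap.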
- CBMM Funded