Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss

Title: Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss
Publication Type: CBMM Memos
Year of Publication: 2021
Authors: Rangamani, A, Xu, M, Banburski, A, Liao, Q, Poggio, T

Recent results suggest that square loss performs on par with cross-entropy loss in classification tasks for deep networks. While the theoretical understanding of training deep networks with the cross-entropy loss has been growing, the study of square loss for classification has been lacking. Here we study the dynamics of training under Gradient Descent techniques and show that we can expect convergence to minimum norm solutions when both Weight Decay (WD) and normalization techniques, such as Lagrange multipliers, are used. We perform numerical simulations that show approximate independence from initial conditions, as suggested by our analysis, while in the absence of normalization and regularization we find that good solutions can be achieved for small initializations. We prove that quasi-interpolating solutions obtained by gradient descent in the presence of regularization are expected to show the recently discovered behavior of Neural Collapse.
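
As a rough illustration of why weight decay matters for independence from initial conditions, the sketch below runs plain gradient descent on the square loss for an overparameterized linear classifier with one-hot targets. This is a toy assumption, not the memo's deep-network setting or its normalization scheme; the function name `train_square_loss`, the data sizes, and the hyperparameters are hypothetical choices made for the example.

```python
import numpy as np


def train_square_loss(X, Y, weight_decay, init_scale, steps=5000, lr=0.05, seed=0):
    """Gradient descent on (1/2n)||XW - Y||^2 + (weight_decay/2)||W||^2 (linear toy model)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random Gaussian initialization whose size is controlled by init_scale.
    W = init_scale * rng.standard_normal((d, Y.shape[1]))
    for _ in range(steps):
        residual = X @ W - Y
        grad = X.T @ residual / n + weight_decay * W
        W -= lr * grad
    return W


# Toy data (assumption): fewer samples than features, random one-hot labels.
rng = np.random.default_rng(42)
n, d, k = 30, 50, 3
X = rng.standard_normal((n, d))
Y = np.eye(k)[rng.integers(0, k, size=n)]

# With weight decay: a small and a large initialization end at the same weights.
W_small = train_square_loss(X, Y, weight_decay=0.1, init_scale=0.01, seed=1)
W_large = train_square_loss(X, Y, weight_decay=0.1, init_scale=5.0, seed=2)
gap_wd = np.linalg.norm(W_small - W_large)

# Without weight decay: the final weights keep a memory of the initialization.
W_small0 = train_square_loss(X, Y, weight_decay=0.0, init_scale=0.01, seed=1)
W_large0 = train_square_loss(X, Y, weight_decay=0.0, init_scale=5.0, seed=2)
gap_no_wd = np.linalg.norm(W_small0 - W_large0)
norm_small = np.linalg.norm(W_small0)
norm_large = np.linalg.norm(W_large0)

print("distance between solutions, with weight decay:   ", gap_wd)
print("distance between solutions, without weight decay:", gap_no_wd)
print("solution norms without weight decay (small, large init):", norm_small, norm_large)
```

In this linear toy case, weight decay makes the objective strongly convex, so both runs reach the same minimizer. Without it, the part of the initialization lying outside the span of the data is never updated, which is why only the small initialization ends up with a low-norm solution.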

Download: JMLR__2021-22.pdf
CBMM Memo No: 117

CBMM Relationship: 

  • CBMM Funded