*Here we consider a model of the dynamics of gradient flow under the square loss in overparametrized ReLU networks. We show that convergence to a solution with the maximum margin, which is the inverse of the product of the Frobenius norms of the layer weight matrices, is expected when normalization by a Lagrange multiplier (LM) is used together with Weight Decay (WD). We prove that SGD converges to solutions that have a bias towards 1) large margin and 2) low rank of the weight matrices. In addition, the solutions are predicted to show Neural Collapse. Almost-non-vacuous bounds are suggested for the expected error from estimates of the empirical margin.*