Theory of Deep Learning IIb: Optimization Properties of SGD | The Center for Brains, Minds & Machines

Title	Theory of Deep Learning IIb: Optimization Properties of SGD
Publication Type	CBMM Memos
Year of Publication	2017
Authors	Zhang, C, Liao, Q, Rakhlin, A, Miranda, B, Golowich, N, Poggio, T
Date Published	12/2017
Abstract	In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent (SGD). The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability, like the classical Langevin equation, on large-volume, “flat” minima, selecting flat minimizers that are, with very high probability, also global minimizers.
DSpace@MIT	http://hdl.handle.net/1721.1/115407
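
The Langevin analogy in the abstract can be illustrated with a toy simulation. The sketch below is not taken from the memo: it assumes a hypothetical one-dimensional loss with two minima of equal depth (both global) but different curvature, and integrates overdamped Langevin dynamics with a simple Euler-Maruyama step. All names and constants (K_SHARP, K_FLAT, the temperature, the step size) are illustrative assumptions. For this loss the Boltzmann distribution puts roughly sqrt(K_SHARP / K_FLAT) times more probability mass in the flat basin than in the sharp one, so most walkers should end up near the flat minimum even though every walker starts at the sharp one.

```python
import numpy as np

# Hypothetical toy loss: two quadratic wells of equal depth (loss 0 at both
# minima), glued together at the point where the two parabolas cross.
K_SHARP, K_FLAT = 25.0, 1.0   # curvatures: large = sharp, small = flat
W_SHARP, W_FLAT = -1.0, 1.0   # locations of the two minima
# Crossing point of the two parabolas between the minima (here -2/3).
BOUNDARY = (np.sqrt(K_SHARP) * W_SHARP + np.sqrt(K_FLAT) * W_FLAT) / (
    np.sqrt(K_SHARP) + np.sqrt(K_FLAT)
)


def grad(w):
    """Gradient of the piecewise-quadratic loss at positions w."""
    return np.where(w < BOUNDARY, K_SHARP * (w - W_SHARP), K_FLAT * (w - W_FLAT))


def simulate(n_walkers=2000, n_steps=3000, dt=0.01, temperature=0.5, seed=0):
    """Euler-Maruyama integration of dw = -grad(w) dt + sqrt(2 T) dB."""
    rng = np.random.default_rng(seed)
    w = np.full(n_walkers, W_SHARP)          # every walker starts in the sharp well
    noise_scale = np.sqrt(2.0 * temperature * dt)
    for _ in range(n_steps):
        w = w - dt * grad(w) + noise_scale * rng.standard_normal(n_walkers)
    return w


def flat_fraction(w):
    """Fraction of walkers located in the basin of the flat minimum."""
    return float(np.mean(w >= BOUNDARY))


if __name__ == "__main__":
    final = simulate()
    print(f"fraction of walkers in the flat basin: {flat_fraction(final):.2f}")
```

This models only the Langevin side of the analogy, with isotropic Gaussian noise in one dimension; whether SGD noise in a deep network behaves the same way is exactly what the memo's conjecture addresses.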

CBMM Memo No: 072