Class 11: Neural networks - tips, tricks & software
Instructor: Andrzej Banburski
Description
We take a look at the problems of initialization and hyper-parameter tuning, before going on to discuss different optimizers, batch normalization, dropout, data augmentation, and transfer learning. We then give a brief overview of PyTorch and TensorFlow.
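As a taste of how several of these tricks look in code, here is a minimal PyTorch sketch combining He (Kaiming) initialization, batch normalization, dropout, and SGD with momentum plus a step learning-rate schedule. The layer sizes, dropout rate, and schedule values are illustrative assumptions, not recommendations from the lecture:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small classifier combining several tricks from the lecture.

    All sizes and rates below are illustrative assumptions.
    """

    def __init__(self, in_dim=784, hidden=256, n_classes=10, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),  # normalize activations over the mini-batch
            nn.ReLU(),
            nn.Dropout(p_drop),      # randomly zero activations during training
            nn.Linear(hidden, n_classes),
        )
        # He initialization is the standard choice for ReLU networks.
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)

model = MLP()
# SGD with momentum plus step decay of the learning rate; the values
# here are placeholders to tune, not tuned settings.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
```

Note that batch normalization and dropout behave differently at training and test time, so PyTorch requires switching between `model.train()` and `model.eval()` explicitly.

Data augmentation and transfer learning can be sketched just as briefly with torchvision; the particular transforms and the choice of ResNet-18 below are assumptions for illustration:

```python
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation: random crops and flips applied on the fly during training.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Transfer learning: start from an ImageNet-pretrained backbone, freeze it,
# and retrain only a new final layer (here for a hypothetical 10-class task).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)
```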
Slides
Slides for this lecture: PDF
Further Reading
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
- Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, Neural Networks: Tricks of the Trade, pp. 437–478, 2012.
- L. Bottou, Stochastic Gradient Descent Tricks, Neural Networks: Tricks of the Trade, pp. 421–436, 2012.
- S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML 2015.
- S. Santurkar, D. Tsipras, A. Ilyas, A. Madry, How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), NeurIPS 2018.
- Overview of optimizers
- P. Goyal et al., Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, arXiv:1706.02677, 2017.
- S. L. Smith, P. J. Kindermans, C. Ying, Q. V. Le, Don't Decay the Learning Rate, Increase the Batch Size, ICLR 2018.
- Overview of data augmentation
- PyTorch tutorials
- Tensorflow tutorials