Class 11: Neural networks - tips, tricks & software
Instructor: Andrzej Banburski
Description
We take a look at the problems of initialization and hyper-parameter tuning, before going on to discuss different optimizers, batch normalization, dropout, data augmentation, and transfer learning. We then give a brief overview of PyTorch and TensorFlow.
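As a taste of how several of these tricks look in code, here is a minimal PyTorch sketch combining He (Kaiming) initialization, batch normalization, dropout, and SGD with momentum plus a step learning-rate schedule. The layer sizes, dropout rate, and schedule values are illustrative assumptions, not recommendations from the lecture:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small classifier combining several tricks from the lecture.

    All sizes and rates below are illustrative assumptions.
    """

    def __init__(self, in_dim=784, hidden=256, n_classes=10, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),  # normalize activations over the mini-batch
            nn.ReLU(),
            nn.Dropout(p_drop),      # randomly zero activations during training
            nn.Linear(hidden, n_classes),
        )
        # He initialization is the standard choice for ReLU networks.
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)

model = MLP()
# SGD with momentum plus step decay of the learning rate; the values
# here are placeholders to tune, not tuned settings.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
```

Note that batch normalization and dropout behave differently at training and test time, so PyTorch requires switching between `model.train()` and `model.eval()` explicitly.

Data augmentation and transfer learning can be sketched just as briefly with torchvision; the particular transforms and the choice of ResNet-18 below are assumptions for illustration:

```python
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation: random crops and flips applied on the fly during training.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Transfer learning: start from an ImageNet-pretrained backbone, freeze it,
# and retrain only a new final layer (here for a hypothetical 10-class task).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)
```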
Slides
Slides for this lecture: PDF
Further Reading
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
- Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, Neural Networks: Tricks of the Trade, pp. 437–478, 2012.
- L. Bottou, Stochastic Gradient Descent Tricks, Neural Networks: Tricks of the Trade, pp. 421–436, 2012.
- S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, ICML 2015.
- S. Santurkar, D. Tsipras, A. Ilyas, A. Madry, How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), NeurIPS 2018.
- Overview of optimizers
- P. Goyal et al., Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, arXiv:1706.02677, 2017.
- S. L. Smith, P. J. Kindermans, C. Ying, Q. V. Le, Don't Decay the Learning Rate, Increase the Batch Size, ICLR 2018.
- Overview of data augmentation
- PyTorch tutorials
- Tensorflow tutorials