Machine learning aims to make predictions from large amounts of data. It applies to many types of data, such as images, sounds, and biological data. A key difficulty is to find relevant vectorial representations. While this problem had often been handled in an ad hoc way by domain experts, it has recently proved useful to learn these representations directly from large quantities of data, and Deep Learning Convolutional Networks (DLCN) with ReLU nonlinearities have been particularly successful. The representations are then based on compositions of simple parameterized processing units, the depth coming from the large number of such compositions.
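To make the idea of depth as composition concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the architecture of any paper in this issue: the units are fully connected rather than convolutional, the weights are random rather than learned, and the names `make_layer` and `deep_representation` are hypothetical.

```python
import random

def relu(x):
    # ReLU nonlinearity applied coordinate-wise.
    return [max(0.0, v) for v in x]

def affine(W, b, x):
    # One parameterized unit: the affine map W x + b.
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def make_layer(n_in, n_out, rng):
    # Assumption: random Gaussian weights and zero biases, for illustration only.
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def deep_representation(layers, x):
    # Depth comes from composing many simple units: x -> relu(W x + b), repeated.
    for W, b in layers:
        x = relu(affine(W, b, x))
    return x

rng = random.Random(0)
widths = [4, 8, 8, 3]  # input dimension 4, two hidden layers, output dimension 3
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(widths, widths[1:])]
features = deep_representation(layers, [1.0, -0.5, 0.25, 2.0])
```

Here `features` is a three-dimensional nonnegative vector. In a trained network the weights and biases would be fitted to data rather than drawn at random.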
The goal of this special issue was to explore some of the mathematical ideas and problems at the heart of deep learning. In particular, two key mathematical questions about deep learning are:
The power of the architecture: which classes of functions can it approximate well? Why, and when, are deep networks better than shallow ones?
Learning the unknown parameters (weights and biases) from the data by optimizing a loss function: do multiple solutions exist? How “many”? Why is stochastic gradient descent (SGD) so unreasonably efficient, at least in appearance?
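As a reminder of what learning weights and biases by SGD means in the simplest case, here is a toy sketch in plain Python. It fits a single weight and bias to noiseless synthetic data under a squared loss. The target values, learning rate, and step count are assumptions chosen for illustration. The problem is convex, so it says nothing about the open questions above, which concern deep, non-convex models.

```python
import random

rng = random.Random(0)

# Assumed synthetic data: y = 2x - 1 with no noise.
true_w, true_b = 2.0, -1.0
data = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, true_w * x + true_b))

w, b = 0.0, 0.0   # unknown parameters: one weight and one bias
lr = 0.1          # learning rate (assumed value)

for step in range(2000):
    x, y = rng.choice(data)   # "stochastic": one random sample per step
    err = (w * x + b) - y     # prediction error on that sample
    # Gradient of the per-sample squared loss err**2 with respect to w and b.
    grad_w = 2.0 * err * x
    grad_b = 2.0 * err
    w -= lr * grad_w
    b -= lr * grad_b
```

After the loop, `w` and `b` should be close to the values used to generate the data.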
These questions are still open and a full theory of Deep Learning is still in the making. This special issue, however, begins with two papers that provide a useful contribution to several other theoretical questions surrounding supervised deep learning.
%B Information and Inference %V 5 %P 103-104 %G eng %U http://imaiai.oxfordjournals.org/content/5/2/103.short %R 10.1093/imaiai/iaw010