MIT researchers’ new theory illuminates machine learning’s black box.
by Cami Rosso
The resurgence of artificial intelligence (AI) is largely due to advances in pattern-recognition due to deep learning, a form of machine learning that does not require explicit hard-coding. The architecture of deep neural networks is somewhat inspired by the biological brain and neuroscience. Like the biological brain, the inner workings of exactly why deep networks work are largely unexplained, and there is no single unifying theory. Recently researchers at the Massachusetts Institute of Technology (MIT) revealed new insights about how deep learning networks work to help further demystify the black box of AI machine learning.
The MIT research trio of Tomaso Poggio, Andrzej Banburski, and Quianli Liao at the Center for Brains, Minds, and Machines developed a new theory as to why deep networks work and published their study published on June 9, 2020 in PNAS (Proceedings of the National Academy of Sciences of the United States of America).
The researchers focused their study on the approximation by deep networks of certain classes of multivariate functions that avoid the curse of dimensionality—phenomena in which there is an exponential dependence on the number of parameters for accuracy on the dimension. Frequently in applied machine learning, the data is highly dimensional. Examples of high dimensional data include facial recognition, customer purchase history, patient healthcare records, and financial market analysis.
The depth in deep networks refers to the number of computational layers–the more computational network layers, the deeper the network. To formulate their theory, the team examined deep learning’s approximation power, dynamics of optimization, and out-of-sample performance.
In the study, the researchers compared deep and shallow networks in which both used identical sets of procedures such as pooling, convolution, linear combinations, a fixed nonlinear function of one variable, and dot products. Why do deep networks have great approximation powers, and tend to achieve better results than shallow networks given they are both universal approximators?
The scientists observed that with convolutional deep neural networks with hierarchical locality, this exponential cost vanishes and becomes more linear again. Then they demonstrated that dimensionality can be avoided for deep networks of the convolutional type for certain types of compositional functions. The implications are that for problems with hierarchical locality, such as image classification, deep networks are exponentially more powerful than shallow networks.
“In approximation theory, both shallow and deep networks are known to approximate any continuous functions at an exponential cost,” the researchers wrote. “However, we proved that for certain types of compositional functions, deep networks of the convolutional type (even without weight sharing) can avoid the curse of dimensionality.”
The team then set out to explain why deep networks, which tend to be over-parameterized, perform well on out-of-sample data. The researchers demonstrated that for classification problems, given a standard deep network, trained with gradient descent algorithms, it is the direction in the parameter space that matters, rather than the norms or the size of the weights...
Read the full story on the Psychology Today website using the link below.