Humans and other organisms show a remarkably sophisticated ability to learn about their environments over their lifetimes. This learning is thought to alter the strength of connections between neurons in the brain, but we still do not understand the principles linking synaptic changes at the neural level to behavioral changes at the psychological level. Part of the difficulty stems from depth: the brain has a deep, many-layered structure that substantially complicates the learning process. To understand the specific impact of depth, I develop the theory of gradient descent learning in deep linear neural networks. Despite their linearity, these networks pose a nonconvex learning problem and exhibit rich nonlinear learning dynamics. I give new exact solutions to these dynamics that quantitatively answer fundamental theoretical questions, such as how learning speed scales with depth. These solutions revise the basic conceptual picture underlying deep learning systems, both engineered and biological, with ramifications for a variety of phenomena. In this talk I will highlight two consequences at different levels of detail. First, the theory suggests that depth influences the size and timing of receptive field changes in visual perceptual learning. Second, for data drawn from structured probabilistic graphical models, the theory reveals that only deep (not shallow) networks undergo quasi stage-like transitions during learning, reminiscent of those found in infant semantic development. These applications span levels of analysis from single neurons to cognitive psychology, demonstrating the potential of deep linear networks to connect detailed changes in neuronal networks to changes in high-level behavior and cognition.
52 Oxford Street, Harvard University Northwest Building, Cambridge, 02138