%0 Journal Article %J Nature Machine Intelligence %D 2020 %T A neural network trained for prediction mimics diverse features of biological neurons and perception %A William Lotter %A Gabriel Kreiman %A David Cox %X

Recent work has shown that convolutional neural networks (CNNs) trained on image recognition tasks can serve as valuable models for predicting neural responses in primate visual cortex. However, these models typically require biologically infeasible levels of labelled training data, so this similarity must at least arise via different paths. In addition, most popular CNNs are solely feedforward, lacking a notion of time and recurrence, whereas neurons in visual cortex produce complex time-varying responses, even to static inputs. Towards addressing these inconsistencies with biology, here we study the emergent properties of a recurrent generative network that is trained to predict future video frames in a self-supervised manner. Remarkably, the resulting model is able to capture a wide variety of seemingly disparate phenomena observed in visual cortex, ranging from single-unit response dynamics to complex perceptual motion illusions, even when subjected to highly impoverished stimuli. These results suggest potentially deep connections between recurrent predictive neural network models and computations in the brain, providing new leads that can enrich both fields.

%B Nature Machine Intelligence %V 2 %P 210-219 %8 04/2020 %G eng %U http://www.nature.com/articles/s42256-020-0170-9 %N 4 %! Nat Mach Intell %R 10.1038/s42256-020-0170-9 %0 Journal Article %J Nature Machine Intelligence %D 2020 %T A neural network trained to predict future video frames mimics critical properties of biological neuronal responses and perception %A William Lotter %A Gabriel Kreiman %A David Cox %X

While deep neural networks take loose inspiration from neuroscience, it is an open question how seriously to take the analogies between artificial deep networks and biological neuronal systems. Interestingly, recent work has shown that deep convolutional neural networks (CNNs) trained on large-scale image recognition tasks can serve as strikingly good models for predicting the responses of neurons in visual cortex to visual stimuli, suggesting that analogies between artificial and biological neural networks may be more than superficial. However, while CNNs capture key properties of the average responses of cortical neurons, they fail to explain other properties of these neurons. For one, CNNs typically require large quantities of labeled input data for training. Our own brains, in contrast, rarely have access to this kind of supervision, so to the extent that representations are similar between CNNs and brains, this similarity must arise via different training paths. In addition, neurons in visual cortex produce complex time-varying responses even to static inputs, and they dynamically tune themselves to temporal regularities in the visual environment. We argue that these differences are clues to fundamental differences between the computations performed in the brain and in deep networks. To begin to close the gap, here we study the emergent properties of a previously-described recurrent generative network that is trained to predict future video frames in a self-supervised manner. Remarkably, the model is able to capture a wide variety of seemingly disparate phenomena observed in visual cortex, ranging from single-unit response dynamics to complex perceptual motion illusions. These results suggest potentially deep connections between recurrent predictive neural network models and the brain, providing new leads that can enrich both fields.

%B Nature Machine Intelligence %8 04/2020 %G eng %0 Report %D 2018 %T A neural network trained to predict future video frames mimics critical properties of biological neuronal responses and perception %A William Lotter %A Gabriel Kreiman %A David Cox %X

While deep neural networks take loose inspiration from neuroscience, it is an open question how seriously to take the analogies between artificial deep networks and biological neuronal systems. Interestingly, recent work has shown that deep convolutional neural networks (CNNs) trained on large-scale image recognition tasks can serve as strikingly good models for predicting the responses of neurons in visual cortex to visual stimuli, suggesting that analogies between artificial and biological neural networks may be more than superficial. However, while CNNs capture key properties of the average responses of cortical neurons, they fail to explain other properties of these neurons. For one, CNNs typically require large quantities of labeled input data for training. Our own brains, in contrast, rarely have access to this kind of supervision, so to the extent that representations are similar between CNNs and brains, this similarity must arise via different training paths. In addition, neurons in visual cortex produce complex time-varying responses even to static inputs, and they dynamically tune themselves to temporal regularities in the visual environment. We argue that these differences are clues to fundamental differences between the computations performed in the brain and in deep networks. To begin to close the gap, here we study the emergent properties of a previously-described recurrent generative network that is trained to predict future video frames in a self-supervised manner. Remarkably, the model is able to capture a wide variety of seemingly disparate phenomena observed in visual cortex, ranging from single-unit response dynamics to complex perceptual motion illusions. These results suggest potentially deep connections between recurrent predictive neural network models and the brain, providing new leads that can enrich both fields.

%I arXiv %8 05/2018 %G eng %U https://arxiv.org/pdf/1805.10734.pdf %0 Journal Article %J Proceedings of the National Academy of Sciences %D 2018 %T Recurrent computations for visual pattern completion %A Hanlin Tang %A Martin Schrimpf %A William Lotter %A Charlotte Moerman %A Ana Paredes %A Josue Ortega Caro %A Walter Hardesty %A David Cox %A Gabriel Kreiman %K artificial intelligence %K computational neuroscience %K machine learning %K pattern completion %K visual object recognition %X

Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.
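
To make the "attractor-based recurrent connectivity" concrete, the following is a minimal, hypothetical sketch (not the paper's actual model): a Hopfield-style attractor network that iteratively pulls a partially visible feature vector toward a stored whole-object pattern. The function and variable names, the binary coding, and the synchronous update rule are all illustrative assumptions.

import numpy as np

def hopfield_complete(x, stored_patterns, n_steps=20):
    """Drive a binary feature vector toward the nearest stored attractor.

    x               : (d,) vector in {-1, +1}, e.g. a thresholded top-layer
                      response to a heavily occluded object
    stored_patterns : (k, d) patterns in {-1, +1} learned from whole objects
    """
    # Hebbian weight matrix with self-connections removed
    W = stored_patterns.T @ stored_patterns / stored_patterns.shape[1]
    np.fill_diagonal(W, 0.0)
    s = x.astype(float).copy()
    for _ in range(n_steps):
        s = np.sign(W @ s)   # synchronous update toward an attractor
        s[s == 0] = 1.0      # break ties deterministically
    return s                 # pattern-completed representation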

%B Proceedings of the National Academy of Sciences %8 08/2018 %G eng %U http://www.pnas.org/lookup/doi/10.1073/pnas.1719397115 %! Proc Natl Acad Sci USA %R 10.1073/pnas.1719397115 %0 Conference Paper %B ICLR %D 2017 %T Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning %A William Lotter %A Gabriel Kreiman %A David Cox %G eng %0 Generic %D 2017 %T Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning %A William Lotter %A Gabriel Kreiman %A David Cox %X

While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning (leveraging unlabeled examples to learn about the structure of a domain) remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network (“PredNet”) architecture that is inspired by the concept of “predictive coding” from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
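
As a rough illustration of the local-prediction-and-error-forwarding scheme described above, here is a hypothetical single-layer, single-time-step update written in PyTorch; the module names (conv_pred, lstm_cell) and the exact update order are assumptions for exposition, not the released implementation.

import torch
import torch.nn.functional as F

def prednet_layer_step(a_t, r_t, conv_pred, lstm_cell, lstm_state):
    """One PredNet-style layer at one time step (illustrative).

    a_t : this layer's target (pixels at the bottom layer; the pooled
          error from the layer below at higher layers)
    r_t : this layer's recurrent representation
    """
    a_hat = F.relu(conv_pred(r_t))                   # local prediction of a_t
    error = torch.cat([F.relu(a_t - a_hat),          # positive and negative
                       F.relu(a_hat - a_t)], dim=1)  # deviations, stacked
    r_next, lstm_state = lstm_cell(error, lstm_state)  # update representation
    # Only `error` (never a_t itself) is forwarded to the next layer.
    return error, r_next, lstm_state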

%8 03/2017 %1 arXiv:1605.08104v5 %2 http://hdl.handle.net/1721.1/107497

%0 Generic %D 2016 %T PredNet - "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning" [code] %A William Lotter %A Gabriel Kreiman %A David Cox %X

The PredNet is a deep convolutional recurrent neural network inspired by the principles of predictive coding from the neuroscience literature [1, 2]. It is trained for next-frame video prediction with the belief that prediction is an effective objective for unsupervised (or "self-supervised") learning [e.g. 3-11].
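
As a purely illustrative sketch of what "trained for next-frame video prediction" means in practice, a generic training loop might look as follows; this is not the repository's Keras API (see the project website below for the released code), and model, loader, and the L1 objective are assumptions.

import torch
import torch.nn.functional as F

def train_next_frame(model, loader, optimizer, device="cpu"):
    """Generic next-frame prediction loop (illustrative, not the repo's API)."""
    model.train()
    for frames in loader:                    # frames: (batch, time, C, H, W)
        frames = frames.to(device)
        inputs, target = frames[:, :-1], frames[:, -1]
        pred = model(inputs)                 # predicted frame at the last step
        loss = F.l1_loss(pred, target)       # pixel-level prediction error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()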


For full project information and links to download the code, visit the project website: https://coxlab.github.io/prednet/

%0 Conference Paper %B International Conference on Learning Representations (ICLR) %D 2016 %T Unsupervised Learning of Visual Structure using Predictive Generative Networks %A William Lotter %A Gabriel Kreiman %A David Cox %X

The ability to predict future states of the environment is a central pillar of intelligence. At its core, effective prediction requires an internal model of the world and an understanding of the rules by which the world changes. Here, we explore the internal models developed by deep neural networks trained using a loss based on predicting future frames in synthetic video sequences, using a CNN-LSTM-deCNN framework. We first show that this architecture can achieve excellent performance in visual sequence prediction tasks, including state-of-the-art performance in a standard 'bouncing balls' dataset (Sutskever et al., 2009). Using a weighted mean-squared error and adversarial loss (Goodfellow et al., 2014), the same architecture successfully extrapolates out-of-the-plane rotations of computer-generated faces. Furthermore, despite being trained end-to-end to predict only pixel-level information, our Predictive Generative Networks learn a representation of the latent structure of the underlying three-dimensional objects themselves. Importantly, we find that this representation is naturally tolerant to object transformations, and generalizes well to new tasks, such as classification of static images. Similar models trained solely with a reconstruction loss fail to generalize as effectively. We argue that prediction can serve as a powerful unsupervised loss for learning rich internal representations of high-level object features.
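
A minimal sketch of the CNN-LSTM-deCNN pipeline named above, written in PyTorch rather than the authors' original framework; the layer sizes and the 32x32 grayscale input are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class PredictiveGenerativeNet(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(                      # CNN encoder, per frame
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64 * 8 * 8, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 64 * 8 * 8)
        self.dec = nn.Sequential(                      # deCNN decoder
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frames):                         # (B, T, 1, 32, 32)
        B, T = frames.shape[:2]
        z = self.enc(frames.reshape(B * T, 1, 32, 32)).reshape(B, T, -1)
        h, _ = self.lstm(z)                            # temporal dynamics
        feat = self.fc(h[:, -1]).reshape(B, 64, 8, 8)  # last time step only
        return self.dec(feat)                          # predicted next frame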

%B International Conference on Learning Representations (ICLR) %C San Juan, Puerto Rico %8 05/2016 %G eng %U http://arxiv.org/pdf/1511.06380v2.pdf %0 Generic %D 2015 %T Unsupervised Learning of Visual Structure Using Predictive Generative Networks %A William Lotter %A Gabriel Kreiman %A David Cox %X

The ability to predict future states of the environment is a central pillar of intelligence. At its core, effective prediction requires an internal model of the world and an understanding of the rules by which the world changes. Here, we explore the internal models developed by deep neural networks trained using a loss based on predicting future frames in synthetic video sequences, using an Encoder-Recurrent-Decoder framework (Fragkiadaki et al., 2015). We first show that this architecture can achieve excellent performance in visual sequence prediction tasks, including state-of-the-art performance in a standard “bouncing balls” dataset (Sutskever et al., 2009). We then train on clips of out-of-the-plane rotations of computer-generated faces, using both mean-squared error and a generative adversarial loss (Goodfellow et al., 2014), extending the latter to a recurrent, conditional setting. Despite being trained end-to-end to predict only pixel-level information, our Predictive Generative Networks learn a representation of the latent variables of the underlying generative process. Importantly, we find that this representation is naturally tolerant to object transformations, and generalizes well to new tasks, such as classification of static images. Similar models trained solely with a reconstruction loss fail to generalize as effectively. We argue that prediction can serve as a powerful unsupervised loss for learning rich internal representations of high-level object features.
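
The weighted combination of mean-squared error and an adversarial term described above might look like the following hedged sketch, where disc is a conditional discriminator scoring a predicted frame given the preceding frames; the weight lam and all names are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def generator_loss(pred, target, context, disc, lam=0.1):
    """Weighted MSE + adversarial loss for the frame predictor (illustrative)."""
    mse = F.mse_loss(pred, target)             # pixel-level error term
    logits = disc(pred, context)               # conditional real/fake score
    adv = F.binary_cross_entropy_with_logits(  # push predictions toward "real"
        logits, torch.ones_like(logits))
    return mse + lam * adv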

%8 12/2015 %G eng %1 arXiv:1511.06380 %2 http://hdl.handle.net/1721.1/100275