Neural Circuits of Visual Intelligence

The DiCarlo Lab explores neural network models that capture the visual processing in the primate ventral visual stream that supports object recognition.


How does intelligent behavior emerge from neural activity in the brain? How can the analysis of computational models help us understand the underlying neural processing, and vice versa? This unit explores these questions for vision, focusing on the function of the ventral visual stream in the brains of humans and monkeys.


James DiCarlo describes insights gained from a “reverse engineering” approach to studying the neural processes that underlie object recognition. This approach aims to build artificial neural networks that can predict the recognition behavior of the primate visual system at multiple levels. The construction of these model networks is informed both by the high-level behavioral performance of humans and monkeys on recognition tasks and by the detailed architecture and response properties of neurons along the ventral visual pathway in the primate brain.
Decades of work in cognitive neuroscience have revealed that the human brain contains specialized areas for processing faces, places, bodies, language, speech, music, social interactions, Theory of Mind, and more. Focusing on regions of functional specialization along the ventral visual pathway, and using the analysis of predictive deep neural networks as a tool, Nancy Kanwisher discusses what functional selectivity means, whether it really exists, and why, from a computational perspective, the brain might exhibit such specialization.
James Haxby digs more deeply into the empirical evidence that processing in the ventral visual cortex is optimized for tasks such as face and body perception. Research using a range of static and dynamic stimuli highlights the importance of agent actions in defining the functional role of these regions. Computational analyses of fMRI data using multivariate methods capture fine-grained patterns of object representations along the ventral visual pathway.
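The multivariate idea can be illustrated with a simple correlation-based classifier: a response pattern across voxels from one half of the data is assigned the category whose pattern from the other half it correlates with most strongly. This is a toy sketch under stated assumptions; the voxel values, category names, and helper functions are hypothetical, and it is not the specific analysis pipeline used in the studies described in this unit.

```python
# Toy sketch of correlation-based multivariate pattern classification.
# Assumption: each pattern is a list of (hypothetical) voxel responses;
# these numbers are invented for illustration, not real fMRI data.
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length response patterns."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def classify(test_pattern, train_patterns):
    """Return the label whose training pattern best correlates with the test pattern."""
    return max(train_patterns,
               key=lambda label: pearson(test_pattern, train_patterns[label]))


# Hypothetical category patterns from one half of the data (e.g., even runs).
train = {
    "faces":  [2.0, 1.5, 0.2, -0.5, 0.1],
    "bodies": [0.3, 1.8, 1.2, 0.4, -0.2],
    "houses": [-0.4, 0.1, 0.3, 1.9, 1.6],
}
# A hypothetical pattern from the other half (e.g., odd runs).
test = [1.8, 1.2, 0.4, -0.3, 0.0]

print(classify(test, train))  # faces
```

Because the comparison uses the whole distributed pattern rather than the mean response of a single region, two categories can be told apart even when their average activation is similar.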
The analysis of faces plays a key role in visual intelligence and social cognition. Winrich Freiwald explores the function of a specialized network of face-processing regions in the primate brain, including a newly discovered region of the temporal pole (TP) that appears to be engaged in person recognition. Observations from fMRI and physiological studies have important implications for how face features are represented and for the computational processes underlying face recognition in the brain.
From a brief glimpse of a visual scene, we understand what is there, what people are doing, what their relationship is, what happened before, what might happen next, and so on. Gabriel Kreiman presents a general framework for solving visual cognition tasks with sequences of simple, reusable visual routines. He highlights three problems: recognizing partially occluded objects, the role of context in visual recognition, and visual search through attention-guided eye movements. Deep convolutional networks with recurrent connections are proposed to model these tasks.
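One classic illustration of how recurrent connections can fill in missing information, as in recognizing a partially occluded object, is attractor dynamics in a Hopfield-style network: stored patterns become stable states, and a partial input is pulled toward the closest one. The sketch below shows that idea only. It assumes patterns are vectors of +1/-1 feature units and that occluded units start at 0; the patterns and the functions `train_hopfield` and `complete` are hypothetical, and this is not the model presented in the lecture.

```python
# Toy sketch: pattern completion by a Hopfield-style recurrent network.
# Assumptions: +1/-1 feature units, occluded units set to 0, hypothetical patterns.


def sign(v):
    """Threshold unit: +1 for non-negative input, -1 otherwise."""
    return 1 if v >= 0 else -1


def train_hopfield(patterns):
    """Hebbian weights W[i][j] = sum over patterns of p[i]*p[j], no self-connections."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def complete(w, partial, sweeps=5):
    """Asynchronous updates: each unit in turn takes the sign of its recurrent input."""
    s = list(partial)
    n = len(s)
    for _ in range(sweeps):
        for i in range(n):
            s[i] = sign(sum(w[i][j] * s[j] for j in range(n)))
    return s


# Two hypothetical, mutually orthogonal "object" patterns.
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([pattern_a, pattern_b])

# Occlude the second half of pattern_a (unknown units = 0).
occluded = [1, 1, 1, 1, 0, 0, 0, 0]
print(complete(w, occluded))  # [1, 1, 1, 1, -1, -1, -1, -1]
```

A purely feedforward readout of the occluded input only sees half the evidence; the recurrent updates let the visible units drive the hidden ones back to the stored whole.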
Jacqueline Gottlieb presents an extensive introduction to physiological studies of the control of eye movements and attention in the frontal eye fields and lateral intraparietal cortex of the primate brain, and reflects on how the brain actively allocates attention and assigns value to sources of information.
Robert Desimone describes the neural basis of attention in primate vision, including evidence that attention enhances the neural encoding of attended objects in cortical areas V4 and IT, and models of the underlying neural circuitry. Attention is believed to be implemented through neural synchrony within and across cortical areas, and the frontal eye fields and the ventral pre-arcuate region of prefrontal cortex play a key role in the top-down control of feature-based attention.
Jeremy Wolfe introduces Treisman’s Feature Integration Theory of attention and then examines the features that guide shifts of attention during search. He describes challenges for model development, such as deciding when to terminate the search process and capturing the temporal differences between search and recognition, and outlines a neural architecture that combines a selective pathway for object recognition with a non-selective pathway for visual properties such as texture and gist.
Current deep networks can exceed human performance on standard image datasets like ImageNet, yet they deviate from human perception in unexpected ways. Aleksander Madry examines adversarial examples, in which image manipulations that are imperceptible to humans cause deep networks to misclassify images. Such examples arise from non-robust features in the data that nonetheless help the network generalize to novel images; making the learned features more robust yields representations that are better aligned with human behavior.
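Part of why imperceptible changes can flip a decision is that many tiny per-feature changes add up in high dimensions. A minimal sketch, assuming a linear classifier with hypothetical weights and inputs: for a score w·x + b the gradient with respect to the input is simply w, so the fast gradient sign method moves each feature by at most `eps` in the direction that lowers the score. This toy shows only the perturbation mechanics; it does not model the non-robust-feature account discussed in the lecture.

```python
# Toy sketch: a fast-gradient-sign perturbation against a linear classifier.
# Assumptions: hypothetical weights/inputs; for a linear score w.x + b the
# gradient of the score with respect to x is w, so no autodiff is needed.


def score(w, x, b=0.0):
    """Linear score; predict class 1 if the score is positive, else class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def fgsm_linear(w, x, eps):
    """Move each feature by eps against the gradient of the class-1 score."""
    return [xi - eps * sign(wi) for xi, wi in zip(x, w)]


n = 1000
w = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]   # hypothetical weights
x = [0.51 if i % 2 == 0 else 0.49 for i in range(n)]  # hypothetical input
x_adv = fgsm_linear(w, x, eps=0.02)

print(round(score(w, x), 6))      # 1.0  -> class 1
print(round(score(w, x_adv), 6))  # -1.0 -> class 0
```

Each of the 1,000 features moves by only 0.02 (about 4% of its value), yet the score falls by eps times the sum of |w_i|, here 0.02 * 100 = 2, which is enough to change the class.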

Further Study

Online Resources

Additional information about the speakers’ research and publications can be found at these websites:

Also see the Brain-Score website and associated Brain-Score GitHub page, described by James DiCarlo.

Nancy Kanwisher cites the following two videos that provide further background regarding functional specificity in the human brain:

Readings

Bashivan, P., Kar, K., DiCarlo, J. J. (2019) Neural population control via deep image synthesis, Science, 364, 453

Bichot, N. P., Xu, R., Ghadooshahy, A., Williams, M. L., Desimone, R. (2019) The role of prefrontal cortex in the control of feature attention in area V4, Nature Communications, 10, 5727

Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Tran, B., Madry, A. (2019) Adversarial robustness as a prior for learned representations, arXiv (Also see posting about this paper by Engstrom et al. (2019) that appeared in gradient science)

Freiwald, W. A., Tsao, D. Y. (2010) Functional compartmentalization and viewpoint generalization within the macaque face-processing system, Science, 330, 845-851

Gottlieb, J., Oudeyer, P.-Y. (2018) Toward a neuroscience of active sampling and curiosity, Nature Reviews Neuroscience, 19(12), 758-770

Guntupalli, J. S., Hanke, M., Halchenko, Y. O., Connolly, A. C., Ramadge, P. J., Haxby, J. V. (2016) A model of representational spaces in human cortex, Cerebral Cortex, 26(6), 2919-2934

Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A. (2019) Adversarial examples are not bugs, they are features, arXiv (Also see posting about this paper that appeared in gradient science)

Kar, K., Kubilius, J., Schmidt, K., Issa, E. B., DiCarlo, J. J. (2019) Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior, Nature Neuroscience, 22, 974-983

Kreiman, G., Serre, T. (2020) Beyond the feedforward sweep: Feedback computations in the visual cortex, Annals of the New York Academy of Sciences, 1464(1), 222-241

Kreiman, G. (2020) Biological and Computer Vision, Cambridge University Press

Landi, S. M., Viswanathan, P., Serene, S., Freiwald, W. A. (2021) A fast link between face perception and memory in the temporal pole, Science, 373, July 1

Lee, H., Margalit, E., Jozwik, K. M., Cohen, M. A., Kanwisher, N., Yamins, D. L. K., DiCarlo, J. J. (2020) Topographic deep artificial neural networks reproduce the hallmarks of the primate inferior temporal cortex face processing network, bioRxiv doi:

Nastase, S. A., Halchenko, Y. O., Connolly, A. C., Gobbini, M. I., Haxby, J. V. (2018) Neural responses to naturalistic clips of behaving animals in two different task contexts, Frontiers in Neuroscience, 12, 316

Ponce, C. R., Xiao, W., Schade, P. F., Hartmann, T. S., Kreiman, G., Livingstone, M. S. (2019) Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences, Cell, 177, 999-1009

Sha, L., Haxby, J. V., Abdi, H., Guntupalli, J. S., Oosterhof, N. N., Halchenko, Y. O., Connolly, A. C. (2015) The animacy continuum in the human ventral vision pathway, Journal of Cognitive Neuroscience, 27(4), 665-678

Shomstein, S., Gottlieb, J. (2016) Spatial and non-spatial aspects of visual attention: interactive cognitive mechanisms and neural underpinnings, Neuropsychologia, 92, 9-19

Tang, H., Schrimpf, M., Lotter, W., Moerman, C., Paredes, A., Ortega Caro, J., Hardesty, W., Cox, D., Kreiman, G. (2018) Recurrent computations for visual pattern completion, Proceedings of the National Academy of Sciences, 115(35), 8835-8840

Wolfe, J. M. (2020) Visual search: How do we find what we are looking for? Annual Review of Vision Science, 6(1), 539-562

Wolfe, J. M., Horowitz, T. S. (2017) Five factors that guide attention in visual search, Nature Human Behaviour, 1, 0058

Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., DiCarlo, J. J. (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex, Proceedings of the National Academy of Sciences, 111(23), 8619-8624

Yildirim, I., Belledonne, M., Freiwald, W., Tenenbaum, J. (2020) Efficient inverse graphics in biological face processing, Science Advances, 6, eaax5979

Zhou, H., Schafer, R. J., Desimone, R. (2016) Pulvinar-cortex interactions in vision and attention, Neuron, 89, 209-220