%0 Generic %D 2023 %T BrainBERT: Self-supervised representation learning for Intracranial Electrodes %A Christopher Wang %A Vighnesh Subramaniam %A Adam Uri Yaari %A Gabriel Kreiman %A Boris Katz %A Ignacio Cases %A Andrei Barbu %K decoding %K language models %K Neuroscience %K self-supervision %K transformer %X

We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and with much less data by being pretrained in an unsupervised manner on a large corpus of unannotated neural recordings. Our approach generalizes to new subjects with electrodes in new positions and to unrelated tasks, showing that the representations robustly disentangle the neural signal. Just like in NLP, where one can study language by investigating what a language model learns, this approach opens the door to investigating the brain by studying what a model of the brain learns. As a first step along this path, we demonstrate a new analysis of the intrinsic dimensionality of the computations in different areas of the brain. To construct these representations, we combine a technique for producing super-resolution spectrograms of neural data with an approach designed for generating contextual representations of audio by masking. In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
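
A minimal sketch of the masked-spectrogram pretraining idea described above, written under our own assumptions (toy tensor shapes, masking ratio, and model sizes); it is not the released BrainBERT code. Random time bins of an unannotated neural spectrogram are zeroed out and a Transformer encoder is trained to reconstruct them.

```python
# Hypothetical sketch of masked-spectrogram pretraining; not the released BrainBERT code.
import torch
import torch.nn as nn

class MaskedSpectrogramModel(nn.Module):
    def __init__(self, n_freq=40, d_model=128, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(n_freq, d_model)            # one token per time bin
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decode = nn.Linear(d_model, n_freq)           # reconstruct the masked bins

    def forward(self, spec, mask):
        x = spec.masked_fill(mask.unsqueeze(-1), 0.0)      # zero out masked time bins
        h = self.encoder(self.embed(x))
        return self.decode(h)

def pretrain_step(model, spec, optimizer, mask_ratio=0.15):
    # spec: (batch, time, freq) spectrogram of unannotated neural recordings
    mask = torch.rand(spec.shape[:2]) < mask_ratio
    recon = model(spec, mask)
    loss = ((recon - spec) ** 2)[mask].mean()              # reconstruction loss on masked bins only
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

model = MaskedSpectrogramModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = pretrain_step(model, torch.randn(8, 100, 40), opt)  # fake batch, for illustration only
```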

%B International Conference on Learning Representations %C Kigali, Rwanda, Africa %8 02/2023 %U https://openreview.net/forum?id=xmcYx_reUn6 %0 Journal Article %J arXiv %D 2023 %T Forward learning with top-down feedback: empirical and analytical characterization %A Ravi Francesco Srinivasan %A Francesca Mignacco %A Martino Sorbaro %A Maria Refinetti %A Gabriel Kreiman %A Giorgia Dellaferrera %X

“Forward-only” algorithms, which train neural networks while avoiding a backward pass, have recently gained attention as a way of solving the biologically unrealistic aspects of backpropagation. Here, we first discuss the similarities between two “forward-only” algorithms, the Forward-Forward and PEPITA frameworks, and demonstrate that PEPITA is equivalent to a Forward-Forward framework with top-down feedback connections. Then, we focus on PEPITA to address compelling challenges related to the “forward-only” rules, which include providing an analytical understanding of their dynamics and reducing the gap between their performance and that of backpropagation. We propose a theoretical analysis of the dynamics of PEPITA. In particular, we show that PEPITA is well-approximated by an “adaptive-feedback-alignment” algorithm and we analytically track its performance during learning in a prototype high-dimensional setting. Finally, we develop a strategy to apply the weight mirroring algorithm on “forward-only” algorithms with top-down feedback and we show how it impacts PEPITA’s accuracy and convergence rate.

%B arXiv %8 02/2023 %G eng %U https://arxiv.org/abs/2302.05440 %0 Journal Article %J bioRxiv %D 2023 %T Out of sight, out of mind: Responses in primate ventral visual cortex track individual fixations during natural vision %A Will Xiao %A Saloni Sharma %A Gabriel Kreiman %A Margaret S. Livingstone %X

During natural vision, primates shift their gaze several times per second with large, ballistic eye movements known as saccades. Open questions remain as to whether visual neurons retain their classical retinotopic response properties during natural vision or whether neurons integrate information across fixations and predict the consequences of impending saccades. Answers are especially wanting for vision in complex scenes relevant to natural behavior. We let 13 monkeys freely view thousands of large natural images, recorded over 883 hours of neuronal responses throughout the ventral visual pathway across 4.7 million fixations, and designed flexible analyses to reveal the spatial, temporal, and feature selectivity of the responses. Ventral visual responses followed each fixation and did not become gaze-invariant as monkeys examined an image over seconds. Computational models revealed that neuronal responses corresponded to eye-centered receptive fields. The results suggest that ventral visual cortex remains predominantly retinotopic during natural vision and does not establish a gaze-independent representation of the world.

%B bioRxiv %8 02/2023 %G eng %U https://www.biorxiv.org/content/10.1101/2023.02.08.527666v1 %R 10.1101/2023.02.08.527666 %0 Generic %D 2023 %T Sparse distributed memory is a continual learner %A Trenton Bricken %A Xander Davies %A Deepak Singh %A Dmitry Krotov %A Gabriel Kreiman %K Biologically Inspired %K Continual Learning %K Sparse Distributed Memory %K Sparsity %K Top-K Activation %X

Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. We find that every component of our MLP variant translated from biology is necessary for continual learning. Our solution is also free from any memory replay or task information, and introduces novel methods to train sparse networks that may be broadly applicable.
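
The keywords point to a Top-K activation as the key sparsity mechanism. Below is a toy PyTorch sketch, under our own assumptions about layer sizes and k, of such a layer: only the k most active hidden units per example stay on, which is the kind of sparse code the abstract credits for continual learning. It is an illustration, not the paper's exact architecture.

```python
# Hypothetical Top-K sparse MLP layer; a toy illustration, not the paper's exact model.
import torch
import torch.nn as nn

class TopK(nn.Module):
    """Keep the k largest activations per example; zero out the rest."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):
        topk = torch.topk(x, self.k, dim=-1)
        out = torch.zeros_like(x)
        return out.scatter(-1, topk.indices, topk.values)

mlp = nn.Sequential(nn.Linear(784, 1000), TopK(k=10), nn.Linear(1000, 10))
logits = mlp(torch.randn(32, 784))   # only 10 of the 1000 hidden units are active per input
```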

%B International Conference on Learning Representations %C Kigali, Rwanda, Africa %8 03/2023 %U https://openreview.net/forum?id=JknGeelZJpHP %0 Journal Article %J Cognitive Neuropsychology %D 2022 %T Do computational models of vision need shape-based representations? Evidence from an individual with intriguing visual perceptions %A Armendariz, Marcelo %A Will Xiao %A Vinken, Kasper %A Gabriel Kreiman %K computer vision models %K intermediate representations %K Ventral visual cortex %K visual deficits %B Cognitive Neuropsychology %P 1 - 3 %8 02/2022 %G eng %U https://www.tandfonline.com/doi/full/10.1080/02643294.2022.2041588 %! Cognitive Neuropsychology %R 10.1080/02643294.2022.2041588 %0 Journal Article %J arXiv %D 2022 %T On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering %A Ankur Sikarwar %A Gabriel Kreiman %X

In recent years, multi-modal transformers have shown significant progress in Vision-Language tasks, such as Visual Question Answering (VQA), outperforming previous architectures by a considerable margin. This improvement in VQA is often attributed to the rich interactions between vision and language streams. In this work, we investigate the efficacy of co-attention transformer layers in helping the network focus on relevant regions while answering the question. We generate visual attention maps using the question-conditioned image attention scores in these co-attention layers. We evaluate the effect of the following critical components on visual attention of a state-of-the-art VQA model: (i) number of object region proposals, (ii) question part of speech (POS) tags, (iii) question semantics, (iv) number of co-attention layers, and (v) answer accuracy. We compare the neural network attention maps against human attention maps both qualitatively and quantitatively. Our findings indicate that co-attention transformer modules are crucial in attending to relevant regions of the image given a question. Importantly, we observe that the semantic meaning of the question is not what drives visual attention, but specific keywords in the question do. Our work sheds light on the function and interpretation of co-attention transformer layers, highlights gaps in current networks, and can guide the development of future VQA models and networks that simultaneously process visual and language streams.

%B arXiv %8 01/2022 %G eng %U https://arxiv.org/abs/2201.03965 %R 10.48550/arXiv.2201.03965 %0 Conference Proceedings %B Proceedings of the 39th International Conference on Machine Learning, PMLR %D 2022 %T Error-driven Input Modulation: Solving the Credit Assignment Problem without a Backward Pass %A Giorgia Dellaferrera %A Gabriel Kreiman %X

Supervised learning in artificial neural networks typically relies on backpropagation, where the weights are updated based on the error-function gradients and sequentially propagated from the output layer to the input layer. Although this approach has proven effective in a wide domain of applications, it lacks biological plausibility in many regards, including the weight symmetry problem, the dependence of learning on non-local signals, the freezing of neural activity during error propagation, and the update locking problem. Alternative training schemes have been introduced, including sign symmetry, feedback alignment, and direct feedback alignment, but they invariably rely on a backward pass that hinders the possibility of solving all the issues simultaneously. Here, we propose to replace the backward pass with a second forward pass in which the input signal is modulated based on the error of the network. We show that this novel learning rule comprehensively addresses all the above-mentioned issues and can be applied to both fully connected and convolutional models. We test this learning rule on MNIST, CIFAR-10, and CIFAR-100. These results help incorporate biological principles into machine learning.
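
A schematic numpy sketch of the two-forward-pass rule as the abstract describes it: the error from a first pass is projected onto the input by a fixed random matrix F for a second, modulated pass, and weights are updated from quantities available in those passes. The specific layer-wise updates, dimensions, and learning rate below are simplifying assumptions, not the paper's verbatim algorithm.

```python
# Schematic two-forward-pass update (PEPITA-style); simplified, not the paper's exact rule.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 784, 256, 10
W1 = rng.normal(0, 0.05, (n_hid, n_in))
W2 = rng.normal(0, 0.05, (n_out, n_hid))
F = rng.normal(0, 0.05, (n_in, n_out))      # fixed random projection of the error onto the input

def forward(x):
    h = np.maximum(0.0, W1 @ x)             # ReLU hidden layer
    return h, W2 @ h

def pepita_step(x, y_onehot, lr=1e-3):
    global W1, W2
    h1, out = forward(x)                        # first (clean) forward pass
    err = out - y_onehot
    h2, _ = forward(x + F @ err)                # second pass with error-modulated input
    W1 -= lr * np.outer(h1 - h2, x + F @ err)   # hidden layer: difference between the two passes
    W2 -= lr * np.outer(err, h2)                # output layer: delta-like update, no backward pass
    return err

x = rng.normal(size=n_in)
y = np.eye(n_out)[3]
err = pepita_step(x, y)
```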

%B Proceedings of the 39th International Conference on Machine Learning, PMLR %V 162 %P 4937-4955 %8 07/2022 %G eng %U https://proceedings.mlr.press/v162/dellaferrera22a.html %0 Journal Article %J Proceedings of the National Academy of Sciences %D 2022 %T Face neurons encode nonsemantic features %A Bardon, Alexandra %A Will Xiao %A Carlos R Ponce %A Margaret S Livingstone %A Gabriel Kreiman %X

The primate inferior temporal cortex contains neurons that respond more strongly to faces than to other objects. Termed “face neurons,” these neurons are thought to be selective for faces as a semantic category. However, face neurons also partly respond to clocks, fruits, and single eyes, raising the question of whether face neurons are better described as selective for visual features related to faces but dissociable from them. We used a recently described algorithm, XDream, to evolve stimuli that strongly activated face neurons. XDream leverages a generative neural network that is not limited to realistic objects. Human participants assessed images evolved for face neurons and for nonface neurons, as well as natural images depicting faces, cars, fruits, etc. Evolved images were consistently judged to be distinct from real faces. Images evolved for face neurons were rated as slightly more similar to faces than images evolved for nonface neurons. Across natural images, face neuron activity correlated with subjective “faceness” ratings, but this relationship did not hold for face neuron–evolved images, which triggered high activity but were rated low in faceness. Our results suggest that so-called face neurons are better described as tuned to visual features rather than semantic categories.

%B Proceedings of the National Academy of Sciences %V 119 %8 02/2022 %G eng %U https://pnas.org/doi/full/10.1073/pnas.2118705119 %N 16 %! Proc. Natl. Acad. Sci. U.S.A. %R 10.1073/pnas.2118705119 %0 Journal Article %J Nature Neuroscience %D 2022 %T Neurons detect cognitive boundaries to structure episodic memories in humans %A Zheng, Jie %A Schjetnan, Andrea G. P. %A Yebra, Mar %A Gomes, Bernard A. %A Mosher, Clayton P. %A Kalia, Suneil K. %A Valiante, Taufik A. %A Mamelak, Adam N. %A Gabriel Kreiman %A Rutishauser, Ueli %X

While experience is continuous, memories are organized as discrete events. Cognitive boundaries are thought to segment experience and structure memory, but how this process is implemented remains unclear. We recorded the activity of single neurons in the human medial temporal lobe (MTL) during the formation and retrieval of memories with complex narratives. Here, we show that neurons responded to abstract cognitive boundaries between different episodes. Boundary-induced neural state changes during encoding predicted subsequent recognition accuracy but impaired event order memory, mirroring a fundamental behavioral tradeoff between content and time memory. Furthermore, the neural state following boundaries was reinstated during both successful retrieval and false memories. These findings reveal a neuronal substrate for detecting cognitive boundaries that transform experience into mnemonic episodes and structure mental time travel during retrieval.

%B Nature Neuroscience %V 25 %P 358 - 368 %8 03/2022 %G eng %U https://www.nature.com/articles/s41593-022-01020-w %N 3 %! Nat Neurosci %R 10.1038/s41593-022-01020-w %0 Journal Article %J arXiv %D 2022 %T One thing to fool them all: generating interpretable, universal, and physically-realizable adversarial features %A Stephen Casper %A Max Nadeau %A Gabriel Kreiman %X

It is well understood that modern deep networks are vulnerable to adversarial attacks. However, conventional attack methods fail to produce adversarial perturbations that are intelligible to humans, and they pose limited threats in the physical world. To study feature-class associations in networks and better understand their vulnerability to attacks in the real world, we develop feature-level adversarial perturbations using deep image generators and a novel optimization objective. We term these feature-fool attacks. We show that they are versatile and use them to generate targeted feature-level attacks at the ImageNet scale that are simultaneously interpretable, universal to any source image, and physically-realizable. These attacks reveal spurious, semantically-describable feature/class associations that can be exploited by novel combinations of objects. We use them to guide the design of “copy/paste” adversaries in which one natural image is pasted into another to cause a targeted misclassification.

%B arXiv %8 01/2022 %G eng %U https://arxiv.org/abs/2110.03605 %R 10.48550/arXiv.2110.03605 %0 Generic %D 2022 %T Robust Feature-Level Adversaries are Interpretability Tools %A Stephen Casper %A Max Nadeau %A Dylan Hadfield-Menell %A Gabriel Kreiman %K Adversarial Attacks %K Explainability %K Interpretability %X

The literature on adversarial attacks in computer vision typically focuses on pixel-level perturbations. These tend to be very difficult to interpret. Recent work that manipulates the latent representations of image generators to create "feature-level" adversarial perturbations gives us an opportunity to explore perceptible, interpretable adversarial attacks. We make three contributions. First, we observe that feature-level attacks provide useful classes of inputs for studying representations in models. Second, we show that these adversaries are uniquely versatile and highly robust. We demonstrate that they can be used to produce targeted, universal, disguised, physically-realizable, and black-box attacks at the ImageNet scale. Third, we show how these adversarial images can be used as a practical interpretability tool for identifying bugs in networks. We use these adversaries to make predictions about spurious associations between features and classes which we then test by designing "copy/paste" attacks in which one natural image is pasted into another to cause a targeted misclassification. Our results suggest that feature-level attacks are a promising approach for rigorous interpretability research. They support the design of tools to better understand what a model has learned and diagnose brittle feature associations. Code is available at https://github.com/thestephencasper/feature_level_adv.
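
A hedged sketch of the general recipe outlined above: rather than perturbing pixels, a latent code fed to an image generator is optimized so that the classifier assigns a chosen target class. The tiny generator and classifier below are stand-ins so the example runs end to end; the paper works with large pretrained ImageNet-scale models (see the linked repository for the actual code).

```python
# Sketch of a feature-level (generator-latent) adversarial attack; toy stand-in models.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
generator = nn.Sequential(nn.Linear(64, 3 * 32 * 32), nn.Tanh())       # stand-in for a deep image generator
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in for a pretrained classifier

z = torch.randn(1, 64, requires_grad=True)       # the latent code is optimized instead of pixels
target_class = torch.tensor([7])
opt = torch.optim.Adam([z], lr=0.05)

for step in range(200):
    image = generator(z).view(1, 3, 32, 32)      # the perturbation lives in the generator's latent space
    loss = F.cross_entropy(classifier(image), target_class)
    opt.zero_grad(); loss.backward(); opt.step()

print(classifier(generator(z).view(1, 3, 32, 32)).argmax().item())   # ideally the target class, 7
```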

%B NeurIPS %C New Orleans, Louisiana %8 10/2022 %U https://openreview.net/forum?id=lQ--doSB2o %0 Journal Article %J Scientific Reports %D 2022 %T Stochastic consolidation of lifelong memory %A Shaham, Nimrod %A Chandra, Jay %A Gabriel Kreiman %A Sompolinsky, Haim %X

Humans have the remarkable ability to continually store new memories, while maintaining old memories for a lifetime. How the brain avoids catastrophic forgetting of memories due to interference between encoded memories is an open problem in computational neuroscience. Here we present a model for continual learning in a recurrent neural network combining Hebbian learning, synaptic decay and a novel memory consolidation mechanism: memories undergo stochastic rehearsals with rates proportional to the memory’s basin of attraction, causing self-amplified consolidation. This mechanism gives rise to memory lifetimes that extend much longer than the synaptic decay time, and retrieval probability of memories that gracefully decays with their age. The number of retrievable memories is proportional to a power of the number of neurons. Perturbations to the circuit model cause temporally-graded retrograde and anterograde deficits, mimicking observed memory impairments following neurological trauma.
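
A simplified numpy illustration of the three ingredients named above: Hebbian storage in a recurrent (Hopfield-like) network, exponential synaptic decay, and stochastic rehearsal that re-imprints earlier memories. For brevity, rehearsal is drawn uniformly over stored patterns here, whereas the model weights rehearsal rates by each memory's basin of attraction; the network size and constants are arbitrary.

```python
# Toy Hopfield-style network with Hebbian learning, synaptic decay, and stochastic rehearsal.
# Simplified: rehearsal is uniform here; the paper weights it by basin-of-attraction size.
import numpy as np

rng = np.random.default_rng(1)
N, n_memories, decay, rehearsals = 200, 30, 0.97, 2

patterns = rng.choice([-1, 1], size=(n_memories, N))
W = np.zeros((N, N))
stored = []

for p in patterns:
    W = decay * W + np.outer(p, p) / N           # synaptic decay + Hebbian encoding of the new memory
    stored.append(p)
    for _ in range(rehearsals):                  # stochastic rehearsal re-imprints older memories
        q = stored[rng.integers(len(stored))]
        W += np.outer(q, q) / N
np.fill_diagonal(W, 0)

def recall(cue, steps=20):
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)                       # attractor dynamics
        s[s == 0] = 1
    return s

noisy = patterns[0] * np.where(rng.random(N) < 0.1, -1, 1)   # corrupt 10% of the oldest memory
overlap = recall(noisy) @ patterns[0] / N                    # overlap near 1 means the memory survived
print(round(float(overlap), 2))
```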

%B Scientific Reports %V 12 %8 07/2022 %G eng %U https://www.nature.com/articles/s41598-022-16407-9 %N 1 %! Sci Rep %R 10.1038/s41598-022-16407-9 %0 Journal Article %J BioRxiv %D 2022 %T Task-specific neural processes underlying conflict resolution during cognitive control %A Yuchen Xiao %A Chien-Chen Chou %A Garth Rees Cosgrove %A Nathan E Crone %A Scellig Stone %A Joseph R Madsen %A Ian Reucroft %A Yen-Cheng Shih %A Daniel Weisholtz %A Hsiang-Yu Yu %A William S. Anderson %A Gabriel Kreiman %X

Cognitive control involves flexibly combining multiple sensory inputs with task-dependent goals during decision making. Several tasks have been proposed to examine cognitive control, including Stroop, Eriksen-Flanker, and the Multi-source interference task. Because these tasks have been studied independently, it remains unclear whether the neural signatures of cognitive control reflect abstract control mechanisms or specific combinations of sensory and behavioral aspects of each task. To address this question, here we recorded invasive neurophysiological signals from 16 subjects and directly compared the three tasks against each other. Neural activity patterns in the theta and high-gamma frequency bands differed between incongruent and congruent conditions, revealing strong modulation by conflicting task demands. These neural signals were specific to each task, generalizing within a task but not across tasks. These results highlight the complex interplay between sensory inputs, motor outputs, and task demands and argue against a universal and abstract representation of conflict.

%B BioRxiv %8 01/2022 %G eng %U https://www.biorxiv.org/content/10.1101/2022.01.16.476535 %R 10.1101/2022.01.16.476535 %0 Journal Article %J Nature Human Behaviour %D 2021 %T Beauty is in the eye of the machine %A Zhang, Mengmi %A Gabriel Kreiman %X

Ansel Adams said, “There are no rules for good photographs, there are only good photographs.” Is it possible to predict our fickle and subjective appraisal of ‘aesthetically pleasing’ visual art? Iigaya et al. used an artificial intelligence approach to show how human aesthetic preference can be partially explained as an integration of hierarchical constituent image features.

Artificial intelligence (AI) has made rapid strides in a wide range of visual tasks, including recognition of objects and faces, automatic diagnosis of clinical images, and answering questions about images. More recently, AI has also started penetrating the arts. For example, in October 2018, the first piece of AI-generated art came to auction, with an initial estimate of US$ 10,000, and strikingly garnered a final bid of US$ 432,500 (Fig. 1). The portrait depicts a portly gentleman with a seemingly fuzzy facial expression, dressed in a black frockcoat with a white collar. Appreciating and creating a piece of art requires a general understanding of aesthetics. What nuances, structures, and semantics embedded in a painting make it aesthetically pleasing?

%B Nature Human Behaviour %V 5 %P 675 - 676 %8 05/2021 %G eng %U http://www.nature.com/articles/s41562-021-01125-5 %N 6 %! Nat Hum Behav %R 10.1038/s41562-021-01125-5 %0 Book %D 2021 %T Biological and Computer Vision %A Gabriel Kreiman %X

Imagine a world where machines can see and understand the world the way humans do. Rapid progress in artificial intelligence has led to smartphones that recognize faces, cars that detect pedestrians, and algorithms that suggest diagnoses from clinical images, among many other applications. The success of computer vision is founded on a deep understanding of the neural circuits in the brain responsible for visual processing. This book introduces the neuroscientific study of neuronal computations in visual cortex alongside the cognitive psychological understanding of visual cognition and the burgeoning field of biologically-inspired artificial intelligence. Topics include the neurophysiological investigation of visual cortex, visual illusions, visual disorders, deep convolutional neural networks, machine learning, and generative adversarial networks, among others. It is an ideal resource for students and researchers looking to build bridges across different approaches to studying and building visual systems.

%I Cambridge University Press %C Cambridge, UK %8 02/2021 %@ 978-1108705004 %G eng %U https://www.cambridge.org/core/books/biological-and-computer-vision/BB7E68A69AFE7A322F68F3C4A297F3CF %R 10.1017/9781108649995 %0 Journal Article %J bioRxiv %D 2021 %T Cognitive boundary signals in the human medial temporal lobe shape episodic memory representation %A Jie Zheng %A Andrea Gómez Palacio Schjetnan %A Mar Yebra %A Clayton Mosher %A Suneil Kalia %A Taufik A. Valiante %A Adam N. Mamelak %A Gabriel Kreiman %A Ueli Rutishauser %X

While experience unfolds continuously, memories are organized as a set of discrete events that bind together the “where”, “when”, and “what” of episodic memory. This segmentation of continuous experience is thought to be facilitated by the detection of salient environmental or cognitive events. However, the underlying neural mechanisms and how such segmentation shapes episodic memory representations remain unclear. We recorded from single neurons in the human medial temporal lobe while subjects watched videos with different types of embedded boundaries and were subsequently evaluated for memories of the video contents. Here we show neurons that signal the presence of cognitive boundaries between subevents from the same episode and neurons that detect the abstract separation between different episodes. The firing rate and spike timing of these boundary-responsive neurons were predictive of later memory retrieval accuracy. At the population level, abrupt neural state changes following boundaries predicted enhanced memory strength but impaired order memory, capturing the behavioral tradeoff subjects exhibited when recalling episodic content versus temporal order. Successful retrieval was associated with reinstatement of the neural state present following boundaries, indicating that boundaries structure memory search. These findings reveal a neuronal substrate for detecting cognitive boundaries and show that cognitive boundary signals facilitate the mnemonic organization of continuous experience as a set of discrete episodic events.

%B bioRxiv %8 01/2021 %G eng %0 Conference Paper %B AAAI 2021 %D 2021 %T Frivolous Units: Wider Networks Are Not Really That Wide %A Stephen Casper %A Xavier Boix %A Vanessa D'Amario %A Ling Guo %A Martin Schrimpf %A Vinken, Kasper %A Gabriel Kreiman %X

A remarkable characteristic of overparameterized deep neural networks (DNNs) is that their accuracy does not degrade when the network width is increased. Recent evidence suggests that developing compressible representations allows the complexity of large networks to be adjusted for the learning task at hand. However, these representations are poorly understood. A promising strand of research inspired by biology involves studying representations at the unit level, as it offers a more granular interpretation of the neural mechanisms. In order to better understand what facilitates increases in width without decreases in accuracy, we ask: Are there mechanisms at the unit level by which networks control their effective complexity? If so, how do these depend on the architecture, dataset, and hyperparameters? We identify two distinct types of “frivolous” units that proliferate when the network’s width increases: prunable units, which can be dropped out of the network without significant change to the output, and redundant units, whose activities can be expressed as a linear combination of others. These units imply complexity constraints, as the function the network computes could be expressed without them. We also identify how the development of these units can be influenced by architecture and a number of training factors. Together, these results help to explain why the accuracy of DNNs does not degrade when width is increased and highlight the importance of frivolous units toward understanding implicit regularization in DNNs.
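
A small sketch of how the two unit types could be operationalized on a matrix of recorded activations: a unit is flagged redundant if its activity is well fit by a linear combination of the other units, and prunable if zeroing it leaves the downstream output essentially unchanged. The thresholds, toy data, and planted examples are our assumptions, not the paper's exact criteria.

```python
# Illustrative detection of "redundant" and "prunable" units; thresholds and data are arbitrary.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 50))                 # unit activations: (examples, units)
acts[:, 0] = acts[:, 1] + 0.5 * acts[:, 2]         # plant a linear dependence among units 0-2
W_next = rng.normal(size=(50, 10)); W_next[3] = 0  # next layer ignores unit 3 -> prunable

redundant, prunable = [], []
baseline = acts @ W_next
for u in range(acts.shape[1]):
    others = np.delete(acts, u, axis=1)
    r2 = LinearRegression().fit(others, acts[:, u]).score(others, acts[:, u])
    if r2 > 0.95:
        redundant.append(u)                        # well predicted by the remaining units
    ablated = acts.copy(); ablated[:, u] = 0
    if np.abs(ablated @ W_next - baseline).max() < 1e-6:
        prunable.append(u)                         # removal leaves the downstream output unchanged

print(redundant, prunable)   # the planted dependence flags units 0-2 as redundant; unit 3 is prunable
```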

%B AAAI 2021 %8 05/2021 %G eng %U https://dblp.org/rec/conf/aaai/CasperBDGSVK21.html %0 Journal Article %J arXiv %D 2021 %T Hypothesis-driven Online Video Stream Learning with Augmented Memory %A Mengmi Zhang %A Rohil Badkundri %A Morgan B. Talbot %A Rushikesh Zawar %A Gabriel Kreiman %X

The ability to continuously acquire new knowledge without forgetting previous tasks remains a challenging problem for computer vision systems. Standard continual learning benchmarks focus on learning from static iid images in an offline setting. Here, we examine a more challenging and realistic online continual learning problem called online stream learning. Like humans, some AI agents have to learn incrementally from a continuous temporal stream of non-repeating data. We propose a novel model, Hypotheses-driven Augmented Memory Network (HAMN), which efficiently consolidates previous knowledge using an augmented memory matrix of "hypotheses" and replays reconstructed image features to avoid catastrophic forgetting. Compared with pixel-level and generative replay approaches, the advantages of HAMN are two-fold. First, hypothesis-based knowledge consolidation avoids redundant information in the image pixel space and makes memory usage far more efficient. Second, hypotheses in the augmented memory can be re-used for learning new tasks, improving generalization and transfer learning ability. Given a lack of online incremental class learning datasets on video streams, we introduce and adapt two additional video datasets, Toybox and iLab, for online stream learning. We also evaluate our method on the CORe50 and online CIFAR100 datasets. Our method performs significantly better than all state-of-the-art methods, while offering much more efficient memory usage. All source code and data are publicly available at this URL

%B arXiv %8 04/2021 %G eng %U https://arxiv.org/abs/2104.02206 %R 10.48550/arXiv.2104.02206 %0 Journal Article %J Social Cognitive and Affective Neuroscience %D 2021 %T Localized task-invariant emotional valence encoding revealed by intracranial recordings %A Weisholtz, Daniel S %A Gabriel Kreiman %A Silbersweig, David A %A Stern, Emily %A Cha, Brannon %A Butler, Tracy %K classifier %K decoding %K emotion %K intracranial EEG %K valence %X

The ability to distinguish between negative, positive and neutral valence is a key part of emotion perception. Emotional valence has conceptual meaning that supersedes any particular type of stimulus, although it is typically captured experimentally in association with particular tasks. We sought to identify neural encoding for task-invariant emotional valence. We evaluated whether high gamma responses (HGRs) to visually displayed words conveying emotions could be used to decode emotional valence from HGRs to facial expressions. Intracranial electroencephalography (iEEG) was recorded from fourteen individuals while they participated in two tasks, one involving reading words with positive, negative, and neutral valence, and the other involving viewing faces with positive, negative, and neutral facial expressions. Quadratic discriminant analysis was used to identify information in the HGR that differentiates the three emotion conditions. A classifier was trained on the emotional valence labels from one task and was cross-validated on data from the same task (within-task classifier) as well as the other task (between-task classifier). Emotional valence could be decoded in the left medial orbitofrontal cortex and middle temporal gyrus, both using within-task classifiers as well as between-task classifiers. These observations suggest the presence of task-independent emotional valence information in the signals from these regions.
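
A schematic of the between-task decoding analysis with scikit-learn's quadratic discriminant analysis: fit on high-gamma features from the word task, evaluate on the face task. The feature matrices below are random placeholders standing in for per-trial high-gamma responses from one electrode region, so the scores should hover around chance.

```python
# Sketch of within- and between-task valence decoding with QDA; features are random placeholders.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_words, y_words = rng.normal(size=(120, 8)), rng.integers(0, 3, 120)  # high-gamma features, word task
X_faces, y_faces = rng.normal(size=(120, 8)), rng.integers(0, 3, 120)  # high-gamma features, face task
# labels: 0 = negative, 1 = neutral, 2 = positive

qda = QuadraticDiscriminantAnalysis()
within = cross_val_score(qda, X_words, y_words, cv=5).mean()   # within-task classifier
between = qda.fit(X_words, y_words).score(X_faces, y_faces)    # between-task classifier
print(f"within-task {within:.2f}, between-task {between:.2f}")  # ~chance (0.33) on random data
```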

%B Social Cognitive and Affective Neuroscience %8 12/2022 %G eng %U https://academic.oup.com/scan/advance-article/doi/10.1093/scan/nsab134/6481890 %R 10.1093/scan/nsab134 %0 Journal Article %J Cell Reports %D 2021 %T Mesoscopic physiological interactions in the human brain reveal small-world properties %A Wang, Jiarui %A Tao, Annabelle %A Anderson, William S. %A Madsen, Joseph R. %A Gabriel Kreiman %X

Cognition depends on rapid and robust communication between neural circuits spanning different brain areas. We investigated the mesoscopic network of cortico-cortical interactions in the human brain in an extensive dataset consisting of 6,024 h of intracranial field potential recordings from 4,142 electrodes in 48 subjects. We evaluated communication between brain areas at the network level across different frequency bands. The interaction networks were validated against known anatomical measurements and neurophysiological interactions in humans and monkeys. The resulting human brain interactome is characterized by a broad and spatially specific, dynamic, and extensive network. The physiological interactome reveals small-world properties, which we conjecture might facilitate efficient and reliable information transmission. The interaction dynamics correlate with the brain sleep/awake state. These results constitute initial steps toward understanding how the interactome orchestrates cortical communication and provide a reference for future efforts assessing how dysfunctional interactions may lead to mental disorders.

%B Cell Reports %V 36 %P 109585 %8 08/2021 %G eng %U https://linkinghub.elsevier.com/retrieve/pii/S2211124721010196 %N 8 %! Cell Reports %R 10.1016/j.celrep.2021.109585 %0 Generic %D 2021 %T Visual Search Asymmetry: Deep Nets and Humans Share Similar Inherent Biases %A Shashi Kant Gupta %A Mengmi Zhang %A CHIA-CHIEN WU %A Jeremy Wolfe %A Gabriel Kreiman %X

Visual search is a ubiquitous and often challenging daily task, exemplified by looking for the car keys at home or a friend in a crowd. An intriguing property of some classical search tasks is an asymmetry such that finding a target A among distractors B can be easier than finding B among A. To elucidate the mechanisms responsible for asymmetry in visual search, we propose a computational model that takes a target and a search image as inputs and produces a sequence of eye movements until the target is found. The model integrates eccentricity-dependent visual recognition with target-dependent top-down cues. We compared the model against human behavior in six paradigmatic search tasks that show asymmetry in humans. Without prior exposure to the stimuli or task-specific training, the model provides a plausible mechanism for search asymmetry. We hypothesized that the polarity of search asymmetry arises from experience with the natural environment. We tested this hypothesis by training the model on augmented versions of ImageNet where the biases of natural images were either removed or reversed. The polarity of search asymmetry disappeared or was altered depending on the training protocol. This study highlights how classical perceptual properties can emerge in neural network models, without the need for task-specific training, but rather as a consequence of the statistical properties of the developmental diet fed to the model. All source code and data are publicly available at https://github.com/kreimanlab/VisualSearchAsymmetry.

%B NeurIPS 2021 %8 12/2021 %U https://nips.cc/Conferences/2021/Schedule?showEvent=28848 %0 Conference Proceedings %B International Conference on Computer Vision (ICCV) %D 2021 %T When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes %A Philipp Bomatter %A Mengmi Zhang %A Dimitar Karev %A Spandan Madan %A Claire Tseng %A Gabriel Kreiman %X

Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects including physics rules, statistical co-occurrences, and relative object sizes, among others. While previous work has focused on crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic Out-of-Context Dataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control the gravity, object co-occurrences and relative sizes across 36 object categories in a virtual household environment. We conducted a series of experiments to gain insights into the impact of contextual cues on both human and machine vision using OCD. We conducted psychophysics experiments to establish a human benchmark for out-of-context recognition, and then compared it with state-of-the-art computer vision models to quantify the gap between the two. We propose a context-aware recognition transformer model, fusing object and contextual information via multi-head attention. Our model captures useful information for contextual reasoning, enabling human-level performance and better robustness in out-of-context conditions compared to baseline models across OCD and other out-of-context datasets. All source code and data are publicly available at https://github.com/kreimanlab/ WhenPigsFlyContext

%B International Conference on Computer Vision (ICCV) %8 08/2021 %G eng %R 10.1109/iccv48922.2021.00032 %0 Journal Article %J Annals of the New York Academy of Sciences %D 2020 %T Beyond the feedforward sweep: feedback computations in the visual cortex %A Gabriel Kreiman %A Serre, Thomas %X

Visual perception involves the rapid formation of a coarse image representation at the onset of visual processing, which is iteratively refined by late computational processes. These early versus late time windows approximately map onto feedforward and feedback processes, respectively. State-of-the-art convolutional neural networks, the main engine behind recent machine vision successes, are feedforward architectures. Their successes and limitations provide critical information regarding which visual tasks can be solved by purely feedforward processes and which require feedback mechanisms. We provide an overview of recent work in cognitive neuroscience and machine vision that highlights the possible role of feedback processes for both visual recognition and beyond. We conclude by discussing important open questions for future research.

%B Annals of the New York Academy of Sciences %V 1464 %P 222 - 241 %8 02/2020 %G eng %U https://nyaspubs.onlinelibrary.wiley.com/doi/10.1111/nyas.14320 %N 1 %! Ann. N.Y. Acad. Sci. %R 10.1111/nyas.14320 %0 Journal Article %J CVPR 2020 %D 2020 %T Can Deep Learning Recognize Subtle Human Activities? %A Jacquot, V %A Ying, J %A Gabriel Kreiman %B CVPR 2020 %8 01/2020 %G eng %0 Journal Article %J Science Advances %D 2020 %T Incorporating intrinsic suppression in deep neural networks captures dynamics of adaptation in neurophysiology and perception %A Vinken, K. %A Boix, X. %A Gabriel Kreiman %X

Adaptation is a fundamental property of sensory systems that can change subjective experiences in the context of recent information. Adaptation has been postulated to arise from recurrent circuit mechanisms or as a consequence of neuronally intrinsic suppression. However, it is unclear whether intrinsic suppression by itself can account for effects beyond reduced responses. Here, we test the hypothesis that complex adaptation phenomena can emerge from intrinsic suppression cascading through a feedforward model of visual processing. A deep convolutional neural network with intrinsic suppression captured neural signatures of adaptation including novelty detection, enhancement, and tuning curve shifts, while producing aftereffects consistent with human perception. When adaptation was trained in a task where repeated input affects recognition performance, an intrinsic mechanism generalized better than a recurrent neural network. Our results demonstrate that feedforward propagation of intrinsic suppression changes the functional state of the network, reproducing key neurophysiological and perceptual properties of adaptation.
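
A toy recurrence illustrating neuronally intrinsic suppression of the kind tested above: each unit subtracts a slowly decaying copy of its own recent activity, so responses to repeated inputs shrink. The single layer, constants, and inputs are illustrative assumptions, not the deep network used in the paper.

```python
# Toy intrinsic-suppression (adaptation) mechanism for one layer; constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 50))          # one feedforward layer
alpha, beta = 0.9, 0.7                 # suppression decay and update strength

def run_sequence(inputs):
    s = np.zeros(W.shape[0])           # per-unit suppression state
    responses = []
    for x in inputs:
        r = np.maximum(0.0, W @ x - s) # response is reduced by the accumulated suppression
        s = alpha * s + beta * r       # suppression builds with recent activity, then decays
        responses.append(r.sum())
    return responses

a, b = rng.normal(size=50), rng.normal(size=50)
print(run_sequence([a, a, a, b]))      # responses to the repeated input 'a' shrink (adaptation)
```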

%B Science Advances %V 6 %P eabd4205 %8 10/2020 %G eng %U https://advances.sciencemag.org/lookup/doi/10.1126/sciadv.abd4205 %N 42 %! Sci. Adv. %R 10.1126/sciadv.abd4205 %0 Journal Article %J Cognition %D 2020 %T Minimal videos: Trade-off between spatial and temporal information in human and machine vision. %A Guy Ben-Yosef %A Gabriel Kreiman %A Shimon Ullman %K Comparing deep neural networks and humans %K Integration of spatial and temporal visual information %K minimal images %K Minimal videos %K Visual dynamic recognition %X

Objects and their parts can be visually recognized from purely spatial or purely temporal information, but the mechanisms integrating space and time are poorly understood. Here we show that visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by identifying minimal videos: these are short and tiny video clips in which objects, parts, and actions can be reliably recognized, but any reduction in either space or time makes them unrecognizable. Human recognition in minimal videos is invariably accompanied by full interpretation of the internal components of the video. State-of-the-art deep convolutional networks for dynamic recognition cannot replicate human behavior in these configurations. The gap between human and machine vision demonstrated here is due to critical mechanisms for full spatiotemporal interpretation that are lacking in current computational models.

%B Cognition %8 08/2020 %G eng %U https://www.sciencedirect.com/science/article/abs/pii/S0010027720300822 %R 10.1016/j.cognition.2020.104263 %0 Journal Article %J Nature Machine Intelligence %D 2020 %T A neural network trained for prediction mimics diverse features of biological neurons and perception %A William Lotter %A Gabriel Kreiman %A Cox, David %X

Recent work has shown that convolutional neural networks (CNNs) trained on image recognition tasks can serve as valuable models for predicting neural responses in primate visual cortex. However, these models typically require biologically infeasible levels of labelled training data, so this similarity must at least arise via different paths. In addition, most popular CNNs are solely feedforward, lacking a notion of time and recurrence, whereas neurons in visual cortex produce complex time-varying responses, even to static inputs. Towards addressing these inconsistencies with biology, here we study the emergent properties of a recurrent generative network that is trained to predict future video frames in a self-supervised manner. Remarkably, the resulting model is able to capture a wide variety of seemingly disparate phenomena observed in visual cortex, ranging from single-unit response dynamics to complex perceptual motion illusions, even when subjected to highly impoverished stimuli. These results suggest potentially deep connections between recurrent predictive neural network models and computations in the brain, providing new leads that can enrich both fields.

%B Nature Machine Intelligence %V 2 %P 210 - 219 %8 04/2020 %G eng %U http://www.nature.com/articles/s42256-020-0170-9 %N 4 %! Nat Mach Intell %R 10.1038/s42256-020-0170-9 %0 Journal Article %J Nature Machine Learning %D 2020 %T A neural network trained to predict future video frames mimics critical properties of biological neuronal responses and perception. %A William Lotter %A Gabriel Kreiman %A David Cox %X

While deep neural networks take loose inspiration from neuroscience, it is an open question how seriously to take the analogies between artificial deep networks and biological neuronal systems. Interestingly, recent work has shown that deep convolutional neural networks (CNNs) trained on large-scale image recognition tasks can serve as strikingly good models for predicting the responses of neurons in visual cortex to visual stimuli, suggesting that analogies between artificial and biological neural networks may be more than superficial. However, while CNNs capture key properties of the average responses of cortical neurons, they fail to explain other properties of these neurons. For one, CNNs typically require large quantities of labeled input data for training. Our own brains, in contrast, rarely have access to this kind of supervision, so to the extent that representations are similar between CNNs and brains, this similarity must arise via different training paths. In addition, neurons in visual cortex produce complex time-varying responses even to static inputs, and they dynamically tune themselves to temporal regularities in the visual environment. We argue that these differences are clues to fundamental differences between the computations performed in the brain and in deep networks. To begin to close the gap, here we study the emergent properties of a previously- described recurrent generative network that is trained to predict future video frames in a self-supervised manner. Remarkably, the model is able to capture a wide variety of seemingly disparate phenomena observed in visual cortex, ranging from single unit response dynamics to complex perceptual motion illusions. These results suggest potentially deep connections between recurrent predictive neural network models and the brain, providing new leads that can enrich both fields.

%B Nature Machine Learning %8 04/2020 %G eng %0 Journal Article %J CVPR 2020 %D 2020 %T Putting visual object recognition in context %A Zhang, Mengmi %A Tseng, Claire %A Gabriel Kreiman %X

Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g. a cow in the ocean). To understand and model the role of contextual information in visual recognition, we systematically and quantitatively investigated ten critical properties of where, when, and how context modulates recognition, including the amount of context, context and object resolution, geometrical structure of context, context congruence, time required to incorporate contextual information, and temporal dynamics of contextual modulation. The tasks involve recognizing a target object surrounded by context in a natural image. As an essential benchmark, we first describe a series of psychophysics experiments, where we alter one aspect of context at a time, and quantify human recognition accuracy. To computationally assess performance on the same tasks, we propose a biologically inspired context-aware object recognition model consisting of a two-stream architecture. The model processes visual information at the fovea and periphery in parallel, dynamically incorporates both object and contextual information, and sequentially reasons about the class label for the target object. Across a wide range of behavioral tasks, the model approximates human level performance without retraining for each task, captures the dependence of context enhancement on image properties, and provides initial steps towards integrating scene and object information for visual recognition.

%B CVPR 2020 %8 01/2020 %G eng %0 Conference Paper %B International Conference on Learning Representations (ICLR 2020) %D 2020 %T What can human minimal videos tell us about dynamic recognition models? %A Guy Ben-Yosef %A Gabriel Kreiman %A Shimon Ullman %X

In human vision, objects and their parts can be visually recognized from purely spatial or purely temporal information, but the mechanisms integrating space and time are poorly understood. Here we show that human visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by identifying minimal videos: these are short and tiny video clips in which objects, parts, and actions can be reliably recognized, but any reduction in either space or time makes them unrecognizable. State-of-the-art deep networks for dynamic visual recognition cannot replicate human behavior in these configurations. This gap between humans and machines points to critical mechanisms in human dynamic vision that are lacking in current models.

Published as a workshop paper at “Bridging AI and Cognitive Science” (ICLR 2020)

%B International Conference on Learning Representations (ICLR 2020) %C Virtual Conference %8 04/2020 %G eng %U https://baicsworkshop.github.io/pdf/BAICS_1.pdf %0 Journal Article %J PLOS Computational Biology %D 2020 %T XDream: Finding preferred stimuli for visual neurons using generative networks and gradient-free optimization %A Will Xiao %A Gabriel Kreiman %E Fyshe, Alona %X

A longstanding question in sensory neuroscience is what types of stimuli drive neurons to fire. The characterization of effective stimuli has traditionally been based on a combination of intuition, insights from previous studies, and luck. A new method termed XDream (EXtending DeepDream with real-time evolution for activation maximization) combined a generative neural network and a genetic algorithm in a closed loop to create strong stimuli for neurons in the macaque visual cortex. Here we extensively and systematically evaluate the performance of XDream. We use ConvNet units as in silico models of neurons, enabling experiments that would be prohibitive with biological neurons. We evaluated how the method compares to brute-force search, and how well the method generalizes to different neurons and processing stages. We also explored design and parameter choices. XDream can efficiently find preferred features for visual units without any prior knowledge about them. XDream extrapolates to different layers, architectures, and developmental regimes, performing better than brute-force search, and often better than exhaustive sampling of >1 million images. Furthermore, XDream is robust to choices of multiple image generators, optimization algorithms, and hyperparameters, suggesting that its performance is locally near-optimal. Lastly, we found no significant advantage to problem-specific parameter tuning. These results establish expectations and provide practical recommendations for using XDream to investigate neural coding in biological preparations. Overall, XDream is an efficient, general, and robust algorithm for uncovering neuronal tuning preferences using a vast and diverse stimulus space. XDream is implemented in Python, released under the MIT License, and works on Linux, Windows, and MacOS.
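
A skeletal version of the closed loop evaluated above: latent codes are scored by how strongly their generated images drive a target unit, and a genetic algorithm (selection, recombination, mutation) produces the next generation. The "generator" and the target unit below are trivial stand-ins so the loop runs end to end; the released XDream package supplies the real image generators and optimizers.

```python
# Skeletal XDream-style loop: genetic algorithm over generator latents to maximize one unit's response.
# The generator and target unit here are toy stand-ins, not the models used in the paper.
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(256, 64))           # stand-in "generator": latent (64) -> image (256)
readout = rng.normal(size=256)           # stand-in target unit (e.g., a ConvNet channel or a neuron)

def unit_response(z):
    image = np.tanh(G @ z)               # generate an image from the latent code
    return float(readout @ image)        # record the unit's activation for that image

pop = rng.normal(size=(30, 64))          # initial population of latent codes
for generation in range(100):
    scores = np.array([unit_response(z) for z in pop])
    parents = pop[np.argsort(scores)[-10:]]          # selection: keep the best codes
    children = []
    for _ in range(len(pop)):
        i, j = rng.integers(10, size=2)
        mix = np.where(rng.random(64) < 0.5, parents[i], parents[j])   # recombination
        children.append(mix + 0.1 * rng.normal(size=64))               # mutation
    pop = np.array(children)

print(max(unit_response(z) for z in pop))   # activation reached by the evolved "preferred" stimuli
```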

%B PLOS Computational Biology %V 16 %P e1007973 %8 06/2020 %G eng %U https://dx.plos.org/10.1371/journal.pcbi.1007973 %N 6 %! PLoS Comput Biol %R 10.1371/journal.pcbi.1007973 %0 Journal Article %J Cell %D 2019 %T Evolving Images for Visual Neurons Using a Deep Generative Network Reveals Coding Principles and Neuronal Preferences %A Carlos R Ponce %A Will Xiao %A Peter F Schade %A Till S. Hartmann %A Gabriel Kreiman %A Margaret S Livingstone %X

What specific features should visual neurons encode, given the infinity of real-world images and the limited number of neurons available to represent them? We investigated neuronal selectivity in monkey inferotemporal cortex via the vast hypothesis space of a generative deep neural network, avoiding assumptions about features or semantic categories. A genetic algorithm searched this space for stimuli that maximized neuronal firing. This led to the evolution of rich synthetic images of objects with complex combinations of shapes, colors, and textures, sometimes resembling animals or familiar people, other times revealing novel patterns that did not map to any clear semantic category. These results expand our conception of the dictionary of features encoded in the cortex, and the approach can potentially reveal the internal representations of any system whose input can be captured by a generative model.

%B Cell %V 177 %P 1009 %8 05/2019 %G eng %U https://www.cell.com/cell/fulltext/S0092-8674(19)30391-5 %& 999 %R 10.1016/j.cell.2019.04.005 %0 Journal Article %J Physics of Life Reviews %D 2019 %T It's a small dimensional world after all %A Gabriel Kreiman %B Physics of Life Reviews %V 29 %P 96 - 97 %8 07/2019 %G eng %U https://linkinghub.elsevier.com/retrieve/pii/S1571064519300612 %! Physics of Life Reviews %R 10.1016/j.plrev.2019.03.015 %0 Book Section %B Psychology of Learning and Motivation %D 2019 %T What do neurons really want? The role of semantics in cortical representations. %A Gabriel Kreiman %X

What visual inputs best trigger activity for a given neuron in cortex and what type of semantic information may guide those neuronal responses? We revisit the methodologies used so far to design visual experiments, and what those methodologies have taught us about neural coding in visual cortex. Despite heroic and seminal work in ventral visual cortex, we still do not know what types of visual features are optimal for cortical neurons. We briefly review state-of-the-art standard models of visual recognition and argue that such models should constitute the null hypothesis for any measurement that purports to ascribe semantic meaning to neuronal responses. While it remains unclear when, where, and how abstract semantic information is incorporated in visual neurophysiology, there exists clear evidence of top-down modulation in the form of attention, task-modulation and expectations. Such top-down signals open the doors to some of the most exciting questions today toward elucidating how abstract knowledge can be incorporated into our models of visual processing.

%B Psychology of Learning and Motivation %V 70 %G eng %& 7 %R https://doi.org/10.1016/bs.plm.2019.03.005 %0 Conference Proceedings %B 40th International Conference of the IEEE Engineering in Medicine and Biology Society - EMBC 2018 %D 2018 %T Development of automated interictal spike detector %A A Palepu %A Gabriel Kreiman %B 40th International Conference of the IEEE Engineering in Medicine and Biology Society - EMBC 2018 %C Honolulu, HI %8 07/2018 %G eng %U https://embc.embs.org/2018/ %0 Journal Article %J Nature Communications %D 2018 %T Finding any Waldo with zero-shot invariant and efficient visual search %A Zhang, Mengmi %A Feng, Jiashi %A Ma, Keng Teck %A Lim, Joo Hwee %A Qi Zhao %A Gabriel Kreiman %X

Searching for a target object in a cluttered scene constitutes a fundamental challenge in daily vision. Visual search must be selective enough to discriminate the target from distractors, invariant to changes in the appearance of the target, efficient to avoid exhaustive exploration of the image, and must generalize to locate novel target objects with zero-shot training. Previous work on visual search has focused on searching for perfect matches of a target after extensive category-specific training. Here, we show for the first time that humans can efficiently and invariantly search for natural objects in complex scenes. To gain insight into the mechanisms that guide visual search, we propose a biologically inspired computational model that can locate targets without exhaustive sampling and which can generalize to novel objects. The model provides an approximation to the mechanisms integrating bottom-up and top-down signals during search in natural scenes.
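
A toy illustration, under our own assumptions, of the kind of top-down modulation such a search model uses: the target's feature vector highlights matching locations in a scene feature map, the most active location is fixated, and inhibition of return suppresses visited locations. The feature maps here are random placeholders rather than ConvNet activations.

```python
# Toy top-down modulated visual search with inhibition of return; feature maps are placeholders.
import numpy as np

rng = np.random.default_rng(0)
scene_feats = rng.normal(size=(16, 16, 32))    # scene feature map (H x W x channels)
target_feats = scene_feats[5, 9].copy()        # pretend the target object sits at location (5, 9)

attention = scene_feats @ target_feats         # top-down modulation: feature similarity at every location
for fixation in range(1, 6):
    y, x = np.unravel_index(np.argmax(attention), attention.shape)
    print("fixation", fixation, "->", (y, x))
    if (y, x) == (5, 9):
        break                                  # target found
    attention[max(0, y - 1):y + 2, max(0, x - 1):x + 2] = -np.inf   # inhibition of return
```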

%B Nature Communications %V 9 %8 09/2018 %G eng %U http://www.nature.com/articles/s41467-018-06217-x %! Nat Commun %R 10.1038/s41467-018-06217-x %0 Conference Proceedings %B 2018 52nd Annual Conference on Information Sciences and Systems (CISS) %D 2018 %T Learning scene gist with convolutional neural networks to improve object recognition %A Wu Eric %A Wu Kevin %A Gabriel Kreiman %B 2018 52nd Annual Conference on Information Sciences and Systems (CISS) %C Princeton, NJ %8 05/2018 %G eng %U https://ieeexplore.ieee.org/abstract/document/8362305 %R 10.1109/CISS.2018.8362305 %0 Journal Article %J arXiv | Cornell University %D 2018 %T Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition %A Kevin Wu %A Eric Wu %A Gabriel Kreiman %X Advancements in convolutional neural networks (CNNs) have made significant strides toward achieving high performance levels on multiple object recognition tasks. While some approaches utilize information from the entire scene to propose regions of interest, the task of interpreting a particular region or object is still performed independently of other objects and features in the image. Here we demonstrate that a scene's 'gist' can significantly contribute to how well humans can recognize objects. These findings are consistent with the notion that humans foveate on an object and incorporate information from the periphery to aid in recognition. We use a biologically inspired two-part convolutional neural network ('GistNet') that models the fovea and periphery to provide a proof-of-principle demonstration that computational object recognition can significantly benefit from the gist of the scene as contextual information. Our model yields accuracy improvements of up to 50% in certain object categories when incorporating contextual gist, while only increasing the original model size by 5%. This proposed model mirrors our intuition about how the human visual system recognizes objects, suggesting specific biologically plausible constraints to improve machine vision and building initial steps towards the challenge of scene understanding. %B arXiv | Cornell University %V arXiv:1803.01967 %8 03/2018 %G eng %U http://arxiv.org/abs/1803.01967 %0 Journal Article %J Scientific Reports %D 2018 %T Minimal memory for details in real life events %A Pranav Misra %A Marconi, Alyssa %A M.F. Peterson %A Gabriel Kreiman %X

The extent to which the details of past experiences are retained or forgotten remains controversial. Some studies suggest massive storage while others describe memories as fallible summary recreations of original events. The discrepancy can be ascribed to the content of memories and how memories are evaluated. Many studies have focused on recalling lists of words/pictures, which lack the critical ingredients of real world memories. Here we quantified the ability to remember details about one hour of real life. We recorded video and eye movements while subjects walked along specified routes and evaluated whether they could distinguish video clips from their own experience from foils. Subjects were minimally above chance in remembering the minutiae of their experiences. Recognition of specific events could be partly explained by a machine-learning model of video contents. These results quantify recognition memory for events in real life and show that the details of everyday experience are largely not retained in memory.

%B Scientific Reports %V 8 %8 Jan-12-2018 %G eng %U https://www.nature.com/articles/s41598-018-33792-2 %N 1 %! Sci Rep %R 10.1038/s41598-018-33792-2 %0 Journal Article %J Cerebral Cortex %D 2018 %T Neural Interactions Underlying Visuomotor Associations in the Human Brain %A Radhika Madhavan %A Bansal, Arjun K %A Joseph Madsen %A Golby, Alexandra J %A Travis S Tierney %A Emad Eskandar %A WS Anderson %A Gabriel Kreiman %K frontal cortex %K human neurophysiology %K reinforcement learning %K visual cortex %X

Rapid and flexible learning during behavioral choices is critical to our daily endeavors and constitutes a hallmark of dynamic reasoning. An important paradigm to examine flexible behavior involves learning new arbitrary associations mapping visual inputs to motor outputs. We conjectured that visuomotor rules are instantiated by translating visual signals into actions through dynamic interactions between visual, frontal and motor cortex. We evaluated the neural representation of such visuomotor rules by performing intracranial field potential recordings in epilepsy subjects during a rule-learning delayed match-to-behavior task. Learning new visuomotor mappings led to the emergence of specific responses associating visual signals with motor outputs in 3 anatomical clusters in frontal, anteroventral temporal and posterior parietal cortex. After learning, mapping selective signals during the delay period showed interactions with visual and motor signals. These observations provide initial steps towards elucidating the dynamic circuits underlying flexible behavior and how communication between subregions of frontal, temporal, and parietal cortex leads to rapid learning of task-relevant choices.

%B Cerebral Cortex %V 1–17 %8 12/2018 %G eng %U http://klab.tch.harvard.edu/publications/PDFs/gk7766.pdf %R 10.1093/cercor/bhy333 %0 Report %D 2018 %T A neural network trained to predict future video frames mimics critical properties of biological neuronal responses and perception %A William Lotter %A Gabriel Kreiman %A David Cox %X

While deep neural networks take loose inspiration from neuroscience, it is an open question how seriously to take the analogies between artificial deep networks and biological neuronal systems. Interestingly, recent work has shown that deep convolutional neural networks (CNNs) trained on large-scale image recognition tasks can serve as strikingly good models for predicting the responses of neurons in visual cortex to visual stimuli, suggesting that analogies between artificial and biological neural networks may be more than superficial. However, while CNNs capture key properties of the average responses of cortical neurons, they fail to explain other properties of these neurons. For one, CNNs typically require large quantities of labeled input data for training. Our own brains, in contrast, rarely have access to this kind of supervision, so to the extent that representations are similar between CNNs and brains, this similarity must arise via different training paths. In addition, neurons in visual cortex produce complex time-varying responses even to static inputs, and they dynamically tune themselves to temporal regularities in the visual environment. We argue that these differences are clues to fundamental differences between the computations performed in the brain and in deep networks. To begin to close the gap, here we study the emergent properties of a previously-described recurrent generative network that is trained to predict future video frames in a self-supervised manner. Remarkably, the model is able to capture a wide variety of seemingly disparate phenomena observed in visual cortex, ranging from single unit response dynamics to complex perceptual motion illusions. These results suggest potentially deep connections between recurrent predictive neural network models and the brain, providing new leads that can enrich both fields.

%I arXiv | Cornell University %8 05/2018 %G eng %U https://arxiv.org/pdf/1805.10734.pdf %0 Journal Article %J Proceedings of the National Academy of Sciences %D 2018 %T Recurrent computations for visual pattern completion %A Hanlin Tang %A Martin Schrimpf %A William Lotter %A Moerman, Charlotte %A Paredes, Ana %A Ortega Caro, Josue %A Hardesty, Walter %A David Cox %A Gabriel Kreiman %K Artificial Intelligence %K computational neuroscience %K Machine Learning %K pattern completion %K Visual object recognition %X

Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.
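As a concrete illustration of the attractor-based recurrent idea invoked above, a minimal Hopfield-style sketch (toy data and dimensions are placeholders; this is not the model used in the paper) shows how stored patterns can be recovered from heavily occluded inputs:

# Minimal Hopfield-style attractor sketch (illustrative only; not the
# recurrent model used in the paper). Stored binary feature patterns act
# as attractors; iterating the dynamics from a heavily occluded cue
# recovers ("completes") the closest stored pattern.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_patterns = 200, 5
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_units))

# Hebbian storage: W = (1/N) * sum_p x_p x_p^T, with zero diagonal
W = patterns.T @ patterns / n_units
np.fill_diagonal(W, 0.0)

def complete(cue, n_steps=20):
    """Run the recurrent dynamics from a partial cue until it settles."""
    s = cue.copy()
    for _ in range(n_steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

# Occlude 85% of the first pattern (set those units to zero) and recover it.
cue = patterns[0].copy()
occluded = rng.choice(n_units, size=int(0.85 * n_units), replace=False)
cue[occluded] = 0.0
recovered = complete(cue)
print("overlap with stored pattern:", float(recovered @ patterns[0]) / n_units)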

%B Proceedings of the National Academy of Sciences %8 08/2018 %G eng %U http://www.pnas.org/lookup/doi/10.1073/pnas.1719397115 %! Proc Natl Acad Sci USA %R 10.1073/pnas.1719397115 %0 Generic %D 2018 %T Spatiotemporal interpretation features in the recognition of dynamic images %A Guy Ben-Yosef %A Gabriel Kreiman %A Shimon Ullman %X

Objects and their parts can be visually recognized and localized from purely spatial information in static images and also from purely temporal information as in the perception of biological motion. Cortical regions have been identified, which appear to specialize in visual recognition based on either static or dynamic cues, but the mechanisms by which spatial and temporal information are integrated are only poorly understood. Here we show that visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by the identification of minimal spatiotemporal configurations: these are short videos in which objects and their parts, along with an action being performed, can be reliably recognized, but any reduction in either space or time makes them unrecognizable. State-of-the-art computational models for recognition from dynamic images based on deep 2D and 3D convolutional networks cannot replicate human recognition in these configurations. Action recognition in minimal spatiotemporal configurations is invariably accompanied by full human interpretation of the internal components of the image and their inter-relations. We hypothesize that this gap is due to mechanisms for full spatiotemporal interpretation, which in human vision are an integral part of recognizing dynamic events but are not sufficiently represented in current DNNs.

%8 11/2018 %2

http://hdl.handle.net/1721.1/119248

%0 Generic %D 2018 %T What am I searching for? %A Zhang, Mengmi %A Feng, Jiashi %A Lim, Joo Hwee %A Qi Zhao %A Gabriel Kreiman %X

Can we infer intentions and goals from a person’s actions? As an example of this family of problems, we consider here whether it is possible to decipher what a person is searching for by decoding their eye movement behavior. We conducted two human psychophysics experiments on object arrays and natural images where we monitored subjects’ eye movements while they were looking for a target object. Using as input the pattern of "error" fixations on non-target objects before the target was found, we developed a model (InferNet) whose goal was to infer what the target was. "Error" fixations share similar features with the sought target. The InferNet model uses a pre-trained 2D convolutional architecture to extract features from the error fixations and computes a 2D similarity map between the error fixation and all locations across the search image by modulating the search image via convolution across layers. InferNet consolidates the modulated response maps across layers via max pooling to keep track of the sub-patterns highly similar to features at error fixations and integrates these maps across all error fixations. InferNet successfully identifies the subject’s goal and outperforms all competing null models, even without any object-specific training on the inference task.
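The core operation described here can be sketched roughly as follows (an illustrative approximation with placeholder function names, not the published InferNet code): features sampled at each "error" fixation are compared against every location of the search image, and the resulting similarity maps are pooled across layers and integrated across fixations.

# Illustrative sketch of the target-inference idea (not the published
# InferNet implementation): compare features sampled at "error" fixations
# with features at every location of the search image, then aggregate the
# resulting similarity maps across layers and fixations.
import numpy as np

def cosine_similarity_map(feature_map, fixation_vector):
    """feature_map: (H, W, C) array; fixation_vector: (C,) array."""
    fm = feature_map / (np.linalg.norm(feature_map, axis=-1, keepdims=True) + 1e-8)
    v = fixation_vector / (np.linalg.norm(fixation_vector) + 1e-8)
    return fm @ v  # (H, W) map of cosine similarities

def infer_target_map(search_feature_maps, error_fixations):
    """search_feature_maps: list of (H, W, C_l) maps from several CNN layers,
    assumed resized to a common H x W grid; error_fixations: list of
    (row, col) fixation coordinates on that grid."""
    H, W, _ = search_feature_maps[0].shape
    total = np.zeros((H, W))
    for (r, c) in error_fixations:
        # keep, per location, the best match across layers for this fixation
        per_layer = [cosine_similarity_map(fm, fm[r, c]) for fm in search_feature_maps]
        total += np.max(np.stack(per_layer), axis=0)
    return total  # high values mark locations resembling the sought target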

%8 07/2018 %1

arXiv:1807.11926

%2

http://hdl.handle.net/1721.1/119576

%0 Journal Article %J NeuroImage %D 2018 %T What is changing when: decoding visual information in movies from human intracranial recordings %A Leyla Isik %A Jedediah Singer %A Nancy Kanwisher %A Madsen JR %A Anderson WS %A Gabriel Kreiman %K Electrocorticography (ECoG) %K Movies %K Natural vision %K neural decoding %K object recognition %K Ventral pathway %X

The majority of visual recognition studies have focused on the neural responses to repeated presentations of static stimuli with abrupt and well-defined onset and offset times. In contrast, natural vision involves unique renderings of visual inputs that are continuously changing without explicitly defined temporal transitions. Here we considered commercial movies as a coarse proxy to natural vision. We recorded intracranial field potential signals from 1,284 electrodes implanted in 15 patients with epilepsy while the subjects passively viewed commercial movies. We could rapidly detect large changes in the visual inputs within approximately 100 ms of their occurrence, using exclusively field potential signals from ventral visual cortical areas including the inferior temporal gyrus and inferior occipital gyrus. Furthermore, we could decode the content of those visual changes even in a single movie presentation, generalizing across the wide range of transformations present in a movie. These results present a methodological framework for studying cognition during dynamic and natural vision.
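A schematic version of such a decoding analysis, with placeholder data shapes and a generic linear classifier standing in for whatever decoder was actually used in the paper, might look like this:

# Schematic decoding sketch (hypothetical data shapes; not the analysis code
# from the paper): classify which visual event occurred from a short window
# of multi-electrode field potentials following each scene change.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# X: one row per event, features = electrodes x time bins in a brief window
# after the cut; y: label describing the visual content after the cut.
n_events, n_electrodes, n_bins = 400, 120, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((n_events, n_electrodes * n_bins))  # placeholder data
y = rng.integers(0, 4, size=n_events)                        # placeholder labels

clf = make_pipeline(StandardScaler(), LinearSVC(dual=False))
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated decoding accuracy:", scores.mean())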

%B NeuroImage %V 180, Part A %P 147-159 %8 10/2018 %G eng %U https://www.sciencedirect.com/science/article/pii/S1053811917306742 %) Available online 18 August 2017 %R 10.1016/j.neuroimage.2017.08.027 %0 Conference Paper %B ICLR %D 2017 %T Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning %A William Lotter %A Gabriel Kreiman %A David Cox %B ICLR %G eng %0 Generic %D 2017 %T Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning %A William Lotter %A Gabriel Kreiman %A David Cox %X

While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning—leveraging unlabeled examples to learn about the structure of a domain — remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network (“PredNet”) architecture that is inspired by the concept of “predictive coding” from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
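A deliberately simplified sketch of the next-frame-prediction objective (a single convolutional predictor trained with an MSE loss on placeholder frames, not the layered recurrent PredNet architecture itself) is:

# Minimal next-frame-prediction sketch (a toy stand-in, not PredNet):
# a small convolutional network is trained to predict frame t+1 from
# frame t, using only the video itself as supervision.
import torch
import torch.nn as nn

predictor = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# video: placeholder tensor of shape (time, channels, height, width)
video = torch.rand(16, 3, 64, 64)

for step in range(100):
    inputs, targets = video[:-1], video[1:]  # target is the next frame
    prediction = predictor(inputs)
    loss = loss_fn(prediction, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()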

%8 03/2017 %1

arXiv:1605.08104v5

%2

http://hdl.handle.net/1721.1/107497

%0 Journal Article %J Language, Cognition and Neuroscience %D 2017 %T A null model for cortical representations with grandmothers galore %A Gabriel Kreiman %K Computational models %K human visual cortex %K localist representation %K sparse coding %K visual recognition %X

There has been extensive discussion in the literature about the extent to which cortical representations can be described as localist or distributed. Here, we discuss a simple null model that encompasses a family of related architectures describing the transformation of signals throughout the parts of the visual system involved in object recognition. This family of models constitutes a rigorous first approximation to explain the neurophysiological properties of ventral visual cortex. This null model contains both distributed and local representations throughout the entire hierarchy of computations and the responses of individual units are meaningful and interpretable when encoding is adequately defined for each computational stage.

%B Language, Cognition and Neuroscience %P 274 - 285 %8 08/2016 %G eng %U https://www.tandfonline.com/doi/full/10.1080/23273798.2016.1218033 %! Language, Cognition and Neuroscience %R 10.1080/23273798.2016.1218033 %0 Book Section %B Computational and Cognitive Neuroscience of Vision %D 2017 %T Recognition of occluded objects %A Hanlin Tang %A Gabriel Kreiman %A Qi Zhao %B Computational and Cognitive Neuroscience of Vision %I Springer Singapore %G eng %U http://www.springer.com/us/book/9789811002113 %0 Generic %D 2017 %T On the Robustness of Convolutional Neural Networks to Internal Architecture and Weight Perturbations %A Nicholas Cheney %A Martin Schrimpf %A Gabriel Kreiman %X

Deep convolutional neural networks are generally regarded as robust function approximators. So far, this intuition is based on perturbations to external stimuli such as the images to be classified. Here we explore the robustness of convolutional neural networks to perturbations to the internal weights and architecture of the network itself. We show that convolutional networks are surprisingly robust to a number of internal perturbations in the higher convolutional layers, but the bottom convolutional layers are much more fragile. For instance, AlexNet shows less than a 30% decrease in classification performance when randomly removing over 70% of weight connections in the top convolutional or dense layers, but performance is almost at chance with the same perturbation in the first convolutional layer. Finally, we suggest further investigations that could continue to inform the robustness of convolutional networks to internal perturbations.
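The internal-perturbation protocol can be sketched generically as follows (placeholder model, data loader, and layer choice; not the exact procedure or code used in the paper):

# Generic sketch of the internal-perturbation test (placeholder model and
# evaluation loop; not the experimental code): zero out a random fraction
# of the weights in one layer and re-measure classification accuracy.
import torch

def ablate_weights(layer, fraction, seed=0):
    """Randomly zero `fraction` of the weights in a torch layer, in place."""
    g = torch.Generator().manual_seed(seed)
    with torch.no_grad():
        mask = (torch.rand(layer.weight.shape, generator=g) > fraction).float()
        layer.weight.mul_(mask)

def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of `model` over an (images, labels) data loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.numel()
    return correct / total

# Example use (model and val_loader assumed to exist):
#   baseline = accuracy(model, val_loader)
#   ablate_weights(model.features[0], fraction=0.7)  # e.g., first conv layer
#   perturbed = accuracy(model, val_loader)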

%8 03/2017 %1

arXiv:1703.08245

%2

http://hdl.handle.net/1721.1/107935

%0 Journal Article %J Neuroimage %D 2017 %T What is changing when: Decoding visual information in movies from human intracranial recordings %A Leyla Isik %A Jedediah Singer %A Joseph Madsen %A Nancy Kanwisher %A Gabriel Kreiman %X

The majority of visual recognition studies have focused on the neural responses to repeated presentations of static stimuli with abrupt and well-defined onset and offset times. In contrast, natural vision involves unique renderings of visual inputs that are continuously changing without explicitly defined temporal transitions. Here we considered commercial movies as a coarse proxy to natural vision. We recorded intracranial field potential signals from 1,284 electrodes implanted in 15 patients with epilepsy while the subjects passively viewed commercial movies. We could rapidly detect large changes in the visual inputs within approximately 100 ms of their occurrence, using exclusively field potential signals from ventral visual cortical areas including the inferior temporal gyrus and inferior occipital gyrus. Furthermore, we could decode the content of those visual changes even in a single movie presentation, generalizing across the wide range of transformations present in a movie. These results present a methodological framework for studying cognition during dynamic and natural vision.

%B Neuroimage %G eng %U https://www.sciencedirect.com/science/article/pii/S1053811917306742 %R https://doi.org/10.1016/j.neuroimage.2017.08.027 %0 Journal Article %J Neuron %D 2016 %T Bottom-up and Top-down Input Augment the Variability of Cortical Neurons. %A Camille Gómez-Laberge %A Alexandra Smolyanskaya %A Jonathan J. Nassi %A Gabriel Kreiman %A Richard T Born %B Neuron %V 91(3) %P 540-547 %G eng %0 Generic %D 2016 %T Cascade of neural processing orchestrates cognitive control in human frontal cortex [code] %A Hanlin Tang %A Hsiang-Yu Yu %A Chien-Chen Chou %A Crone, Nathan E. %A Joseph Madsen %A WS Anderson %A Gabriel Kreiman %X

Code and data used to create the figures of Tang et al. (2016). The results from this work show that a dynamic and hierarchical sequence of steps in human frontal cortex orchestrates cognitive control.

Used in conjunction with this mirrored CBMM Dataset entry

%I eLife %U http://klab.tch.harvard.edu/resources/tangetal_stroop_2016.html %0 Generic %D 2016 %T Cascade of neural processing orchestrates cognitive control in human frontal cortex [dataset] %A Hanlin Tang %A Hsiang-Yu Yu %A Chien-Chen Chou %A Crone, Nathan E. %A Joseph Madsen %A WS Anderson %A Gabriel Kreiman %X

Code and data used to create the figures of Tang et al. (2016). The results from this work show that a dynamic and hierarchical sequence of steps in human frontal cortex orchestrates cognitive control.

Used in conjunction with this mirrored CBMM Code entry

%I eLife %U http://klab.tch.harvard.edu/resources/tangetal_stroop_2016.html %0 Journal Article %J eLIFE %D 2016 %T Cascade of neural processing orchestrates cognitive control in human frontal cortex %A Hanlin Tang %A Yu, HY %A Chou, CC %A NE Crone %A Joseph Madsen %A WS Anderson %A Gabriel Kreiman %X
Rapid and flexible interpretation of conflicting sensory inputs in the context of current goals is a critical component of cognitive control that is orchestrated by frontal cortex. The relative roles of distinct subregions within frontal cortex are poorly understood. To examine the dynamics underlying cognitive control across frontal regions, we took advantage of the spatiotemporal resolution of intracranial recordings in epilepsy patients while subjects resolved color-word conflict. We observed differential activity preceding the behavioral responses to conflict trials throughout frontal cortex; this activity was correlated with behavioral reaction times. These signals emerged first in anterior cingulate cortex (ACC) before dorsolateral prefrontal cortex (dlPFC), followed by medial frontal cortex (mFC) and then by orbitofrontal cortex (OFC). These results dissociate the frontal subregions based on their dynamics, and suggest a temporal hierarchy for cognitive control in human cortex.
%B eLIFE %8 02/2016 %G eng %U http://dx.doi.org/10.7554/eLife.12352 %R 10.7554/eLife.12352 %0 Conference Proceedings %B 2016 Annual Conference on Information Science and Systems (CISS) %D 2016 %T A machine learning approach to predict episodic memory formation %A Hanlin Tang %A Jedediah Singer %A Matias J. Ison %A Gnel Pivazyan %A Melissa Romaine %A Elizabeth Meller %A Victoria Perron %A Marlise Arlellano %A Gabriel Kreiman %A Melissa Romaine %A Adrianna Boulin %A Rosa Frias %A James Carroll %A Sarah Dowcett %B 2016 Annual Conference on Information Science and Systems (CISS) %C Princeton, NJ %P 539 - 544 %G eng %U http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7460560&newsearch=true&queryText=A%20machine%20learning%20approach%20to%20predict%20episodic%20memory%20formation %R 10.1109/CISS.2016.7460560 %0 Generic %D 2016 %T Neural Information Processing Systems (NIPS) 2015 Review %A Gabriel Kreiman %X

The charming city of Montreal hosted more than 4000 researchers from all over the globe during the Neural Information Processing Systems (NIPS) conference. In addition to the notable exponential growth in the number of attendees, a novel highlight this year was the addition of a Symposium format. The Brain, Minds and Machines Symposium aimed to discuss the relationship between biological hardware and how to understand the fundamental computations that give rise to intelligence...

View the CBMM NIPS page and watch the videos.

%8 01/2016 %0 Journal Article %J Scientific Reports %D 2016 %T Predicting episodic memory formation for movie events %A Hanlin Tang %A Jedediah Singer %A Matias J. Ison %A Gnel Pivazyan %A Melissa Romaine %A Rosa Frias %A Elizabeth Meller %A Adrianna Boulin %A James Carroll %A Victoria Perron %A Sarah Dowcett %A Arellano, Marlise %A Gabriel Kreiman %X

Episodic memories are long lasting and full of detail, yet imperfect and malleable. We quantitatively evaluated recollection of short audiovisual segments from movies as a proxy to real-life memory formation in 161 subjects at 15 minutes up to a year after encoding. Memories were reproducible within and across individuals, showed the typical decay with time elapsed between encoding and testing, were fallible yet accurate, and were insensitive to low-level stimulus manipulations but sensitive to high-level stimulus properties. Remarkably, memorability was also high for single movie frames, even one year post-encoding. To evaluate what determines the efficacy of long-term memory formation, we developed an extensive set of content annotations that included actions, emotional valence, visual cues and auditory cues. These annotations enabled us to document the content properties that showed a stronger correlation with recognition memory and to build a machine-learning computational model that accounted for episodic memory formation in single events for group averages and individual subjects with an accuracy of up to 80%. These results provide initial steps towards the development of a quantitative computational theory capable of explaining the subjective filtering steps that lead to how humans learn and consolidate memories.
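The mapping from content annotations to recognition memory can be illustrated with a generic sketch (hypothetical feature columns and a plain logistic-regression classifier; the paper's actual model and features differ):

# Illustrative sketch (hypothetical annotation features; not the model
# reported in the paper): predict whether a movie event will be remembered
# from its content annotations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_events = 500
# Columns might encode, for example, presence of faces, actions, emotional
# valence, or sound cues; placeholder values are used here.
annotations = rng.random((n_events, 12))
remembered = rng.integers(0, 2, size=n_events)  # placeholder recognition labels

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, annotations, remembered, cv=5)
print("cross-validated prediction accuracy:", scores.mean())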

%B Scientific Reports %8 10/2016 %G eng %U http://www.nature.com/articles/srep30175 %N 1 %! Sci Rep %R 10.1038/srep30175 %0 Generic %D 2016 %T Predicting episodic memory formation for movie events [code] %A Hanlin Tang %A Jedediah Singer %A Matias Ison %A Gnel Pivazyan %A Melissa Romaine %A Rosa Frias %A Elizabeth Meller %A Adrianna Boulin %A James Carroll %A Victoria Perron %A Sarah Dowcett %A Marlise Arlellano %A Gabriel Kreiman %X

Episodic memories are long lasting and full of detail, yet imperfect and malleable. We quantitatively evaluated recollection of short audiovisual segments from movies as a proxy to real-life memory formation in 161 subjects at 15 minutes up to a year after encoding. Memories were reproducible within and across individuals, showed the typical decay with time elapsed between encoding and testing, were fallible yet accurate, and were insensitive to low-level stimulus manipulations but sensitive to high-level stimulus properties. Remarkably, memorability was also high for single movie frames, even one year post-encoding. To evaluate what determines the efficacy of long-term memory formation, we developed an extensive set of content annotations that included actions, emotional valence, visual cues and auditory cues. These annotations enabled us to document the content properties that showed a stronger correlation with recognition memory and to build a machine-learning computational model that accounted for episodic memory formation in single events for group averages and individual subjects with an accuracy of up to 80%. These results provide initial steps towards the development of a quantitative computational theory capable of explaining the subjective filtering steps that lead to how humans learn and consolidate memories.


To view more information and download datasets, etc., please visit the project website - http://klab.tch.harvard.edu/resources/Tangetal_episodicmemory_2016.html


The corresponding publication can be found here.


The corresponding code entry can be found here.

 

%0 Generic %D 2016 %T Predicting episodic memory formation for movie events [dataset] %A Hanlin Tang %A Jedediah Singer %A Matias Ison %A Gnel Pivazyan %A Melissa Romaine %A Rosa Frias %A Elizabeth Meller %A Adrianna Boulin %A James Carroll %A Victoria Perron %A Sarah Dowcett %A Marlise Arlellano %A Gabriel Kreiman %X

Episodic memories are long lasting and full of detail, yet imperfect and malleable. We quantitatively evaluated recollection of short audiovisual segments from movies as a proxy to real-life memory formation in 161 subjects at 15 minutes up to a year after encoding. Memories were reproducible within and across individuals, showed the typical decay with time elapsed between encoding and testing, were fallible yet accurate, and were insensitive to low-level stimulus manipulations but sensitive to high-level stimulus properties. Remarkably, memorability was also high for single movie frames, even one year post-encoding. To evaluate what determines the efficacy of long-term memory formation, we developed an extensive set of content annotations that included actions, emotional valence, visual cues and auditory cues. These annotations enabled us to document the content properties that showed a stronger correlation with recognition memory and to build a machine-learning computational model that accounted for episodic memory formation in single events for group averages and individual subjects with an accuracy of up to 80%. These results provide initial steps towards the development of a quantitative computational theory capable of explaining the subjective filtering steps that lead to how humans learn and consolidate memories.


To view more information and download datasets, etc., please visit the project website - http://klab.tch.harvard.edu/resources/Tangetal_episodicmemory_2016.html


The corresponding publication can be found here.


The corresponding code entry can be found here.

 

%0 Generic %D 2016 %T PredNet - "Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning" [code] %A William Lotter %A Gabriel Kreiman %A David Cox %X

The PredNet is a deep convolutional recurrent neural network inspired by the principles of predictive coding from the neuroscience literature [1, 2]. It is trained for next-frame video prediction with the belief that prediction is an effective objective for unsupervised (or "self-supervised") learning [e.g. 3-11].


For full project information and links to download code, etc. visit the website - https://coxlab.github.io/prednet/

%0 Generic %D 2016 %T There’s Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task [code] %A Thomas Miconi %A Laura Groomes %A Gabriel Kreiman %X

When searching for an object in a scene, how does the brain decide where to look next? Visual search theories suggest the existence of a global “priority map” that integrates bottom-up visual information with top-down, target-specific signals. We propose a mechanistic model of visual search that is consistent with recent neurophysiological evidence, can localize targets in cluttered images, and predicts single-trial behavior in a search task. This model posits that a high-level retinotopic area selective for shape features receives global, target-specific modulation and implements local normalization through divisive inhibition. The normalization step is critical to prevent highly salient bottom-up features from monopolizing attention. The resulting activity pattern constitutes a priority map that tracks the correlation between local input and target features. The maximum of this priority map is selected as the locus of attention. The visual input is then spatially enhanced around the selected location, allowing object-selective visual areas to determine whether the target is present at this location. This model can localize objects both in array images and when objects are pasted in natural scenes. The model can also predict single-trial human fixations, including those in error and target-absent trials, in a search task involving complex objects.


To view more information and download code, etc., please visit the project website - http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html


The corresponding publication can be found here.


The corresponding dataset entry can be found here.

%0 Generic %D 2016 %T There’s Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task [dataset] %A Thomas Miconi %A Laura Groomes %A Gabriel Kreiman %X

When searching for an object in a scene, how does the brain decide where to look next? Visual search theories suggest the existence of a global “priority map” that integrates bottom-up visual information with top-down, target-specific signals. We propose a mechanistic model of visual search that is consistent with recent neurophysiological evidence, can localize targets in cluttered images, and predicts single-trial behavior in a search task. This model posits that a high-level retinotopic area selective for shape features receives global, target-specific modulation and implements local normalization through divisive inhibition. The normalization step is critical to prevent highly salient bottom-up features from monopolizing attention. The resulting activity pattern constitutes a priority map that tracks the correlation between local input and target features. The maximum of this priority map is selected as the locus of attention. The visual input is then spatially enhanced around the selected location, allowing object-selective visual areas to determine whether the target is present at this location. This model can localize objects both in array images and when objects are pasted in natural scenes. The model can also predict single-trial human fixations, including those in error and target-absent trials, in a search task involving complex objects.


To view more information and download datasets, etc., please visit the project website - http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html


The corresponding publication can be found here.


The corresponding code entry can be found here.

%0 Journal Article %J Cerebral Cortex %D 2016 %T There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task %A Thomas Miconi %A Laura Groomes %A Gabriel Kreiman %X

When searching for an object in a scene, how does the brain decide where to look next? Visual search theories suggest the existence of a global “priority map” that integrates bottom-up visual information with top-down, target-specific signals. We propose a mechanistic model of visual search that is consistent with recent neurophysiological evidence, can localize targets in cluttered images, and predicts single-trial behavior in a search task. This model posits that a high-level retinotopic area selective for shape features receives global, target-specific modulation and implements local normalization through divisive inhibition. The normalization step is critical to prevent highly salient bottom-up features from monopolizing attention. The resulting activity pattern constitutes a priority map that tracks the correlation between local input and target features. The maximum of this priority map is selected as the locus of attention. The visual input is then spatially enhanced around the selected location, allowing object-selective visual areas to determine whether the target is present at this location. This model can localize objects both in array images and when objects are pasted in natural scenes. The model can also predict single-trial human fixations, including those in error and target-absent trials, in a search task involving complex objects.
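The combination of target-specific multiplicative modulation and divisive normalization described above can be written as a short schematic (placeholder feature maps and constants; not the published implementation):

# Schematic of the search model's core computation (placeholder inputs and
# constants; not the published code): scene feature maps are modulated
# multiplicatively by target features, divisively normalized by locally
# pooled activity, and the maximum of the resulting priority map is taken
# as the next fixation.
import numpy as np
from scipy.ndimage import uniform_filter

def priority_map(feature_maps, target_features, pool_size=15, sigma=1e-3):
    """feature_maps: (C, H, W) responses to the scene;
    target_features: (C,) feature signature of the sought target."""
    modulated = feature_maps * target_features[:, None, None]        # top-down gain
    drive = modulated.sum(axis=0)                                     # aggregate drive
    pool = uniform_filter(feature_maps.sum(axis=0), size=pool_size)   # normalization pool
    return drive / (sigma + pool)                                     # divisive normalization

def next_fixation(priority):
    """Return the (row, col) location with the highest priority."""
    return np.unravel_index(np.argmax(priority), priority.shape)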

---

Publisher released this paper early online on June 19, 2015.

%B Cerebral Cortex %V 26(7) %P 3064-3082 %G eng %0 Conference Paper %B International Conference on Learning Representations (ICLR) %D 2016 %T Unsupervised Learning of Visual Structure using Predictive Generative Networks %A William Lotter %A Gabriel Kreiman %A David Cox %X

The ability to predict future states of the environment is a central pillar of intelligence. At its core, effective prediction requires an internal model of the world and an understanding of the rules by which the world changes. Here, we explore the internal models developed by deep neural networks trained using a loss based on predicting future frames in synthetic video sequences, using a CNN-LSTM-deCNN framework. We first show that this architecture can achieve excellent performance in visual sequence prediction tasks, including state-of-the-art performance in a standard 'bouncing balls' dataset (Sutskever et al., 2009). Using a weighted mean-squared error and adversarial loss (Goodfellow et al., 2014), the same architecture successfully extrapolates out-of-the-plane rotations of computer-generated faces. Furthermore, despite being trained end-to-end to predict only pixel-level information, our Predictive Generative Networks learn a representation of the latent structure of the underlying three-dimensional objects themselves. Importantly, we find that this representation is naturally tolerant to object transformations, and generalizes well to new tasks, such as classification of static images. Similar models trained solely with a reconstruction loss fail to generalize as effectively. We argue that prediction can serve as a powerful unsupervised loss for learning rich internal representations of high-level object features.

%B International Conference on Learning Representations (ICLR) %C San Juan, Puerto Rico %8 May 2016 %G eng %U http://arxiv.org/pdf/1511.06380v2.pdf %0 Journal Article %J Frontiers in Systems Neuroscience %D 2015 %T Decrease in gamma-band activity tracks sequence learning %A Radhika Madhavan %A Daniel Millman %A Hanlin Tang %A NE Crone %A Fredrick A. Lenz %A Travis S Tierney %A Joseph Madsen %A Gabriel Kreiman %A WS Anderson %X

Learning novel sequences constitutes an example of declarative memory formation, involving conscious recall of temporal events. Performance in sequence learning tasks improves with repetition and involves forming temporal associations over scales of seconds to minutes. To further understand the neural circuits underlying declarative sequence learning over trials, we tracked changes in intracranial field potentials (IFPs) recorded from 1142 electrodes implanted throughout temporal and frontal cortical areas in 14 human subjects, while they learned the temporal-order of multiple sequences of images over trials through repeated recall. We observed an increase in power in the gamma frequency band (30–100 Hz) in the recall phase, particularly in areas within the temporal lobe including the parahippocampal gyrus. The degree of this gamma power enhancement decreased over trials with improved sequence recall. Modulation of gamma power was directly correlated with the improvement in recall performance. When presenting new sequences, gamma power was reset to high values and decreased again after learning. These observations suggest that signals in the gamma frequency band may play a more prominent role during the early steps of the learning process rather than during the maintenance of memory traces.

%B Frontiers in Systems Neuroscience %V 8 %8 01/21/2015 %G eng %U http://journal.frontiersin.org/article/10.3389/fnsys.2014.00222/abstract %! Front. Syst. Neurosci. %R 10.3389/fnsys.2014.00222 %0 Journal Article %J Journal of Neurophysiology %D 2015 %T Sensitivity to timing and order in human visual cortex %A Jedediah Singer %A Joseph Madsen %A WS Anderson %A Gabriel Kreiman %B Journal of Neurophysiology %V 113 %P 1656 - 1669 %8 Jan-03-2015 %G eng %U http://jn.physiology.org/lookup/doi/10.1152/jn.00556.2014 %N 5 %! J Neurophysiol %R 10.1152/jn.00556.2014 %0 Generic %D 2015 %T UNSUPERVISED LEARNING OF VISUAL STRUCTURE USING PREDICTIVE GENERATIVE NETWORKS %A William Lotter %A Gabriel Kreiman %A David Cox %X

The ability to predict future states of the environment is a central pillar of intelligence. At its core, effective prediction requires an internal model of the world and an understanding of the rules by which the world changes. Here, we explore the internal models developed by deep neural networks trained using a loss based on predicting future frames in synthetic video sequences, using an Encoder-Recurrent-Decoder framework (Fragkiadaki et al., 2015). We first show that this architecture can achieve excellent performance in visual sequence prediction tasks, including state-of-the-art performance in a standard “bouncing balls” dataset (Sutskever et al., 2009). We then train on clips of out-of-the-plane rotations of computer-generated faces, using both mean-squared error and a generative adversarial loss (Goodfellow et al., 2014), extending the latter to a recurrent, conditional setting. Despite being trained end-to-end to predict only pixel-level information, our Predictive Generative Networks learn a representation of the latent variables of the underlying generative process. Importantly, we find that this representation is naturally tolerant to object transformations, and generalizes well to new tasks, such as classification of static images. Similar models trained solely with a reconstruction loss fail to generalize as effectively. We argue that prediction can serve as a powerful unsupervised loss for learning rich internal representations of high-level object features.

%8 12/15/2015 %G English %1

arXiv:1511.06380

%2

http://hdl.handle.net/1721.1/100275

%0 Journal Article %J Frontiers in Systems Neuroscience %D 2014 %T Corticocortical feedback increases the spatial extent of normalization. %A Jonathan J. Nassi %A Camille Gomez-Laberge %A Gabriel Kreiman %A Richard T Born %X

Normalization has been proposed as a canonical computation operating across different brain regions, sensory modalities, and species. It provides a good phenomenological description of non-linear response properties in primary visual cortex (V1), including the contrast response function and surround suppression. Despite its widespread application throughout the visual system, the underlying neural mechanisms remain largely unknown. We recently observed that corticocortical feedback contributes to surround suppression in V1, raising the possibility that feedback acts through normalization. To test this idea, we characterized area summation and contrast response properties in V1 with and without feedback from V2 and V3 in alert macaques and applied a standard normalization model to the data. Area summation properties were well explained by a form of divisive normalization, which computes the ratio between a neuron's driving input and the spatially integrated activity of a "normalization pool." Feedback inactivation reduced surround suppression by shrinking the spatial extent of the normalization pool. This effect was independent of the gain modulation thought to mediate the influence of contrast on area summation, which remained intact during feedback inactivation. Contrast sensitivity within the receptive field center was also unaffected by feedback inactivation, providing further evidence that feedback participates in normalization independent of the circuit mechanisms involved in modulating contrast gain and saturation. These results suggest that corticocortical feedback contributes to surround suppression by increasing the visuotopic extent of normalization and, via this mechanism, feedback can play a critical role in contextual information processing.
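A generic form of the divisive-normalization computation invoked here, written out for concreteness (the notation is ours, not a verbatim equation from the paper), takes the neuron's driving input D_i, a semisaturation constant sigma, an exponent n, and a spatially weighted normalization pool:

% Generic divisive-normalization form (illustrative notation, not the
% paper's exact equation): the response R_i is the neuron's driving input
% divided by the summed activity of its spatial normalization pool.
R_i \;=\; R_{\max}\,
\frac{D_i^{\,n}}
     {\sigma^{n} \;+\; \sum_{j \in \mathrm{pool}(i)} w_{ij}\, D_j^{\,n}}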

%B Frontiers in Systems Neuroscience %V 8 %P 105 %8 05/30/2014 %G eng %U http://journal.frontiersin.org/article/10.3389/fnsys.2014.00105/abstract %R 10.3389/fnsys.2014.00105 %0 Book Section %B Single Neuron Studies of the Brain: Probing Cognition %D 2014 %T Data Analysis techniques for human microwire recordings: spike detection and sorting, decoding, relation between units and local field potentials %A Rutishauser, U %A Moran Cerf %A Gabriel Kreiman %B Single Neuron Studies of the Brain: Probing Cognition %G eng %& 6 %0 Book Section %B Cognitive Neuroscience %D 2014 %T Neural correlates of consciousness: perception and volition %A Gabriel Kreiman %B Cognitive Neuroscience %V V %G eng %0 Journal Article %J Journal of Neuroscience %D 2014 %T Neural Dynamics Underlying Target Detection in the Human Brain %A Bansal, A %A Radhika Madhavan %A Agam, Y %A Golby, A %A Joseph Madsen %A Gabriel Kreiman %B Journal of Neuroscience %V 34 %G eng %& 3042 %0 Book Section %B Single neuron studies of the human brain. Probing cognition %D 2014 %T The next ten years and beyond %A Gabriel Kreiman %A Rutishauser, U %A Moran Cerf %A Itzhak Fried %B Single neuron studies of the human brain. Probing cognition %G eng %& 19 %0 Generic %D 2014 %T A normalization model of visual search predicts single trial human fixations in an object search task. %A Thomas Miconi %A Laura Groomes %A Gabriel Kreiman %K Circuits for Intelligence %K Pattern recognition %X

When searching for an object in a scene, how does the brain decide where to look next? Theories of visual search suggest the existence of a global attentional map, computed by integrating bottom-up visual information with top-down, target-specific signals. Where, when and how this integration is performed remains unclear. Here we describe a simple mechanistic model of visual search that is consistent with neurophysiological and neuroanatomical constraints, can localize target objects in complex scenes, and predicts single-trial human behavior in a search task among complex objects. This model posits that target-specific modulation is applied at every point of a retinotopic area selective for complex visual features and implements local normalization through divisive inhibition. The combination of multiplicative modulation and divisive normalization creates an attentional map in which aggregate activity at any location tracks the correlation between input and target features, with relative and controllable independence from bottom-up saliency. We first show that this model can localize objects in both composite images and natural scenes and demonstrate the importance of normalization for successful search. We next show that this model can predict human fixations on single trials, including error and target-absent trials. We argue that this simple model captures non-trivial properties of the attentional system that guides visual search in humans.

%8 04/2014 %1

arXiv:1404.6453v1

%2

http://hdl.handle.net/1721.1/100172

%0 Generic %D 2014 %T People, objects and interactions in movies %A Gabriel Kreiman %K Computer vision %X

This database contains annotations for commercial movies including information about presence/absence of specific people, their viewpoints, their motion, their emotions, presence/absence of specific objects and their motion.

See the related publication - https://cbmm.mit.edu/publications/predicting-episodic-memory-formation-movie-events

See the related code - https://cbmm.mit.edu/publications/predicting-episodic-memory-formation-movie-events-code

Click HERE to access the website and download the database.

%8 06/2014 %0 Generic %D 2014 %T A role for recurrent processing in object completion: neurophysiological, psychophysical and computational evidence. %A Hanlin Tang %A Buia, Calin %A Joseph Madsen %A WS Anderson %A Gabriel Kreiman %X

Recognition of objects from partial information presents a significant challenge for theories of vision because it requires spatial integration and extrapolation from prior knowledge. We combined neurophysiological recordings in human cortex with psychophysical measurements and computational modeling to investigate the mechanisms involved in object completion. We recorded intracranial field potentials from 1,699 electrodes in 18 epilepsy patients to measure the timing and selectivity of responses along human visual cortex to whole and partial objects. Responses along the ventral visual stream remained selective despite showing only 9-25% of the object. However, these visually selective signals emerged ~100 ms later for partial versus whole objects. The processing delays were particularly pronounced in higher visual areas within the ventral stream, suggesting the involvement of additional recurrent processing. In separate psychophysics experiments, disrupting this recurrent computation with a backward mask at ~75 ms significantly impaired recognition of partial, but not whole, objects. Additionally, computational modeling shows that the performance of a purely bottom-up architecture is impaired by heavy occlusion and that this effect can be partially rescued via the incorporation of top-down connections. These results provide spatiotemporal constraints on theories of object recognition that involve recurrent processing to recognize objects from partial information.

%8 04/2014 %1

arXiv 1409.2942

%2

http://hdl.handle.net/1721.1/100173

%0 Generic %D 2014 %T Sensitivity to Timing and Order in Human Visual Cortex. %A Jedediah Singer %A Joseph Madsen %A WS Anderson %A Gabriel Kreiman %K Circuits for Intelligence %K Pattern recognition %K Visual %X

Visual recognition takes a small fraction of a second and relies on the cascade of signals along the ventral visual stream. Given the rapid path through multiple processing steps between photoreceptors and higher visual areas, information must progress from stage to stage very quickly. This rapid progression of information suggests that fine temporal details of the neural response may be important to how the brain encodes visual signals. We investigated how changes in the relative timing of incoming visual stimulation affect the representation of object information by recording intracranial field potentials along the human ventral visual stream while subjects recognized objects whose parts were presented with varying asynchrony. Visual responses along the ventral stream were sensitive to timing differences between parts as small as 17 ms. In particular, there was a strong dependency on the temporal order of stimulus presentation, even at short asynchronies. This sensitivity to the order of stimulus presentation provides evidence that the brain may use differences in relative timing as a means of representing information.

%8 04/2014 %1

arXiv:1404.6420v1

%2

http://hdl.handle.net/1721.1/100170

%0 Journal Article %J J Vis %D 2014 %T Short temporal asynchrony disrupts visual object recognition. %A Jedediah Singer %A Gabriel Kreiman %K Adult %K Female %K Form Perception %K Humans %K Male %K Pattern Recognition, Visual %K Psychophysics %K Time Factors %K Vision, Ocular %K Visual Pathways %K Young Adult %X

Humans can recognize objects and scenes in a small fraction of a second. The cascade of signals underlying rapid recognition might be disrupted by temporally jittering different parts of complex objects. Here we investigated the time course over which shape information can be integrated to allow for recognition of complex objects. We presented fragments of object images in an asynchronous fashion and behaviorally evaluated categorization performance. We observed that visual recognition was significantly disrupted by asynchronies of approximately 30 ms, suggesting that spatiotemporal integration begins to break down with even small deviations from simultaneity. However, moderate temporal asynchrony did not completely obliterate recognition; in fact, integration of visual shape information persisted even with an asynchrony of 100 ms. We describe the data with a concise model based on the dynamic reduction of uncertainty about what image was presented. These results emphasize the importance of timing in visual processing and provide strong constraints for the development of dynamical models of visual shape recognition.

%B J Vis %V 14 %P 7 %8 2014 %G eng %N 5 %R 10.1167/14.5.7 %0 Journal Article %J Journal of Vision %D 2014 %T Short temporal asynchrony disrupts visual object recognition %A Jedediah Singer %A Gabriel Kreiman %B Journal of Vision %V 12 %G eng %N 14 %0 Generic %D 2014 %T Short temporal asynchrony disrupts visual object recognition %A Jedediah Singer %A Gabriel Kreiman %X

Intracranial field potential recordings, images and code from Liu et al. (2009). The data show rapid responses along the ventral visual stream in the human brain, with selectivity to faces and objects and tolerance to object transformations that can be decoded in single trials.

Used in conjunction with this mirrored CBMM Code entry

%I Journal of Vision %U http://klab.tch.harvard.edu/resources/singer_asynchrony.html %0 Generic %D 2014 %T Short temporal asynchrony disrupts visual object recognition %A Jedediah Singer %A Gabriel Kreiman %X

Intracranial field potential recordings, images and code from Liu et al. (2009). The data show rapid responses along the ventral visual stream in the human brain, with selectivity to faces and objects and tolerance to object transformations that can be decoded in single trials.

Used in conjunction with this mirrored CBMM Dataset entry

%I Journal of Vision %U http://klab.tch.harvard.edu/resources/singer_asynchrony.html %0 Book %B Probing cognition %D 2014 %T Single Neuron Studies of the Human Brain. Probing Cognition %A Itzhak Fried %A Ueli Rutishauser %A Moran Cerf %A Gabriel Kreiman %B Probing cognition %I MIT Press %C Cambridge, MA %G eng %0 Journal Article %J Neuron %D 2014 %T Spatiotemporal Dynamics Underlying Object Completion in Human Ventral Visual Cortex %A Hanlin Tang %A Buia, Calin %A Radhika Madhavan %A NE Crone %A Joseph Madsen %A WS Anderson %A Gabriel Kreiman %K Circuits for Intelligence %K vision %X

Natural vision often involves recognizing objects from partial information. Recognition of objects from parts presents a significant challenge for theories of vision because it requires spatial integration and extrapolation from prior knowledge. Here we recorded intracranial field potentials of 113 visually selective electrodes from epilepsy patients in response to whole and partial objects. Responses along the ventral visual stream, particularly the Inferior Occipital and Fusiform Gyri, remained selective despite showing only 9-25% of the object areas. However, these visually selective signals emerged ~100 ms later for partial versus whole objects. These processing delays were particularly pronounced in higher visual areas within the ventral stream. This latency difference persisted when controlling for changes in contrast, signal amplitude, and the strength of selectivity. These results argue against a purely feed-forward explanation of recognition from partial information, and provide spatiotemporal constraints on theories of object recognition that involve recurrent processing.

%B Neuron %V 83 %P 736 - 748 %8 08/06/2014 %G eng %U http://linkinghub.elsevier.com/retrieve/pii/S089662731400539X %N 3 %! Neuron %R 10.1016/j.neuron.2014.06.017 %0 Book Section %B Single neuron studies of the human brain. Probing cognition %D 2014 %T Visual cognitive adventures of single neurons in the human medial temporal lobe %A Mormann, F %A Matias J. Ison %A Quiroga, RQ %A Christof Koch %A Itzhak Fried %A Gabriel Kreiman %B Single neuron studies of the human brain. Probing cognition %G eng %& 8 %0 Book Section %B Principles of neural coding %D 2013 %T Computational models of visual object recognition %A Gabriel Kreiman %B Principles of neural coding %G eng %0 Generic %D 2009 %T Timing, timing, timing: Fast decoding of object information from intracranial field potentials in human visual cortex %A Liu, H %A Agam, Y %A Joseph Madsen %A Gabriel Kreiman %X

Rapid responses along the ventral visual stream in the human brain show selectivity to faces and objects and tolerance to object transformations, which can be decoded in single trials.

%I Neuron %U http://klab.tch.harvard.edu/resources/liuetal_timing3.html