%0 Journal Article
%J Science Advances
%D 2020
%T Efficient inverse graphics in biological face processing
%A Ilker Yildirim
%A Mario Belledonne
%A W. A. Freiwald
%A Joshua B. Tenenbaum
%X

Vision not only detects and recognizes objects but also performs rich inferences about the underlying scene structure that causes the patterns of light we see. Inverting generative models, or “analysis-by-synthesis”, presents a possible solution, but its mechanistic implementations have typically been too slow for online perception, and their mapping to neural circuits remains unclear. Here we present a neurally plausible efficient inverse graphics model and test it in the domain of face recognition. The model is based on a deep neural network that learns to invert a three-dimensional face graphics program in a single fast feedforward pass. It explains human behavior qualitatively and quantitatively, including the classic “hollow face” illusion, and it maps directly onto a specialized face-processing circuit in the primate brain. The model fits both behavioral and neural data better than state-of-the-art computer vision models, and suggests an interpretable reverse-engineering account of how the brain transforms images into percepts.
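Read as a recipe, the model amounts to amortized analysis-by-synthesis: sample scene latents, render them with the graphics program, and train a feedforward network to recover the latents from the resulting image. A minimal PyTorch sketch of that training loop; `render`, `LATENT_DIM`, and the network architecture here are toy stand-ins, not the authors' actual face model:

```python
import torch
import torch.nn as nn

LATENT_DIM = 200  # hypothetical size of the scene-latent vector (shape, texture, pose)

torch.manual_seed(0)
W_RENDER = torch.randn(LATENT_DIM, 64 * 64)  # fixed weights of the toy "graphics program"

def render(latents: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for the 3D face graphics program: latents -> 64x64 image."""
    return (latents @ W_RENDER).view(-1, 1, 64, 64)

# Feedforward recognition network: image -> estimated scene latents, in one pass.
recognizer = nn.Sequential(
    nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, LATENT_DIM),
)

opt = torch.optim.Adam(recognizer.parameters(), lr=1e-3)
for step in range(1000):
    z = torch.randn(32, LATENT_DIM)             # sample scene latents from the prior
    img = render(z)                             # synthesis: run the graphics program
    loss = ((recognizer(img) - z) ** 2).mean()  # analysis: learn to invert it
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once trained, inversion costs a single forward pass through `recognizer`, which is what makes this style of inference fast enough for online perception.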

%B Science Advances
%V 6
%P eaax5979
%8 03/2020
%G eng
%U https://advances.sciencemag.org/lookup/doi/10.1126/sciadv.aax5979
%N 10
%! Sci. Adv.
%R 10.1126/sciadv.aax5979

%0 Conference Paper
%B 39th Annual Conference of the Cognitive Science Society
%D 2017
%T Causal and compositional generative models in online perception
%A Ilker Yildirim
%A Michael Janner
%A Mario Belledonne
%A Christian Wallraven
%A W. A. Freiwald
%A Joshua B. Tenenbaum
%X

From a quick glance or the touch of an object, our brains map sensory signals to scenes composed of rich and detailed shapes and surfaces. Unlike the standard pattern recognition approaches to perception, we argue that this mapping draws on internal causal and compositional models of the outside physical world, and that such internal models underlie the generalization capacity of human perception. Here, we present a generative model of visual and multisensory perception in which the latent variables encode intrinsic properties of objects such as their shapes and surfaces in addition to their extrinsic properties such as pose and occlusion. These latent variables can be composed in novel ways and are inputs to sensory-specific causal models that output sense-specific signals. We present a novel recognition network that performs efficient inference in the generative model, computing at a speed similar to online perception. We show that our model, but not an alternative baseline model or a lesion of our model, can account for human performance in an occluded face matching task and in a cross-modal visual-to-haptic face matching task.
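The latent structure described in this abstract can be sketched directly in code: intrinsic latents (shape, surface) and extrinsic latents (pose, occlusion) compose freely and feed sense-specific causal models. The Python sketch below is purely illustrative; the class and function names are hypothetical, and the toy renderers stand in for the paper's actual graphics and haptics models:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Intrinsics:         # properties of the object itself
    shape: np.ndarray     # e.g., 3D face shape coefficients
    surface: np.ndarray   # e.g., texture coefficients

@dataclass
class Extrinsics:         # properties of the viewing situation
    pose: np.ndarray      # e.g., head rotation parameters
    occlusion: float      # fraction of the signal covered, in [0, 1]

def render_vision(obj: Intrinsics, view: Extrinsics) -> np.ndarray:
    """Vision-specific causal model (toy): intrinsics + extrinsics -> 'image'."""
    img = np.tanh(obj.shape + obj.surface + view.pose)
    img[: int(len(img) * view.occlusion)] = 0.0  # crude occlusion
    return img

def render_haptics(obj: Intrinsics) -> np.ndarray:
    """Touch-specific causal model (toy): shape alone drives the haptic signal."""
    return np.tanh(obj.shape)

# Compositionality: the same intrinsic latents pair with novel extrinsics
# and feed either sense-specific model.
face = Intrinsics(shape=np.random.randn(100), surface=np.random.randn(100))
for pose in (np.zeros(100), np.random.randn(100)):
    _ = render_vision(face, Extrinsics(pose=pose, occlusion=0.3))
_ = render_haptics(face)
```

The design choice the abstract emphasizes is visible here: vision and touch share the same intrinsic latents but consume them through different causal models, which is what permits cross-modal generalization.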

%B 39th Annual Conference of the Cognitive Science Society
%C London, UK
%G eng

%0 Conference Proceedings
%B 39th Annual Meeting of the Cognitive Science Society - COGSCI 2017
%D 2017
%T Causal and compositional generative models in online perception
%A Ilker Yildirim
%A Michael Janner
%A Mario Belledonne
%A Christian Wallraven
%A W. A. Freiwald
%A Joshua B. Tenenbaum
%X

From a quick glance or the touch of an object, our brains map sensory signals to scenes composed of rich and detailed shapes and surfaces. Unlike the standard approaches to perception, we argue that this mapping draws on internal causal and compositional models of the physical world, and that these internal models underlie the generalization capacity of human perception. Here, we present a generative model of visual and multisensory perception in which the latent variables encode intrinsic (e.g., shape) and extrinsic (e.g., occlusion) object properties. Latent variables are inputs to causal models that output sense-specific signals. We present a recognition network that performs efficient inference in the generative model, computing at a speed similar to online perception. We show that our model, but not alternatives, can account for human performance in an occluded face matching task and in a visual-to-haptic face matching task.
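The matching tasks can be read as comparisons in the inferred latent space: infer latents from the visual stimulus, infer them from each haptic probe, and choose the closer probe. A hypothetical sketch under that reading; `infer_from_vision` and `infer_from_haptics` are assumed recognition models passed in by the caller, not functions from the paper:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_visual_to_haptic(image, probe_a, probe_b,
                           infer_from_vision, infer_from_haptics):
    """Two-alternative matching: pick the haptic probe whose inferred
    latents lie closest to the latents inferred from the image."""
    z_img = infer_from_vision(image)
    sims = [cosine(z_img, infer_from_haptics(p)) for p in (probe_a, probe_b)]
    return int(np.argmax(sims))  # 0 -> probe A, 1 -> probe B
```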

%B 39th Annual Meeting of the Cognitive Science Society - COGSCI 2017
%C London, UK
%8 07/2017
%G eng
%U https://mindmodeling.org/cogsci2017/papers/0266/index.html