%0 Journal Article
%J Scientific Reports
%D 2020
%T Scale and translation-invariance for novel objects in human vision
%A Yena Han
%A Gemma Roig
%A Gadi Geiger
%A Tomaso Poggio
%X

Though the range of invariance in recognition of novel objects is a basic aspect of human vision, its characterization has remained surprisingly elusive. Here we report tolerance to scale and position changes in one-shot learning by measuring recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans have significant scale-invariance after only a single exposure to a novel object. The range of translation-invariance is limited, depending on the size and position of the presented objects. To understand the underlying brain computation associated with these invariance properties, we compared the experimental data with computational modeling results. Our results suggest that, to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance by encoding different scale channels, as well as eccentricity-dependent representations captured by neurons' receptive field sizes and sampling density that change with eccentricity. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy that differs in some key aspects from current deep learning architectures, being more data efficient and relying more critically on eye movements.
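
As a toy illustration of the scale-channel idea the abstract points to, the following Python sketch compares a single stored template against an image at several scales and pools the best match over scale. All names, scales, and the matching rule are our own illustrative choices, not the paper's model or code.

```python
# Minimal sketch (not the paper's code): scale channels for one-shot,
# scale-tolerant template matching. A single stored template is compared
# against the image at several scales; max-pooling over the scale channels
# yields a response that tolerates rescaling of the object.
import numpy as np

def rescale(img, factor):
    """Nearest-neighbour rescale; stands in for a proper resize."""
    h, w = img.shape
    rows = np.clip((np.arange(int(h * factor)) / factor).astype(int), 0, h - 1)
    cols = np.clip((np.arange(int(w * factor)) / factor).astype(int), 0, w - 1)
    return img[np.ix_(rows, cols)]

def scale_channel_response(image, template, scales=(0.5, 0.71, 1.0, 1.41, 2.0)):
    """Correlate one template with the image at several scales ("channels")
    and pool the best match over scale."""
    responses = []
    for s in scales:
        scaled = rescale(template, s)
        th, tw = scaled.shape
        if th > image.shape[0] or tw > image.shape[1]:
            continue
        # normalized cross-correlation at the image centre only, for brevity
        r0, c0 = (image.shape[0] - th) // 2, (image.shape[1] - tw) // 2
        patch = image[r0:r0 + th, c0:c0 + tw]
        num = float((patch * scaled).sum())
        den = float(np.linalg.norm(patch) * np.linalg.norm(scaled)) + 1e-8
        responses.append(num / den)
    return max(responses)  # pooling over scale channels

letter = np.random.rand(16, 16)              # stands in for a novel Korean letter
image = np.zeros((64, 64))
big = rescale(letter, 2.0)                   # the "test" presentation, rescaled 2x
image[16:16 + big.shape[0], 16:16 + big.shape[1]] = big
print(scale_channel_response(image, letter))  # high despite the size change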

%B Scientific Reports
%V 10
%8 01/2020
%G eng
%U http://www.nature.com/articles/s41598-019-57261-6
%N 1411
%! Sci Rep
%R 10.1038/s41598-019-57261-6

%0 Conference Paper
%B Vision Science Society
%D 2019
%T Eccentricity Dependent Neural Network with Recurrent Attention for Scale, Translation and Clutter Invariance
%A Jiaxuan Zhang
%A Yena Han
%A Tomaso Poggio
%A Gemma Roig
%C Florida, USA
%8 05/2019
%G eng

%0 Conference Paper
%B Vision Science Society
%D 2019
%T Properties of invariant object recognition in human one-shot learning suggests a hierarchical architecture different from deep convolutional neural networks
%A Yena Han
%A Gemma Roig
%A Gadi Geiger
%A Tomaso Poggio
%C St Pete Beach, FL, USA
%8 05/2019
%G eng
%U https://jov.arvojournals.org/article.aspx?articleid=2749961
%R 10.1167/19.10.28d

%0 Generic
%D 2017
%T Do Deep Neural Networks Suffer from Crowding?
%A Anna Volokitin
%A Gemma Roig
%A Tomaso Poggio
%X

Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks for object recognition. We analyze both standard deep convolutional neural networks (DCNNs) and a new variant of DCNNs that is (1) multi-scale and (2) has convolution filters whose size changes with eccentricity relative to the center of fixation. Such networks, which we call eccentricity-dependent, are a computational model of the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks trained on targets in isolation, we find that recognition accuracy decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and the flankers also plays a role, and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating the flankers into the images of the training set does not improve performance under crowding.
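
The qualitative mechanism of the eccentricity-dependent model, pooling over regions that grow with distance from fixation, can be sketched in a few lines of Python. The linear growth law and its constants below are illustrative assumptions, not the paper's implementation (see the accompanying code release for that).

```python
# Hedged sketch (illustrative, not the paper's implementation): pooling
# windows whose size grows linearly with eccentricity, i.e. with distance
# from the fixation point.
import numpy as np

def eccentricity_pool(image, fixation, w0=2, slope=0.5):
    """Average-pool each pixel over a square window whose half-width grows
    with eccentricity: fine sampling at fixation, coarse in the periphery.
    w0 and slope are arbitrary constants chosen for this demo."""
    h, w = image.shape
    fy, fx = fixation
    out = np.empty_like(image, dtype=float)
    for y in range(h):
        for x in range(w):
            ecc = np.hypot(y - fy, x - fx)   # eccentricity in pixels
            half = int(w0 + slope * ecc)     # window grows with eccentricity
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            out[y, x] = image[y0:y1, x0:x1].mean()
    return out

img = np.random.rand(32, 32)
blurred = eccentricity_pool(img, fixation=(16, 16))
# Near fixation the image is nearly intact; far from it, targets and flankers
# are averaged together -- the model's account of crowding in the periphery.
```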

Associated code for this paper.

%8 06/2017
%1 arXiv:1706.08616
%2 http://hdl.handle.net/1721.1/110348

%0 Generic
%D 2017
%T Do Deep Neural Networks Suffer from Crowding? [code]
%A Anna Volokitin
%A Gemma Roig
%X

This code accompanies the paper "Do Deep Neural Networks Suffer from Crowding?" by Anna Volokitin, Gemma Roig and Tomaso Poggio [1].

The main purpose of this repository is to provide an implementation of the eccentricity-dependent model [3], as well as to show an example of the experiments carried out in [1]. This code is inspired by the implementation described in [2], but it is not intended to replicate the results reported in [2].

The code is provided as is and is for academic purposes only.

Contact voanna AT vision.ee.ethz.ch and gemmar AT mit.edu for questions.

The GitHub repository for this code is available for downloading, cloning, etc.

%8 06/2017

%0 Generic
%D 2017
%T Eccentricity Dependent Deep Neural Networks for Modeling Human Vision
%A Gemma Roig
%A Francis Chen
%A X Boix
%A Tomaso Poggio
%B Vision Sciences Society

%0 Conference Paper
%B AAAI Spring Symposium Series, Science of Intelligence
%D 2017
%T Eccentricity Dependent Deep Neural Networks: Modeling Invariance in Human Vision
%A Francis Chen
%A Gemma Roig
%A Leyla Isik
%A X Boix
%A Tomaso Poggio
%X

Humans can recognize objects in a way that is invariant to scale, translation, and clutter. We use invariance theory as a conceptual basis to computationally model this phenomenon. This theory discusses the role of eccentricity in human visual processing and is a generalization of feedforward convolutional neural networks (CNNs). Our model explains some key psychophysical observations relating to invariant perception while maintaining important similarities with biological neural architectures. To our knowledge, this work is the first to unify explanations of all three types of invariance, all while leveraging the power and neurological grounding of CNNs.
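
One simple way to realize an eccentricity-dependent front end in code is a set of concentric crops around fixation, each resampled to a common resolution, so that the fovea is represented finely and the periphery coarsely. The abstract does not spell out this construction, so the Python sketch below, with arbitrary sizes, should be read as an assumption-laden illustration rather than the authors' architecture.

```python
# Hedged sketch of one possible eccentricity-dependent front end: concentric
# crops of growing size around fixation, each resampled to the same grid.
# (An illustrative construction; the abstract does not specify this detail.)
import numpy as np

def foveal_channels(image, fixation, sizes=(8, 16, 32, 64), out_res=8):
    """Return one low-resolution channel per crop size: small crops sample
    the fovea densely, large crops sample the periphery coarsely."""
    fy, fx = fixation
    channels = []
    for s in sizes:
        half = s // 2
        y0, x0 = max(0, fy - half), max(0, fx - half)
        crop = image[y0:y0 + s, x0:x0 + s]
        step = max(1, crop.shape[0] // out_res)      # crude downsampling
        channels.append(crop[::step, ::step][:out_res, :out_res])
    return np.stack(channels)

img = np.random.rand(64, 64)
print(foveal_channels(img, (32, 32)).shape)   # (4, 8, 8)
```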

%B AAAI Spring Symposium Series, Science of Intelligence
%G eng
%U https://www.aaai.org/ocs/index.php/SSS/SSS17/paper/view/15360

%0 Generic
%D 2017
%T On the Human Visual System Invariance to Translation and Scale
%A Yena Han
%A Gemma Roig
%A Gadi Geiger
%A Tomaso Poggio
%B Vision Sciences Society

%0 Conference Paper
%B AAAI Spring Symposium Series, Science of Intelligence
%D 2017
%T Is the Human Visual System Invariant to Translation and Scale?
%A Yena Han
%A Gemma Roig
%A Gadi Geiger
%A Tomaso Poggio
%G eng

%0 Generic
%D 2016
%T Foveation-based Mechanisms Alleviate Adversarial Examples
%A Yan Luo
%A X Boix
%A Gemma Roig
%A Tomaso Poggio
%A Qi Zhao
%X

We show that adversarial examples, i.e., the visually imperceptible perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be alleviated with a mechanism based on foveations, that is, applying the CNN to different image regions. To see this, we first report results on ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as linear classifiers: CNNs act locally linearly to changes in the image regions containing objects recognized by the CNN, while in other regions the CNN may act non-linearly. Then, we corroborate that when the neural responses are linear, applying the foveation mechanism to the adversarial example tends to significantly reduce the effect of the perturbation. This is because, hypothetically, the CNNs for ImageNet are robust to the changes of scale and translation of the object produced by the foveation, but this property does not generalize to transformations of the perturbation. As a result, the accuracy after a foveation is almost the same as the accuracy of the CNN without the adversarial perturbation, even when the adversarial perturbation is calculated taking the foveation into account.
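
A minimal sketch of the foveation mechanism as described: crop a region around the object, rescale it to the network's input size, and classify; averaging over several such fixations is one plausible readout. The `classify` callable and the crop boxes below are hypothetical placeholders, not the paper's pipeline.

```python
# Hedged sketch of the foveation idea: classify rescaled crops around the
# object. The object survives the scale/translation induced by the crop;
# the adversarial perturbation typically does not.
import numpy as np

def foveate(image, box, out_size=32):
    """Crop the region box = (y0, x0, h, w) and resize it (nearest
    neighbour) to out_size x out_size, mimicking a fixation on the object."""
    y0, x0, h, w = box
    crop = image[y0:y0 + h, x0:x0 + w]
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return crop[np.ix_(rows, cols)]

def predict_with_foveations(image, boxes, classify):
    """Average a classifier's scores over several foveated views."""
    scores = [classify(foveate(image, b)) for b in boxes]
    return np.mean(scores, axis=0)

img = np.random.rand(64, 64)
boxes = [(8, 8, 48, 48), (16, 16, 40, 40)]            # hypothetical object regions
dummy = lambda x: np.array([x.mean(), 1 - x.mean()])  # stand-in classifier
print(predict_with_foveations(img, boxes, dummy))
```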

%8 01/2016
%G eng
%1 arXiv:1511.06292
%2 http://hdl.handle.net/1721.1/100981