Recognizing an object’s category and pose lies at the heart of visual understanding. Recent works suggest that deep neural networks (DNNs) often fail to generalize to category-pose combinations not seen during training. However, it is unclear when and how such generalization may be possible. Does the number of combinations seen during training impact generalization? Is it better to learn category and pose in separate networks, or in a single shared network? Furthermore, what are the neural mechanisms that drive the network’s generalization? In this paper, we answer these questions by analyzing state-of-the-art DNNs trained to recognize both object category and pose (position, scale, and 3D viewpoint) with quantitative control over the number of category-pose combinations seen during training. We also investigate the emergence of two types of specialized neurons that can explain generalization to unseen combinations—neurons selective to category and invariant to pose, and vice versa. We perform experiments on MNIST extended with position or scale, the iLab dataset with vehicles at different viewpoints, and a challenging new dataset for car model recognition and viewpoint estimation that we introduce in this paper, the Biased-Cars dataset. Our results demonstrate that as the number of combinations seen during training increases, networks generalize better to unseen category-pose combinations, facilitated by an increase in the selectivity and invariance of individual neurons. We find that learning category and pose in separate networks compared to a shared one leads to an increase in such selectivity and invariance, as separate networks are not forced to preserve information about both category and pose. This enables separate networks to significantly outperform shared ones at predicting unseen category-pose combinations.

%8 07/2020 %2 https://hdl.handle.net/1721.1/126262

%0 Generic %D 2020 %T Do Neural Networks for Segmentation Understand Insideness? %A Kimberly M. Villalobos %A Vilim Štih %A Amineh Ahmadinejad %A Shobhita Sundaram %A Jamell Dozier %A Andrew Francl %A Frederico Azevedo %A Tomotake Sasaki %A Xavier Boix %X The insideness problem is an image segmentation modality that consists of determining which pixels are inside and outside a region. Deep Neural Networks (DNNs) excel in segmentation benchmarks, but it is unclear whether they have the ability to solve the insideness problem, as it requires evaluating long-range spatial dependencies. In this paper, the insideness problem is analyzed in isolation, without texture or semantic cues, such that other aspects of segmentation do not interfere in the analysis. We demonstrate that DNNs for segmentation with few units have sufficient complexity to solve insideness for any curve. Yet, such DNNs have severe problems learning general solutions. Only recurrent networks trained with small images learn solutions that generalize well to almost any curve. Recurrent networks can decompose the evaluation of long-range dependencies into a sequence of local operations, and learning with small images alleviates the common difficulties of training recurrent networks with a large number of unrolling steps.

%8 04/2020 %2 https://hdl.handle.net/1721.1/124491

%0 Generic %D 2018 %T Can Deep Neural Networks Do Image Segmentation by Understanding Insideness? %A Kimberly M. Villalobos %A Jamel Dozier %A Vilim Stih %A Andrew Francl %A Frederico Azevedo %A Tomaso Poggio %A Tomotake Sasaki %A Xavier Boix %X **THIS MEMO IS REPLACED BY CBMM MEMO 105**

A key component of visual cognition is the understanding of spatial relationships among objects. Although effortless for our visual system, state-of-the-art artificial neural networks struggle to distinguish basic spatial relationships among elements in an image. As shown here, deep neural networks (DNNs) trained with hundreds of thousands of labeled examples cannot accurately distinguish whether pixels lie inside or outside 2D shapes, a problem that seems much simpler than image segmentation. In this paper, we sought to analyze the capability of ANNs to solve such inside/outside problems using an analytical approach. We demonstrate that it is a mathematically tractable problem and that two previously proposed algorithms, namely the Ray-Intersection Method and the Coloring Method, achieve perfect accuracy when implemented in the form of DNNs.

%8 12/2018 %0 Conference Paper %B workshop on "AI for Social Good", NIPS 2018 %D 2018 %T The Language of Fake News: Opening the Black-Box of Deep Learning Based Detectors %A Nicole O'Brien %A Sophia Latessa %A Georgios Evangelopoulos %A Xavier Boix %X The digital information age has generated new outlets for content creators to publish so-called “fake news”, a new form of propaganda that is intentionally designed to mislead the reader. With the widespread effects of the fast dissemination of fake news, efforts have been made to automate the process of fake news detection. A promising recent solution is to use machine learning, specifically deep neural networks, which have been successful in natural language processing, to detect patterns in news sources and articles. However, deep networks lack transparency in their decision-making process, i.e. the “black-box problem”, which obscures their reliability. In this paper, we open this “black-box” and we show that the emergent representations from deep neural networks capture subtle but consistent differences in the language of fake and real news: signatures of exaggeration and other forms of rhetoric. Unlike previous work, we test the transferability of the learning process to novel news topics. Our results demonstrate the generalization capabilities of deep learning to detect fake news in novel subjects from language patterns alone.

%B workshop on "AI for Social Good", NIPS 2018 %C Montreal, Canada %8 11/2018 %G eng %U http://hdl.handle.net/1721.1/120056 %0 Generic %D 2018 %T Single units in a deep neural network functionally correspond with neurons in the brain: preliminary results %A Luke Arend %A Yena Han %A Martin Schrimpf %A Pouya Bashivan %A Kohitij Kar %A Tomaso Poggio %A James J. DiCarlo %A Xavier Boix %2 http://hdl.handle.net/1721.1/118847

%0 Conference Paper %B AAAI Conference on Artificial Intelligence %D 2017 %T Active Video Summarization: Customized Summaries via On-line Interaction. %A Garcia del Molino, A %A X Boix %A Lim, J. %A Tan, A %B AAAI Conference on Artificial Intelligence %G eng %0 Generic %D 2017 %T Eccentricity Dependent Deep Neural Networks for Modeling Human Vision %A Gemma Roig %A Francis Chen %A X Boix %A Tomaso Poggio %B Vision Sciences Society %0 Conference Paper %B AAAI Spring Symposium Series, Science of Intelligence %D 2017 %T Eccentricity Dependent Deep Neural Networks: Modeling Invariance in Human Vision %A Francis Chen %A Gemma Roig %A Leyla Isik %A X Boix %A Tomaso Poggio %X Humans can recognize objects in a way that is invariant to scale, translation, and clutter. We use invariance theory as a conceptual basis to computationally model this phenomenon. This theory discusses the role of eccentricity in human visual processing, and is a generalization of feedforward convolutional neural networks (CNNs). Our model explains some key psychophysical observations relating to invariant perception, while maintaining important similarities with biological neural architectures. To our knowledge, this work is the first to unify explanations of all three types of invariance, all while leveraging the power and neurological grounding of CNNs.

%B AAAI Spring Symposium Series, Science of Intelligence %G eng %U https://www.aaai.org/ocs/index.php/SSS/SSS17/paper/view/15360 %0 Generic %D 2017 %T Theory of Deep Learning III: explaining the non-overfitting puzzle %A Tomaso Poggio %A Keji Kawaguchi %A Qianli Liao %A Brando Miranda %A Lorenzo Rosasco %A Xavier Boix %A Jack Hidary %A Hrushikesh Mhaskar %X **THIS MEMO IS REPLACED BY CBMM MEMO 90**

A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamical systems associated with gradient descent minimization of nonlinear networks behave, near zero stable minima of the empirical error, as a gradient system in a quadratic potential with a degenerate Hessian. The proposition is supported by theoretical and numerical results, under the assumption of stable minima of the gradient.

Our proposition provides the extension to deep networks of key properties of gradient descent methods for linear networks that, as suggested in (1), can be the key to understanding generalization. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, asymptotically converging to the minimum norm solution. This implies that there is usually an optimum early stopping point that avoids overfitting of the loss (this is relevant mainly for regression). For classification, the asymptotic convergence to the minimum norm solution implies convergence to the maximum margin solution, which guarantees good classification error for “low noise” datasets.

The implied robustness to overparametrization has suggestive implications for the robustness of deep hierarchically local networks to variations of the architecture with respect to the curse of dimensionality.

%8 12/2017 %2 http://hdl.handle.net/1721.1/113003

%0 Generic %D 2016 %T Foveation-based Mechanisms Alleviate Adversarial Examples %A Luo, Yan %A X Boix %A Gemma Roig %A Tomaso Poggio %A Qi Zhao %X We show that adversarial examples, *i.e.*, the visually imperceptible perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be alleviated with a mechanism based on foveations---applying the CNN in different image regions. To see this, first, we report results in ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as a linear classifier: CNNs act locally linearly to changes in the image regions with objects recognized by the CNN, and in other regions the CNN may act non-linearly. Then, we corroborate that when the neural responses are linear, applying the foveation mechanism to the adversarial example tends to significantly reduce the effect of the perturbation. This is because, hypothetically, the CNNs for ImageNet are robust to changes of scale and translation of the object produced by the foveation, but this property does not generalize to transformations of the perturbation. As a result, the accuracy after a foveation is almost the same as the accuracy of the CNN without the adversarial perturbation, even if the adversarial perturbation is calculated taking into account a foveation.