Simulating a Primary Visual Cortex at the Front of CNNs Improves Robustness to Image Perturbations [video]
December 3, 2020
Hear from the lead authors Tiago Marques (MIT) and Joel Dapello (Harvard), and PI James DiCarlo (MIT), about their development of VOneNets, a new class of hybrid CNN vision models. The paper was accepted to the Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020).
[MUSIC PLAYING] TIAGO MARQUES: When we look at an image, our brains quickly process the visual information captured by the retina, allowing us to identify the objects contained in it within just a fraction of a second. This behavior is called object recognition, and it has been at the center of our lab's research for over a decade.
Convolutional Neural Networks, or CNNs for short, are a class of computer vision models that have been successful not only in achieving object recognition with human-level performance, but also in predicting primate neuronal and behavioral data related to this aspect of vision. However, these models still exhibit significant limitations when compared to human visual abilities.
In particular, they can be fooled by imperceptibly small, explicitly crafted perturbations called adversarial attacks. To illustrate this limitation, we can see this image of my late cat, Milou, which the model has no difficulty recognizing as a cat. However, when we add this specific noise-like pattern, the same model now identifies the new perturbed image as a sleeping bag.
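To make the idea of an adversarial perturbation concrete, here is a minimal sketch in the spirit of a gradient-sign (FGSM-style) attack. It uses a hypothetical linear toy classifier in NumPy rather than an actual CNN, and all weights and parameter values are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classifier: a fixed linear model over flattened
# "images". These weights are hypothetical, not the CNN from the video.
n_pixels, n_classes = 64, 3
W = rng.normal(size=(n_classes, n_pixels))

def predict(x):
    """Return the index of the highest-scoring class."""
    return int(np.argmax(W @ x))

# A clean input and the label the model assigns to it.
x = rng.normal(size=n_pixels)
clean_label = predict(x)

# Gradient-sign attack: nudge every pixel by epsilon in the direction
# that raises a chosen target class's score relative to the clean label.
# For a linear model that gradient is simply W[target] - W[clean_label].
target = (clean_label + 1) % n_classes
grad = W[target] - W[clean_label]

# Grow the per-pixel budget until the prediction flips.
epsilon, x_adv = 0.0, x.copy()
while predict(x_adv) == clean_label:
    epsilon += 0.05
    x_adv = x + epsilon * np.sign(grad)

print("prediction flipped at per-pixel budget epsilon =", epsilon)
print("clean prediction:", clean_label, "-> adversarial prediction:", predict(x_adv))
```

The key point the sketch illustrates is that every pixel moves by at most epsilon, so the perturbed input can remain visually indistinguishable from the original even though the predicted class changes.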
Beyond the major safety concern in real world deployment of CNN models-- for instance, in self-driving cars-- this phenomenon reveals a gross misalignment with the qualities of human vision. Developing CNNs that are more robust would have big implications in both neuroscience and machine learning. But how can we develop CNNs that robustly generalize like we do?
JOEL DAPELLO: In the past, our lab developed Brain-Score, a platform that allows us to evaluate how well models approximate different regions in the ventral stream, the set of cortical areas responsible for the visual processing underlying object recognition. Taking advantage of Brain-Score, we observed that a CNN's robustness to adversarial attacks is strongly correlated with its ability to approximate the primate primary visual cortex, also known as area V1, which is the first region of the cortex to receive visual information.
In this plot, we can see that models that better approximate V1 were also more robust to adversarial attacks. Inspired by this, we developed VOneNet, a new class of CNNs containing a neuroscientific model of the primary visual cortex as the front end. This front end mimics the visual processing that takes place in V1, and contains neuronal populations that can be mapped to those in the primate brain, such as simple and complex cells.
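The classical building blocks of such a V1 front end can be sketched briefly: a Gabor filter as the simple-cell receptive field, half-wave rectification for simple-cell responses, and a quadrature-pair "energy" computation for complex cells. This is a minimal NumPy illustration of those textbook components, not the actual VOneNet implementation, and the filter parameters are arbitrary:

```python
import numpy as np

def gabor(size, theta, freq, sigma, phase):
    """2-D Gabor filter: a sinusoidal grating under a Gaussian envelope,
    the classical model of a V1 simple-cell receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # coordinate along the grating
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * freq * xr + phase)
    return envelope * carrier

# Illustrative parameters: 15x15 filter, 45-degree orientation.
size, theta, freq, sigma = 15, np.pi / 4, 0.2, 3.0
g_even = gabor(size, theta, freq, sigma, phase=0.0)       # even-symmetric
g_odd = gabor(size, theta, freq, sigma, phase=np.pi / 2)  # quadrature pair

rng = np.random.default_rng(1)
patch = rng.normal(size=(size, size))  # stand-in image patch

# Simple cells: half-wave rectified linear filter responses.
r_even = max(0.0, float(np.sum(g_even * patch)))
r_odd = max(0.0, float(np.sum(g_odd * patch)))

# Complex cell: phase-invariant "energy" of the quadrature pair.
energy = float(np.hypot(np.sum(g_even * patch), np.sum(g_odd * patch)))

print("simple-cell responses:", r_even, r_odd)
print("complex-cell energy:", energy)
```

Pooling the squared responses of a quadrature pair is what makes the complex-cell output insensitive to the exact phase (position) of the stimulus, one of the V1 properties the VOneNet front end is designed to capture.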
Despite not being designed explicitly for that purpose, VOneNets were considerably more robust than standard models, showing higher accuracy under white-box attacks than the corresponding base models and outperforming current state-of-the-art methods on a composite benchmark of perturbations. Finally, we dissected the components of the V1 model responsible for this improvement, and observed that they all work in synergy to support model robustness.
JAMES DICARLO: Although much work is left to be done, this study is quite exciting, because it directly shows that developing models that more closely approximate the primate brain, models really informed by neuroscience, leads to substantial gains in computer vision.
And indeed, I see this as just one example of a turn in a virtuous cycle: the cycle whereby neuroscience influences models of Artificial Intelligence, and Artificial Intelligence models come back to influence neuroscience as hypotheses and tools. This is really quite an exciting time in our field, and I think of this study as just one example of that virtuous cycle.