Researchers find similarities between how some computer-vision systems process images and how humans see out of the corners of their eyes.
Adam Zewe | MIT News Office
Perhaps computer vision and human vision have more in common than meets the eye?
Research from MIT suggests that a certain type of robust computer-vision model perceives visual representations similarly to the way humans do using peripheral vision. These models, known as adversarially robust models, are designed to overcome subtle bits of noise that have been added to image data.
The way these models learn to transform images is similar to some elements involved in human peripheral processing, the researchers found. But because machines do not have a visual periphery, little work on computer vision models has focused on peripheral processing, says senior author Arturo Deza, a postdoc in the Center for Brains, Minds, and Machines.
“It seems like peripheral vision, and the textural representations that are going on there, have been shown to be pretty useful for human vision. So, our thought was, OK, maybe there might be some uses in machines, too,” says lead author Anne Harrington, a graduate student in the Department of Electrical Engineering and Computer Science.
The results suggest that designing a machine-learning model to include some form of peripheral processing could enable the model to automatically learn visual representations that are robust to some subtle manipulations in image data. This work could also help shed some light on the goals of peripheral processing in humans, which are still not well understood, Deza adds.
The research will be presented at the International Conference on Learning Representations.
Humans and computer vision systems both have what is known as foveal vision, which is used for scrutinizing highly detailed objects. Humans also possess peripheral vision, which is used to organize a broad, spatial scene. Typical computer vision approaches attempt to model foveal vision — which is how a machine recognizes objects — and tend to ignore peripheral vision, Deza says.
But foveal computer vision systems are vulnerable to adversarial noise, which is added to image data by an attacker. In an adversarial attack, a malicious agent subtly modifies images so each pixel has been changed very slightly — a human wouldn’t notice the difference, but the noise is enough to fool a machine. For example, an image might look like a car to a human, but if it has been affected by adversarial noise, a computer vision model may confidently misclassify it as, say, a cake, which could have serious implications in an autonomous vehicle.
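A toy sketch can make the idea concrete. The Python below is an illustration only, not the models or attacks used in the MIT study. It assumes a hypothetical linear classifier over 1,000 pixels, with made-up weights and bias, and borrows the labels "car" and "cake" from the example above. Each pixel is nudged by at most 0.01 in the direction that lowers the "car" score, in the spirit of gradient-sign attacks.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict(weights, bias, pixels):
    """Hypothetical linear classifier: a positive score means 'car'."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return "car" if score > 0 else "cake"


def adversarial_example(weights, pixels, epsilon):
    """Shift every pixel by at most epsilon so the 'car' score drops.

    For a linear model, the gradient of the score with respect to a pixel
    is that pixel's weight, so stepping against sign(weight) lowers the score.
    Pixel values are kept inside the valid range [0, 1].
    """
    return [min(1.0, max(0.0, p - epsilon * sign(w)))
            for w, p in zip(weights, pixels)]


# Illustrative values only (assumptions, not data from the study).
weights = [0.01 if i % 2 == 0 else -0.01 for i in range(1000)]
bias = 0.02
pixels = [0.5] * 1000          # a flat grey "image"
epsilon = 0.01                 # maximum change allowed per pixel

tampered = adversarial_example(weights, pixels, epsilon)
largest_change = max(abs(a - b) for a, b in zip(tampered, pixels))

print(predict(weights, bias, pixels))     # prediction on the clean image
print(predict(weights, bias, tampered))   # prediction on the tampered image
print(largest_change)                     # stays at roughly epsilon
```

No single pixel moves by more than about 0.01, yet the thousand tiny changes add up and push the score across the decision boundary. Real image classifiers are far more complex, but the same accumulation effect is what makes small, widespread perturbations effective.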
To overcome this vulnerability, researchers conduct what is known as adversarial training: they create images that have been manipulated with adversarial noise, feed them to the neural network, correct its mistakes by relabeling the data, and then retrain the model.
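The training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: it swaps the neural network for a hypothetical two-feature logistic-regression model, uses an invented six-point dataset, and reads "relabeling" as attaching the correct label to each perturbed input. The function names (`perturb`, `adversarial_train`, `robust_accuracy`) and the values of `epsilon`, `steps`, and `lr` are all chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, bias, x, y, epsilon):
    """Shift each feature by at most epsilon to increase the model's loss."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    # For logistic regression, the gradient of the log loss with respect
    # to the input is (p - y) * weights; step along its sign.
    return [xi + epsilon * sign((p - y) * w) for w, xi in zip(weights, x)]


def adversarial_train(data, epsilon, steps=200, lr=0.5):
    """Gradient descent where every example is attacked before each update."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            x_adv = perturb(weights, bias, x, y, epsilon)  # attack current model
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x_adv)) + bias)
            # The perturbed input keeps its true label y.
            for i, xi in enumerate(x_adv):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def robust_accuracy(weights, bias, data, epsilon):
    """Fraction of examples still classified correctly after being attacked."""
    correct = 0
    for x, y in data:
        x_adv = perturb(weights, bias, x, y, epsilon)
        score = sum(w * xi for w, xi in zip(weights, x_adv)) + bias
        correct += int((score > 0) == (y == 1))
    return correct / len(data)


# Invented toy data: label 1 in the upper right, label 0 in the lower left.
data = [([1.0, 1.0], 1), ([1.5, 0.5], 1), ([0.5, 1.5], 1),
        ([-1.0, -1.0], 0), ([-1.5, -0.5], 0), ([-0.5, -1.5], 0)]

weights, bias = adversarial_train(data, epsilon=0.3)
print(robust_accuracy(weights, bias, data, epsilon=0.3))
```

The toy data is easy enough that ordinary training would likely cope as well, so the sketch shows the mechanics of the loop rather than its benefit. With deep networks and real images, the attack step is usually more elaborate, but the pattern of attacking the current model and then training on the result is the same.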
“Just doing that additional relabeling and training process seems to give a lot of perceptual alignment with human processing,” Deza says.
He and Harrington wondered if these adversarially trained networks are robust because they encode object representations that are similar to human peripheral vision. So, they designed a series of psychophysical human experiments to test their hypothesis.
They started with a set of images and used three different computer vision models to synthesize representations of those images from noise: a “normal” machine-learning model, one that had been trained to be adversarially robust, and one that had been specifically designed to account for some aspects of human peripheral processing, called Texforms...