One thing to fool them all: generating interpretable, universal, and physically-realizable adversarial features

Title	One thing to fool them all: generating interpretable, universal, and physically-realizable adversarial features
Publication Type	Journal Article
Year of Publication	2022
Authors	Casper, S, Nadeau, M, Kreiman, G
Journal	arXiv
Date Published	01/2022
Abstract	It is well understood that modern deep networks are vulnerable to adversarial attacks. However, conventional attack methods fail to produce adversarial perturbations that are intelligible to humans, and they pose limited threats in the physical world. To study feature-class associations in networks and better understand their vulnerability to attacks in the real world, we develop feature-level adversarial perturbations using deep image generators and a novel optimization objective. We term these feature-fool attacks. We show that they are versatile and use them to generate targeted feature-level attacks at the ImageNet scale that are simultaneously interpretable, universal to any source image, and physically-realizable. These attacks reveal spurious, semantically-describable feature/class associations that can be exploited by novel combinations of objects. We use them to guide the design of “copy/paste” adversaries in which one natural image is pasted into another to cause a targeted misclassification.
URL	https://arxiv.org/abs/2110.03605
DOI	10.48550/arXiv.2110.03605
