|Title||One thing to fool them all: generating interpretable, universal, and physically-realizable adversarial features|
|Publication Type||Journal Article|
|Year of Publication||2022|
|Authors||Casper, S, Nadeau, M, Kreiman, G|
It is well understood that modern deep networks are vulnerable to adversarial attacks. However, conventional attack methods fail to produce adversarial perturbations that are intelligible to humans, and they pose limited threats in the physical world. To study feature-class associations in networks and better understand their vulnerability to attacks in the real world, we develop feature-level adversarial perturbations using deep image generators and a novel optimization objective. We term these feature-fool attacks. We show that they are versatile and use them to generate targeted feature-level attacks at the ImageNet scale that are simultaneously interpretable, universal to any source image, and physically-realizable. These attacks reveal spurious, semantically-describable feature/class associations that can be exploited by novel combinations of objects. We use them to guide the design of “copy/paste” adversaries in which one natural image is pasted into another to cause a targeted misclassification.
- CBMM Funded