One thing to fool them all: generating interpretable, universal, and physically-realizable adversarial features

Publication Type: Journal Article
Year of Publication: 2022
Authors: Casper, S, Nadeau, M, Kreiman, G
Journal: arXiv
Date Published: 01/2022
Abstract:

It is well understood that modern deep networks are vulnerable to adversarial attacks. However, conventional attack methods fail to produce adversarial perturbations that are intelligible to humans, and they pose limited threats in the physical world. To study feature-class associations in networks and better understand their vulnerability to attacks in the real world, we develop feature-level adversarial perturbations using deep image generators and a novel optimization objective. We term these feature-fool attacks. We show that they are versatile and use them to generate targeted feature-level attacks at the ImageNet scale that are simultaneously interpretable, universal to any source image, and physically-realizable. These attacks reveal spurious, semantically-describable feature/class associations that can be exploited by novel combinations of objects. We use them to guide the design of “copy/paste” adversaries in which one natural image is pasted into another to cause a targeted misclassification.
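To make the approach concrete, below is a minimal PyTorch sketch of a feature-level attack in the spirit the abstract describes: a latent code for a pretrained image generator is optimized so that the generated patch, pasted into a batch of source images, drives a classifier toward a target class. The generator G, classifier f, patch size, and hyperparameters here are illustrative assumptions; the paper's actual architectures and its novel optimization objective are not reproduced.

import torch
import torch.nn.functional as F

def feature_fool(G, f, source_batch, target_class, latent_dim=128,
                 patch=64, steps=500, lr=0.05, device="cpu"):
    """Sketch: optimize a generator latent so the generated patch acts as a
    universal, targeted adversarial feature when pasted into source images.
    G and f are assumed callables (generator and classifier), not the
    paper's models."""
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        patch_img = G(z)  # assumed output shape: (1, 3, patch, patch)
        x = source_batch.clone()
        b, _, h, w = x.shape
        # Random placement each step, so the same patch must fool the
        # classifier regardless of position or source image (universality).
        top = torch.randint(0, h - patch + 1, (1,)).item()
        left = torch.randint(0, w - patch + 1, (1,)).item()
        x[:, :, top:top + patch, left:left + patch] = patch_img
        target = torch.full((b,), target_class, dtype=torch.long,
                            device=device)
        loss = F.cross_entropy(f(x), target)  # targeted misclassification
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return G(z)  # the finished adversarial feature patch

The loop optimizes only the low-dimensional latent z, which is what keeps the resulting perturbation feature-like and interpretable rather than pixel-level noise; the paper's full objective presumably adds further terms for physical realizability.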

URL: https://arxiv.org/abs/2110.03605
DOI: 10.48550/arXiv.2110.03605
Download: 2110.03605.pdf (PDF)

CBMM Relationship: 

  • CBMM Funded