Article

DI-AA: An interpretable white-box attack for fooling deep neural networks

Journal

INFORMATION SCIENCES
Volume 610, Pages 14-32

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.07.157

Keywords

Adversarial example; Deep learning; Interpretability; Robustness; White-box attack

Funding

  1. National Key R&D Program of China [2020YFB2103802]


In this study, an interpretable white-box AE attack approach is proposed, which generates low-perturbation adversarial examples by selecting the most contributing features and optimizing the perturbation using a relaxation technique and the Lp norm. Experimental results demonstrate that this method performs well at attacking nonrobust models and evading adversarially trained robust models, and offers flexibility in the saturation of AE generation.
White-box adversarial example (AE) attacks on deep neural networks (DNNs) have a more powerful destructive capacity than black-box AE attacks. However, few studies have examined the generation of low-perturbation adversarial examples from the interpretability perspective. Specifically, existing attacks lack an interpretation from the DNN's point of view, and the perturbation is not further reduced. To address these issues, we propose an interpretable white-box AE attack approach, DI-AA, which not only applies the interpretable method of deep Taylor decomposition to select the most contributing features but also adopts a Lagrangian relaxation optimization over the logit output and the Lp norm to make the perturbation less noticeable. We compare DI-AA with eight baseline attacks on four representative datasets. Experimental results reveal that our approach can (1) attack nonrobust models with low perturbation, where the perturbation is close to or lower than that of state-of-the-art white-box AE attacks; (2) evade the detection of adversarially trained robust models with the highest success rate; and (3) be flexible in the degree of AE generation saturation. Additionally, the AEs generated by DI-AA reduce the accuracy of robust black-box models by 16-31% in a black-box manner. © 2022 Elsevier Inc. All rights reserved.
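The abstract does not spell out the exact objective, but a Lagrangian relaxation over the logit output and an Lp norm is commonly formulated as minimizing ||delta||_p plus a weighted hinge on the logit margin of the true class. The sketch below illustrates that idea on a toy linear classifier with p = 2; the function name, the hinge form, the confidence margin `kappa`, and the toy model are all illustrative assumptions, and the paper's deep-Taylor-decomposition feature selection is omitted.

```python
import numpy as np

def lagrangian_attack(W, x, true_label, c=1.0, kappa=0.5, lr=0.1, steps=100):
    """Illustrative sketch (not the paper's exact method): minimize
    ||delta||_2^2 + c * max(z_true - max_other_z + kappa, 0)
    by gradient descent on a linear model z = W @ (x + delta),
    keeping the smallest perturbation that flips the prediction."""
    delta = np.zeros_like(x)
    best = None
    for _ in range(steps):
        z = W @ (x + delta)
        others = [i for i in range(len(z)) if i != true_label]
        j = others[int(np.argmax(z[others]))]  # strongest competing class
        # Record the smallest successful perturbation seen so far.
        if np.argmax(z) != true_label:
            if best is None or np.linalg.norm(delta) < np.linalg.norm(best):
                best = delta.copy()
        # Gradient of the L2 penalty term.
        grad = 2.0 * delta
        # Hinge is active while the true logit still dominates by > -kappa:
        # push the true logit down and the competing logit up.
        if z[true_label] - z[j] + kappa > 0:
            grad += c * (W[true_label] - W[j])
        delta = delta - lr * grad
    return best

# Toy 2-class example: x is initially classified as class 0.
W = np.array([[2.0, 0.0], [0.0, 2.0]])
x = np.array([1.0, 0.0])
delta = lagrangian_attack(W, x, true_label=0)
print(np.argmax(W @ (x + delta)))  # prediction after perturbation
```

The Lagrangian multiplier `c` trades off perturbation size against attack success, which is what lets the method tune how unnoticeable the perturbation is.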

