Evaluating adversarial robustness is a challenging problem. Many defenses have been shown to provide a false sense of security by unintentionally obfuscating gradients, hindering the optimization process of gradient-based attacks. Such defenses have been subsequently shown to fail against adaptive attacks crafted to circumvent gradient obfuscation. In this work, we present Slope, a metric that detects obfuscated gradients by comparing the expected and the actual increase of the attack loss after one iteration. We show that our metric can detect the presence of obfuscated gradients in many documented cases, providing a useful debugging tool towards improving adversarial robustness evaluations.

Slope: A First-order Approach for Measuring Gradient Obfuscation

Demetrio, L.;Biggio, B.;Roli, F.
2021-01-01

Abstract

Evaluating adversarial robustness is a challenging problem. Many defenses have been shown to provide a false sense of security by unintentionally obfuscating gradients, hindering the optimization process of gradient-based attacks. Such defenses have been subsequently shown to fail against adaptive attacks crafted to circumvent gradient obfuscation. In this work, we present Slope, a metric that detects obfuscated gradients by comparing the expected and the actual increase of the attack loss after one iteration. We show that our metric can detect the presence of obfuscated gradients in many documented cases, providing a useful debugging tool towards improving adversarial robustness evaluations.
2021
9782875870827
File in questo prodotto:
File Dimensione Formato  
ES2021-99.pdf

accesso aperto

Tipologia: Documento in versione editoriale
Dimensione 2.04 MB
Formato Adobe PDF
2.04 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11567/1181555
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? ND
social impact