Abstract
Given the rise of deep learning and its inherent black-box nature, the desire to interpret these systems and explain their
behaviour has become increasingly prominent. The main idea of so-called explainers is to identify which features of
particular samples have the most influence on a classifier’s prediction, and present them as explanations. Evaluating
explainers, however, is difficult, for reasons such as the lack of ground truth. In this work, we construct adversarial
examples to check the plausibility of explanations, deliberately perturbing the input to change a classifier’s prediction. This
allows us to investigate whether explainers are able to detect these perturbed regions as the parts of an input that strongly
influence a particular classification. Our results from the audio and image domains suggest that the investigated explainers
often fail to identify the input regions most relevant for a prediction; hence, it remains questionable whether explanations
are useful or potentially misleading.
| Original language | English |
|---|---|
| Pages (from-to) | 10011-10029 |
| Number of pages | 19 |
| Journal | Neural Computing and Applications |
| Volume | 35 |
| Issue number | 14 |
| DOIs | |
| Publication status | Published - May 2023 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Digital Transformation