What Do Deep Networks Like to Hear?

  • Lukas Gandler

Research output: Thesis › Master's / Diploma thesis

Abstract

Advances in explainable artificial intelligence techniques shed light on the inner workings of neural networks. The resulting insights also help improve architectural designs and mitigate flaws or vulnerabilities. Past work employed autoencoders to investigate the input-level preferences of image and sentence classification networks. In this thesis, the idea is extended to environmental sound classification. To this end, audio waveforms are passed through a 1D convolutional autoencoder, and the resulting reconstructions are fed to the classifier networks to make predictions. The prediction error is then backpropagated to fine-tune only the weights of the autoencoder; the weights of the classification network stay fixed.
For this thesis, three architectures are considered: the standard MobileNet-V3; a modified variant of the MobileNet-V3 architecture that introduces dynamic attention layers, called Dynamic MobileNet; and the vision transformer PaSST. These models were selected for their strong performance in environmental sound classification and their architectural differences. All experiments are conducted on the ESC-50 dataset, which consists of five-second environmental sound clips categorized into 50 classes.
Investigations of the learned reconstruction transformations show differences in how the three classification architectures perceive their inputs; Dynamic MobileNet in particular differs most from the other two. Ablation studies are conducted to investigate which alterations to the MobileNet architecture cause these behavioral changes. The results also indicate that lower frequencies carry the most information for environmental sound classification.
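The training setup described in the abstract can be illustrated with a minimal sketch: a trainable "autoencoder" in front of a frozen "classifier", where the classification loss is backpropagated through the fixed classifier but only the autoencoder weights are updated. The sketch below uses toy linear layers and random data purely for illustration; the thesis itself uses a 1D convolutional autoencoder and networks such as MobileNet-V3, and all shapes, the learning rate, and the data here are assumptions, not details from the work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (illustrative only): a linear "autoencoder" W_a
# and a frozen linear "classifier" W_c.
d, k, n = 8, 4, 64                                    # input dim, classes, samples
W_a = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # trainable
W_c = rng.standard_normal((k, d))                     # frozen, never updated

X = rng.standard_normal((n, d))                       # toy "waveforms"
y = rng.integers(0, k, size=n)                        # toy class labels
Y = np.eye(k)[y]                                      # one-hot targets

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(W_a):
    R = X @ W_a.T                        # "reconstructions" from the autoencoder
    P = softmax(R @ W_c.T)               # frozen classifier's predictions
    loss = -np.log(P[np.arange(n), y] + 1e-12).mean()  # cross-entropy
    dlogits = (P - Y) / n                # dL/dlogits
    dR = dlogits @ W_c                   # backprop THROUGH the frozen classifier
    dW_a = dR.T @ X                      # gradient lands only on the autoencoder
    return loss, dW_a

loss_before, _ = loss_and_grad(W_a)
for _ in range(200):                     # plain gradient descent on W_a alone
    loss, dW_a = loss_and_grad(W_a)
    W_a -= 0.01 * dW_a                   # W_c stays fixed throughout
loss_after, _ = loss_and_grad(W_a)
```

The key point mirrored from the thesis setup is the gradient path: the classifier participates in the backward pass (`dR = dlogits @ W_c`) but receives no update, so the autoencoder alone adapts its reconstructions to whatever the fixed classifier prefers.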
Original language: English
Supervisors/Reviewers
  • Widmer, Gerhard, Supervisor
Publication status: Published - 2025

Fields of science

  • 102 Computer Sciences
  • 102003 Image processing
  • 202002 Audiovisual media
  • 102001 Artificial intelligence
  • 102015 Information systems
  • 101019 Stochastics
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 202017 Embedded systems
  • 101016 Optimisation
  • 101015 Operations research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational intelligence
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data mining
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU Focus areas

  • Digital Transformation
