Abstract
In this technical report, we describe the CP-JKU team's submission for Task 1, Low-Complexity Acoustic Scene Classification, of
the DCASE 22 challenge. We use Knowledge Distillation to train low-complexity CNN student models from Patchout Spectrogram Transformer (PaSST) teacher models. The PaSST
models are pre-trained on AudioSet and fine-tuned on the TAU Urban Acoustic Scenes 2022 Mobile development dataset. We experiment with
an ensemble of teachers, different receptive fields of the student models, and mixing frequency-wise statistics of spectrograms
to enhance generalization to unseen devices. Finally, the student models are quantized to perform inference computations
using 8-bit integers, simulating the low-complexity constraints of edge devices.
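The distillation setup summarized above can be illustrated with a minimal NumPy sketch of a loss that combines hard-label cross-entropy with a temperature-softened KL term against an averaged teacher ensemble. This is a generic formulation of Knowledge Distillation, not necessarily the exact loss used in the report: the function names, the choice to average teacher logits (rather than probabilities), and the `temperature` and `kd_weight` values are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def distillation_loss(student_logits, teacher_logits_list, labels,
                      temperature=2.0, kd_weight=0.9):
    """Hard-label cross-entropy plus soft-label KL divergence.

    student_logits:      (batch, classes) array from the student model.
    teacher_logits_list: list of (batch, classes) arrays, one per teacher.
    labels:              (batch,) integer class indices.
    temperature, kd_weight: hypothetical hyperparameters, not taken
                            from the report.
    """
    # Assumption: the ensemble is formed by averaging teacher logits.
    teacher_logits = np.mean(np.stack(teacher_logits_list, axis=0), axis=0)

    # Cross-entropy between student predictions and ground-truth labels.
    log_p = log_softmax(student_logits)
    ce = -log_p[np.arange(len(labels)), labels].mean()

    # KL(teacher || student) on temperature-softened distributions.
    log_p_soft = log_softmax(student_logits / temperature)
    log_q_soft = log_softmax(teacher_logits / temperature)
    q_soft = np.exp(log_q_soft)
    kl = (q_soft * (log_q_soft - log_p_soft)).sum(axis=-1).mean()

    # The temperature**2 factor keeps the soft-term gradient scale
    # comparable across temperatures.
    return (1.0 - kd_weight) * ce + kd_weight * temperature ** 2 * kl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.normal(size=(4, 10))      # 4 clips, 10 scene classes
    teachers = [rng.normal(size=(4, 10)) for _ in range(3)]
    labels = np.array([0, 1, 2, 3])
    print(distillation_loss(student, teachers, labels))
```

With `kd_weight=1.0` the loss reduces to the pure distillation term, which is zero when the student reproduces the ensemble's logits exactly.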
Original language | English |
---|---|
Title of host publication | Detection and Classification of Acoustic Scenes and Events (DCASE2022 Challenge), Technical Report |
Number of pages | 5 |
Publication status | Published - 2022 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Digital Transformation