Creating a Good Teacher For Knowledge Distillation in Acoustic Scene Classification

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

Knowledge Distillation (KD) is a widespread technique for compressing the knowledge of large models into more compact and efficient models. KD has proven highly effective for building well-performing low-complexity Acoustic Scene Classification (ASC) systems and was used in all top-ranked submissions to the corresponding task of the annual DCASE challenge over the past three years. Extensive research exists on establishing the KD process, designing efficient student models, and forming well-performing teacher ensembles. However, less research has investigated which teacher model attributes are beneficial for low-complexity students. In this work, we try to close this gap by studying the effects on the student's performance when using different teacher network architectures, varying the teacher model size, training teachers with different data augmentation methods, and applying different ensembling strategies. The results show that teacher model size, data augmentation methods, the ensembling strategy, and the ensemble size are key factors for a well-performing student network.
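The KD setup described in the abstract typically trains the student on a weighted mix of a hard-label loss and a soft-target loss against the teacher's temperature-scaled outputs. As a rough illustration of that standard formulation (not the paper's exact recipe; the temperature, weighting, and loss names here are assumptions), a minimal NumPy sketch:

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: alpha * soft-target KL + (1 - alpha) * hard CE.

    T and alpha are illustrative hyperparameters, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL divergence between teacher and student soft targets,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=-1).mean() * T**2
    # Ordinary cross-entropy on the ground-truth scene labels.
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kl + (1 - alpha) * ce
```

With identical student and teacher logits the KL term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for an implementation like this.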
Original language: English
Title of host publication: Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)
Number of pages: 5
Publication status: Published - Sept 2023

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU Focus areas

  • Digital Transformation
