Effective Pre-Training of Audio Transformers for Sound Event Detection

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

We propose a pre-training pipeline for audio spectrogram transformers aimed at frame-level sound event detection tasks. On top of common pre-training steps, we add a meticulously designed training routine on AudioSet frame-level annotations. This routine includes a balanced sampler, aggressive data augmentation, and ensemble knowledge distillation. For five transformers, we obtain a substantial performance improvement over previously available checkpoints, both on AudioSet frame-level predictions and on downstream frame-level sound event detection tasks, which confirms the effectiveness of our pipeline. We publish the resulting checkpoints, which researchers can fine-tune directly to build high-performance models for sound event detection tasks.
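The abstract names a balanced sampler and ensemble knowledge distillation but does not specify either. Below is a minimal, hypothetical Python sketch of one common way to realize each. It assumes per-clip sampling weights equal to the sum of inverse label frequencies, and a distillation target formed by averaging the frame-level probabilities of a teacher ensemble, mixed with the hard frame-level labels through binary cross-entropy. This is not the authors' implementation. The function names, the weighting heuristic, and the mixing weight `lam` are illustrative assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def balanced_sample_weights(clip_labels: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical balancing heuristic: a clip's weight is the sum of the
    inverse frequencies of the labels it contains, so clips with rare
    labels are drawn more often."""
    freq = Counter(label for labels in clip_labels for label in set(labels))
    return [sum(1.0 / freq[label] for label in set(labels)) for labels in clip_labels]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction; target may be soft (0..1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def ensemble_kd_loss(
    student_logits: Sequence[Sequence[float]],         # [frames][classes]
    teacher_probs: Sequence[Sequence[Sequence[float]]],  # [teachers][frames][classes]
    hard_labels: Sequence[Sequence[float]],             # [frames][classes], 0/1
    lam: float = 0.5,  # assumed weight of the distillation term
) -> float:
    """Assumed loss: (1 - lam) * BCE(student, labels)
    + lam * BCE(student, mean teacher probability), averaged over frames and classes."""
    n_teachers = len(teacher_probs)
    total_hard = total_kd = 0.0
    count = 0
    for t, frame in enumerate(student_logits):
        for c, logit in enumerate(frame):
            p = sigmoid(logit)
            # Soft target: average of the ensemble's probabilities for this frame/class.
            soft = sum(teacher[t][c] for teacher in teacher_probs) / n_teachers
            total_hard += bce(p, hard_labels[t][c])
            total_kd += bce(p, soft)
            count += 1
    return (1.0 - lam) * total_hard / count + lam * total_kd / count


# Usage: three clips; label 0 appears twice, labels 1 and 2 once each.
weights = balanced_sample_weights([[0], [0, 1], [2]])  # -> [0.5, 1.5, 1.0]
batch_indices = random.choices(range(3), weights=weights, k=8)
```

Averaging teacher probabilities (rather than logits) keeps the soft targets in [0, 1], which suits the multi-label sigmoid outputs of frame-level sound event detection. Other ensembling and weighting choices are equally plausible, and the abstract does not say which one the authors use.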
Original language: English
Title of host publication: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
Number of pages: 5
Publication status: Accepted/In press - 2025

Fields of science

  • 102003 Image processing
  • 202002 Audiovisual media
  • 102001 Artificial intelligence
  • 102015 Information systems
  • 102 Computer Sciences
  • 101019 Stochastics
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 202017 Embedded systems
  • 101016 Optimisation
  • 101015 Operations research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational intelligence
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data mining
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU Focus areas

  • Digital Transformation
