
Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training

Publication: Contribution to book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

This technical report describes the CP-JKU team’s submission for Task 4, Sound Event Detection with Heterogeneous Training Datasets and Potentially Missing Labels, of the DCASE 2024 Challenge. We fine-tune three large Audio Spectrogram Transformers, PaSST, BEATs, and ATST, on the joint DESED and MAESTRO datasets in a two-stage training procedure. The first stage closely matches the baseline system setup: it trains a CRNN model while keeping the large pre-trained transformer frozen. In the second stage, both the CRNN and the transformer are fine-tuned using heavily weighted self-supervised losses. After the second stage, we compute strong pseudo-labels for all audio clips in the training set using an ensemble of all three fine-tuned transformers. Then, in a second iteration, we repeat the two-stage training process and add a distillation loss based on these pseudo-labels, which substantially boosts single-model performance. Additionally, we pre-train PaSST and ATST on the subset of AudioSet that comes with strong temporal labels before fine-tuning them on the Task 4 datasets.
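The iteration and stage structure described in the abstract can be written down as an explicit schedule. The following Python sketch is illustrative only: the `TrainingStage` dataclass, the `build_schedule` helper, and the module names `"crnn"` and `"transformer"` are hypothetical and do not come from the report. It encodes only what the abstract states: in stage 1 only the CRNN is trained, in stage 2 the CRNN and transformer are fine-tuned together, and the pseudo-label distillation loss is active only in the second iteration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingStage:
    """One entry of a hypothetical training schedule (not the authors' code)."""

    iteration: int              # 1 = no pseudo-labels yet, 2 = pseudo-labels available
    stage: int                  # 1 = transformer frozen, 2 = joint fine-tuning
    trainable: tuple[str, ...]  # hypothetical names of modules updated in this stage
    use_distillation: bool      # whether the pseudo-label distillation loss is active


def build_schedule(num_iterations: int = 2) -> list[TrainingStage]:
    """Return the training stages in execution order."""
    schedule = []
    for iteration in range(1, num_iterations + 1):
        # Ensemble pseudo-labels only exist after the first iteration has finished.
        distill = iteration > 1
        # Stage 1: train the CRNN only; the pre-trained transformer stays frozen.
        schedule.append(TrainingStage(iteration, 1, ("crnn",), distill))
        # Stage 2: fine-tune the CRNN and the transformer together.
        schedule.append(TrainingStage(iteration, 2, ("crnn", "transformer"), distill))
    return schedule
```

In an actual training loop, each stage's `trainable` tuple would decide which parameters receive gradient updates (in PyTorch, for instance, by toggling `requires_grad` on the frozen module), and the ensemble pseudo-labels would be computed between the first and the second iteration.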
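The abstract does not name the self-supervised losses used in the second stage. Assuming they are mean-teacher consistency losses of the kind used in the DCASE Task 4 baseline (an assumption the abstract does not confirm), the sketch below shows the ingredients on plain Python lists: an exponential-moving-average (EMA) teacher update, a mean-squared consistency term between student and teacher predictions, and a weighted total loss. The function names and the weights `w_consistency=10.0` and `w_distillation=1.0` are hypothetical placeholders for "heavily weighted", not values from the report.

```python
Matrix = list[list[float]]  # [frames][classes] probabilities in [0, 1]


def ema_update(teacher: list[float], student: list[float], decay: float = 0.999) -> list[float]:
    """Mean-teacher step: teacher <- decay * teacher + (1 - decay) * student."""
    if len(teacher) != len(student):
        raise ValueError("parameter vectors must have the same length")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_mse(student: Matrix, teacher: Matrix) -> float:
    """Mean squared difference between student and teacher frame-wise probabilities."""
    diffs = [
        (s - t) ** 2
        for s_row, t_row in zip(student, teacher)
        for s, t in zip(s_row, t_row)
    ]
    return sum(diffs) / len(diffs)


def total_loss(
    supervised: float,
    consistency: float,
    distillation: float = 0.0,
    w_consistency: float = 10.0,  # hypothetical "heavy" self-supervised weight
    w_distillation: float = 1.0,  # hypothetical; only non-zero loss in the second iteration
) -> float:
    """Weighted sum of supervised, consistency, and distillation loss terms."""
    return supervised + w_consistency * consistency + w_distillation * distillation
```

A large `w_consistency` lets the unlabeled clips dominate the gradient signal; how large the report's weights actually are is not stated in the abstract.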
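Strong pseudo-labels are temporally resolved, i.e. class probabilities per frame rather than per clip. The abstract does not say how the outputs of PaSST, BEATs, and ATST are combined; the sketch below assumes simple averaging of frame-wise probabilities, with optional thresholding into hard labels. The function name `ensemble_pseudo_labels` and the example probabilities are hypothetical and purely illustrative.

```python
from typing import Optional, Sequence

Matrix = list[list[float]]  # [frames][classes] probabilities in [0, 1]


def ensemble_pseudo_labels(
    predictions: Sequence[Matrix], threshold: Optional[float] = None
) -> Matrix:
    """Average frame-wise class probabilities over several models (hypothetical helper).

    predictions: one [frames][classes] matrix per model, all of the same shape.
    threshold: if given, binarize the averaged probabilities into hard labels.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_frames, n_classes = len(predictions[0]), len(predictions[0][0])
    for p in predictions:
        if len(p) != n_frames or any(len(row) != n_classes for row in p):
            raise ValueError("all prediction matrices must have the same shape")
    n_models = len(predictions)
    averaged = [
        [sum(p[t][c] for p in predictions) / n_models for c in range(n_classes)]
        for t in range(n_frames)
    ]
    if threshold is None:
        return averaged  # soft strong pseudo-labels
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in averaged]


# Usage with made-up probabilities for 2 frames and 2 classes (illustrative only).
passt = [[0.9, 0.1], [0.2, 0.6]]
beats = [[0.6, 0.3], [0.4, 0.9]]
atst = [[0.9, 0.2], [0.3, 0.3]]
print(ensemble_pseudo_labels([passt, beats, atst], threshold=0.5))  # → [[1.0, 0.0], [0.0, 1.0]]
```

Keeping the averaged probabilities soft (no threshold) preserves the ensemble's uncertainty, which a distillation loss can exploit; whether the report uses soft or hard pseudo-labels is not stated in the abstract.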
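The distillation loss of the second iteration pulls each model's frame-wise predictions toward the ensemble pseudo-labels. Its exact form is not given in the abstract; the sketch below assumes a frame-wise binary cross-entropy with (possibly soft) targets, a common choice for multi-label sound event detection. `distillation_bce` is a hypothetical name.

```python
import math

Matrix = list[list[float]]  # [frames][classes], values in [0, 1]


def distillation_bce(student: Matrix, pseudo_labels: Matrix, eps: float = 1e-7) -> float:
    """Mean frame-wise binary cross-entropy of student probabilities against
    (possibly soft) pseudo-label targets. Hypothetical helper, not the report's code."""
    if len(student) != len(pseudo_labels):
        raise ValueError("student and pseudo-labels must have the same number of frames")
    total, count = 0.0, 0
    for s_row, y_row in zip(student, pseudo_labels):
        if len(s_row) != len(y_row):
            raise ValueError("student and pseudo-labels must have the same number of classes")
        for p, y in zip(s_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clip so that log() never sees 0
            total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            count += 1
    return total / count
```

With soft targets the loss is minimized when the student reproduces the ensemble's probabilities exactly, so a single model can absorb some of the ensemble's knowledge.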
Original language: English
Title of host publication: Technical report DCASE2024 Challenge, 2024
Number of pages: 5
Publication status: Published - 2024

Publication series

Name: Detection and Classification of Acoustic Scenes and Events

Fields of science

  • 102003 Image processing
  • 202002 Audiovisual media
  • 102001 Artificial Intelligence
  • 102015 Information systems
  • 102 Computer Sciences
  • 101019 Stochastics
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 202017 Embedded Systems
  • 101016 Optimisation
  • 101015 Operations Research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational Intelligence
  • 102004 Bioinformatics
  • 102013 Human-Computer Interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data Mining
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine Learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU Focus areas

  • Digital Transformation
