
A Knowledge Distillation Approach to Improving Language-Based Audio Retrieval Models

Publication: Contribution to book/report/conference proceedings › Conference contribution › Peer-reviewed

Abstract

This technical report describes the CP-JKU team’s submissions to the
language-based audio retrieval task of the 2024 DCASE Challenge (Task 8).
All our submitted systems are based on the dual encoder architecture that
projects recordings and textual descriptions into a shared audio-caption
space in which related examples from the two modalities are similar. We
used pretrained audio and text embedding models and trained them on
audio-caption datasets (WavCaps, AudioCaps, and ClothoV2) via contrastive
learning. We further fine-tuned the resulting models on ClothoV2 via
knowledge distillation from a large ensemble of audio retrieval models.
Our best single-system submission, based on PaSST and RoBERTa, achieves
an mAP@10 of 39.77 on the ClothoV2 test split, outperforming last year’s
best single-system submission by around 1 pp. without using metadata or
synthetic captions. An ensemble of three distilled models achieves 41.91
mAP@10 on the ClothoV2 test split. A repository with our implementation
is available on GitHub.
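
The two training stages named in the abstract, contrastive learning in a shared audio-caption space followed by knowledge distillation from an ensemble, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the temperature value, and the choice to build the ensemble by averaging teacher similarity matrices are all assumptions made for illustration, and the embeddings stand in for the outputs of real audio and text encoders.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def contrastive_loss(audio_emb, text_emb, temperature=0.05):
    """Symmetric cross-entropy over the audio-caption similarity matrix.

    Row i of audio_emb and row i of text_emb are assumed to be a matching
    pair; every other row in the batch serves as a negative.
    """
    sim = l2_normalize(audio_emb) @ l2_normalize(text_emb).T / temperature
    diag = np.arange(sim.shape[0])
    audio_to_text = -log_softmax(sim, axis=1)[diag, diag].mean()
    text_to_audio = -log_softmax(sim, axis=0)[diag, diag].mean()
    return 0.5 * (audio_to_text + text_to_audio)


def distillation_loss(student_sim, teacher_sims, temperature=0.05):
    """Cross-entropy between ensemble soft targets and student predictions.

    teacher_sims has shape (n_teachers, n_rows, n_cols). Averaging the
    teacher similarities to form the ensemble is an assumption.
    """
    ensemble = np.mean(teacher_sims, axis=0)
    soft_targets = np.exp(log_softmax(ensemble / temperature, axis=1))
    student_log_probs = log_softmax(student_sim / temperature, axis=1)
    return -(soft_targets * student_log_probs).sum(axis=1).mean()
```

In this sketch the contrastive loss is small when each recording is most similar to its own caption, and the distillation loss is smallest when the student's similarity distribution matches that of the averaged teachers, which is the sense in which the ensemble's knowledge is transferred to a single model.
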
Original language: English
Title: Technical report DCASE2024 Challenge, 2024
Number of pages: 5
Publication status: Published - 2024

Publication series

Name: Detection and Classification of Acoustic Scenes and Events

Fields of science

  • 102003 Image processing
  • 202002 Audiovisual media
  • 102001 Artificial Intelligence
  • 102015 Information systems
  • 102 Computer Sciences
  • 101019 Stochastics
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 202017 Embedded Systems
  • 101016 Optimisation
  • 101015 Operations Research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational Intelligence
  • 102004 Bioinformatics
  • 102013 Human-Computer Interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data Mining
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine Learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU focus areas

  • Digital Transformation
