Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations

Research output: Chapter in Book/Report/Conference proceeding; Conference proceedings; peer-review

Abstract

The absence of large labeled datasets remains a significant challenge in many application areas of deep learning. Researchers and practitioners typically resort to transfer learning and data augmentation to alleviate this issue. We study these strategies in the context of audio retrieval with natural language queries (Task 6b of the DCASE 2022 Challenge). Our proposed system uses pre-trained embedding models to project recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close. We employ various data augmentation techniques on audio and text inputs and systematically tune their corresponding hyperparameters with sequential model-based optimization. Our results show that the augmentation strategies used reduce overfitting and improve retrieval performance.
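
The retrieval step implied by a shared audio-caption space can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-written vectors below stand in for the outputs of the pre-trained audio and text embedding models, the function names `l2_normalize` and `rank_recordings` are hypothetical, and cosine similarity is an assumed choice of closeness measure that the abstract does not specify.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def rank_recordings(query_emb, audio_embs):
    """Rank recordings by cosine similarity to one text query.

    query_emb:  shape (d,), embedding of the textual description
    audio_embs: shape (n, d), embeddings of the n candidate recordings
    Returns recording indices (best first) and their similarity scores.
    """
    q = l2_normalize(query_emb)
    a = l2_normalize(audio_embs)
    scores = a @ q                 # cosine similarity per recording
    order = np.argsort(-scores)    # sort by descending similarity
    return order, scores[order]


# Toy stand-ins for embeddings (assumption: real ones come from trained models).
audio_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_emb = np.array([0.9, 0.1, 0.0])

order, scores = rank_recordings(query_emb, audio_embs)
print(order, scores)
```

In this toy setup, the recording whose embedding points in nearly the same direction as the query is ranked first, which is the behaviour a shared space with "close" related examples is meant to produce.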
Original language: English
Title of host publication: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022)
Number of pages: 5
Publication status: Published - 2022

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU Focus areas

  • Digital Transformation
