Attention as a Perspective for Learning Tempo-invariant Audio Queries

Matthias Dorfer, Jan Hajic, Gerhard Widmer

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

Current models for audio–sheet music retrieval via multimodal embedding space learning use convolutional neural networks with a fixed-size window for the input audio. Depending on the tempo of a query performance, this window captures more or less musical content, while notehead density in the score is largely tempo-independent. In this work we address this disparity with a soft attention mechanism, which allows the model to encode only those parts of an audio excerpt that are most relevant for computing efficient query codes. Empirical results on classical piano music indicate that attention is beneficial for retrieval performance and exhibits intuitively appealing behavior.
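The abstract does not spell out the attention architecture. As an illustration only, here is a minimal pure-Python sketch of generic soft attention over the time frames of a spectrogram excerpt. Each frame receives a score from a hypothetical linear scorer (`w`, `b`), a softmax turns the scores into weights that sum to one over time, and the frames are then either reweighted or pooled into one summary frame. The function names and the linear scorer are assumptions for this sketch, not the authors' model. In a trained system, the scoring parameters would be learned together with the embedding network.

```python
import math
from typing import List, Tuple

Frame = List[float]  # one spectrogram frame: magnitudes per frequency bin


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_time(
    frames: List[Frame], w: List[float], b: float = 0.0
) -> Tuple[List[float], List[Frame]]:
    """Compute soft attention weights over time and reweight each frame.

    The scorer (dot product with w plus bias b) is a hypothetical stand-in;
    in a trained model it would be a learned network.
    Returns (alpha, weighted_frames); alpha sums to 1 over the T frames.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) + b for frame in frames]
    alpha = softmax(scores)
    weighted = [[a * x for x in frame] for a, frame in zip(alpha, frames)]
    return alpha, weighted


def attention_pool(alpha: List[float], frames: List[Frame]) -> Frame:
    """Collapse the excerpt into one attention-weighted average frame."""
    n_bins = len(frames[0])
    return [sum(a * frame[f] for a, frame in zip(alpha, frames)) for f in range(n_bins)]


if __name__ == "__main__":
    # Toy excerpt: 3 frames x 2 bins; only the middle frame has energy.
    excerpt = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    alpha, _ = attend_over_time(excerpt, w=[1.0, 1.0])
    print([round(a, 3) for a in alpha])
    print(attention_pool(alpha, excerpt))
```

Intuitively, a softmax over time lets the model concentrate its weight on a subset of frames. That is one way to discount the extra content a fixed window picks up at faster tempi, though the sketch makes no claim about how the paper's model actually distributes its attention.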
Original language: English
Title of host publication: ICML 2018 Joint Workshop on Machine Learning for Music, 2018
Number of pages: 3
Publication status: Published, Jul 2018

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Engineering and Natural Sciences (in general)
