Differentiable dictionary search: Integrating linear mixing with deep non-linear modelling for audio source separation

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

This paper describes several improvements to a new method for signal decomposition that we recently formulated under the name of Differentiable Dictionary Search (DDS). The fundamental idea of DDS is to use a class of powerful deep invertible density estimators, called normalizing flows, to model the dictionary in a linear decomposition method such as non-negative matrix factorization (NMF). This effectively creates a bijection between the space of dictionary elements and the associated probability space, which allows a differentiable search through the dictionary space guided by the estimated densities. As the initial formulation was a proof of concept with some practical limitations, we present several steps towards making it scalable, aiming both to reduce the computational complexity of the method and to improve its signal decomposition capabilities. As a testbed for experimental evaluation, we choose the task of frame-level piano transcription, where the signal is to be decomposed into sources whose activity is attributed to individual piano notes. To highlight the impact of improved non-linear modelling of sources, we compare variants of our method to a linear overcomplete NMF baseline. Experimental results show that, even in the absence of additional constraints, our models produce increasingly sparse and precise decompositions according to two pertinent evaluation measures.
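The core idea, a linear model V ≈ WH whose dictionary atoms are the outputs of an invertible density model, can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single elementwise bijection w = exp(a·z + b) (a hypothetical stand-in for a deep normalizing flow), plain projected gradient descent, and a hypothetical prior weight `lam`. It jointly searches the latent atoms Z and the non-negative activations H while the flow's log-density, obtained by the change-of-variables formula, guides the atoms. All function names are illustrative only.

```python
import numpy as np

def flow_forward(z, a, b):
    """Hypothetical toy bijection (stand-in for a deep normalizing flow):
    maps latent columns z in R^F to positive atoms w = exp(a * z + b)."""
    return np.exp(a[:, None] * z + b[:, None])

def flow_inverse(w, a, b):
    """Exact inverse of flow_forward: z = (log w - b) / a."""
    return (np.log(w) - b[:, None]) / a[:, None]

def atom_log_density(z, a, b):
    """Log-density of each atom (column) via change of variables:
    log p(w) = log N(z; 0, I) - sum_f log|dw_f/dz_f|, where dw/dz = a * w."""
    log_base = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_det = np.log(a)[:, None] + a[:, None] * z + b[:, None]
    return np.sum(log_base - log_det, axis=0)

def dictionary_search(V, K, a, b, lam=0.1, lr=1e-3, n_iter=1000, seed=0):
    """Projected gradient descent on
        0.5 * ||V - W H||_F^2 - lam * sum_k log p(w_k),  with W = flow_forward(Z),
    searching the dictionary through its latent codes Z (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Z = rng.standard_normal((F, K))
    H = rng.uniform(0.0, 1.0, size=(K, T))
    losses = []
    for _ in range(n_iter):
        W = flow_forward(Z, a, b)
        R = W @ H - V                              # reconstruction residual
        losses.append(0.5 * np.sum(R**2) - lam * np.sum(atom_log_density(Z, a, b)))
        # Chain rule through the bijection: dW/dZ = a * W (elementwise);
        # the prior term contributes lam * (Z + a).
        grad_Z = (R @ H.T) * a[:, None] * W + lam * (Z + a[:, None])
        grad_H = W.T @ R
        Z -= lr * grad_Z
        H = np.maximum(H - lr * grad_H, 0.0)       # project onto H >= 0
    return flow_forward(Z, a, b), H, losses

# Usage on synthetic data: 8 "frequency bins", 4 atoms, 10 frames.
rng = np.random.default_rng(1)
F, K, T = 8, 4, 10
a, b = np.full(F, 0.5), np.zeros(F)
V = flow_forward(rng.standard_normal((F, K)), a, b) @ rng.uniform(0.0, 1.0, (K, T))
W, H, losses = dictionary_search(V, K, a, b)
print(f"objective: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the atoms are parameterised through an invertible map, gradients of both the reconstruction error and the density term flow back to the latent codes, which is what makes the search over the dictionary differentiable. A real system would replace the toy bijection with a trained deep flow.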
Original language: English
Title of host publication: Proceedings of the 24th International Congress on Acoustics (ICA 2022)
Number of pages: 8
Publication status: Published - Oct 2022

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU Focus areas

  • Digital Transformation
