Nonlinear Denoising, Linear Demixing

Rainer Kelz, Gerhard Widmer

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

We cast the combinatorial problem of polyphonic piano transcription as a two-stage process. A nonlinear denoising stage maps spectrogram representations of arbitrary piano music with unknown timbral characteristics onto a canonical spectrogram representation with known timbral characteristics. A subsequent linear demixing stage aims to exploit this knowledge of the canonical timbral characteristics. The idea behind the two-stage process is to try to sidestep the musical bias inherent in the training dataset, which a single-stage, nonlinear (neural) transcription system with large capacity easily picks up. The two-stage process avoids forcing the nonlinear system to solve the combinatorial problem, which is more amenable to a linear decomposition method that has the superposition property. Using the simplest setup we could think of, we obtain (rather mixed (pun intended)) results on a standard polyphonic piano transcription dataset: the two-stage process still suffers from generalization problems after the first stage, which the second stage is unable to compensate for.
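The abstract does not say which linear decomposition the demixing stage uses, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the first stage has already produced a spectrogram V in the canonical timbre, and that this timbre is described by a fixed matrix W of per-note spectral templates (one column per note). It then solves V ≈ W @ H by unconstrained least squares via `numpy.linalg.pinv`, chosen here because it is strictly linear in V and therefore has the superposition property. The helpers `harmonic_template` and `demix`, the toy harmonic templates, and the 0.25 binarization threshold are all hypothetical illustrations, not the authors' setup.

```python
import numpy as np


def harmonic_template(f0_bin: int, n_bins: int, n_harmonics: int = 5, decay: float = 0.6) -> np.ndarray:
    """Hypothetical 'canonical' note template: decaying peaks at multiples of f0_bin."""
    template = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        k = h * f0_bin
        if k < n_bins:
            template[k] = decay ** (h - 1)
    return template


def demix(V: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Least-squares activations H such that V ≈ W @ H.

    V: (n_bins, n_frames) spectrogram in the canonical timbre (assumed stage-one output).
    W: (n_bins, n_notes) fixed note templates for that timbre (assumed known).
    pinv(W) is a fixed matrix, so demixing is linear in V and obeys superposition.
    """
    return np.linalg.pinv(W) @ V


# Three overlapping notes (shared harmonics at bins 12 and 20) on a 40-bin axis.
n_bins = 40
W = np.stack([harmonic_template(f0, n_bins) for f0 in (4, 5, 6)], axis=1)  # shape (40, 3)

# Ground-truth activations for 2 frames (rows are notes, columns are frames).
H_true = np.array([[1.0, 0.0],
                   [0.0, 0.7],
                   [0.5, 0.5]])
V = W @ H_true        # mixture spectrogram, shape (40, 2)
H_est = demix(V, W)   # recovers H_true because W has full column rank

# Superposition: demixing a weighted sum equals the weighted sum of the demixings.
V1 = W @ np.array([[1.0], [0.0], [0.0]])
V2 = W @ np.array([[0.0], [1.0], [1.0]])
lhs = demix(2 * V1 + 3 * V2, W)
rhs = 2 * demix(V1, W) + 3 * demix(V2, W)

piano_roll = H_est > 0.25  # hypothetical threshold to binarize activations into note on/off
```

An unconstrained least-squares solve keeps this stage exactly linear, at the cost of permitting negative activations whenever V does not lie in the span of W (for example, when stage one fails to produce the canonical timbre). A nonnegative solver, as in NMF-style transcription with fixed templates, would trade exact linearity for physically plausible activations.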
Original language: English
Title of host publication: ICBINB@NeurIPS 2021 Workshop
Number of pages: 10
Publication status: Published - Sept 2021

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU Focus areas

  • Digital Transformation
