Exploring System Adaptations For Minimum Latency Real-Time Piano Transcription

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 128-320 ms. However, most real-time musical applications require latencies below 30 ms. In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription. Specifically, we eliminate all non-causal processing, and reduce computational load through shared computations across core model components and variations in model size. Additionally, we explore different pre- and postprocessing strategies, and related label encoding schemes, and discuss their suitability for real-time transcription. Evaluating the adaptations on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing as well as a tradeoff between the preprocessing latency and prediction accuracy. We release our system as a baseline to support researchers in designing models towards minimum latency real-time transcription.
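
To make the 30 ms figure concrete, the following minimal sketch computes how long it takes to collect one analysis buffer of audio and compares that against the latency budget stated in the abstract. It is not taken from the paper: the function names, the 16 kHz sample rate, and the buffer sizes are hypothetical, and it assumes that buffering time is the only source of delay (model inference and audio I/O are ignored).

```python
# Illustrative sketch only: names, sample rate, and buffer sizes are assumptions,
# not values from the paper.

LATENCY_BUDGET_MS = 30.0  # "latencies below 30 ms", as stated in the abstract


def buffer_duration_ms(num_samples: int, sample_rate_hz: int) -> float:
    """Return the time in milliseconds needed to collect num_samples of audio."""
    if num_samples < 0 or sample_rate_hz <= 0:
        raise ValueError("num_samples must be >= 0 and sample_rate_hz must be > 0")
    return 1000.0 * num_samples / sample_rate_hz


def fits_budget(num_samples: int, sample_rate_hz: int,
                budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """Return True if buffering alone stays strictly below the latency budget."""
    return buffer_duration_ms(num_samples, sample_rate_hz) < budget_ms


if __name__ == "__main__":
    sample_rate_hz = 16_000  # hypothetical sample rate
    for num_samples in (256, 512, 1024, 2048):  # hypothetical buffer sizes
        duration = buffer_duration_ms(num_samples, sample_rate_hz)
        verdict = "within" if fits_budget(num_samples, sample_rate_hz) else "exceeds"
        print(f"{num_samples:5d} samples -> {duration:6.1f} ms ({verdict} budget)")
```

Under these assumptions, larger buffers give the model more audio context per prediction but consume more of the budget, which is one way to read the tradeoff between preprocessing latency and prediction accuracy that the abstract reports.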
Original language: English
Title of host publication: Proceedings of the 26th International Society for Music Information Retrieval Conference (ISMIR)
Subtitle of host publication: Daejeon, South Korea
Number of pages: 8
Edition: 1
Publication status: Published - 2025
Event: International Society for Music Information Retrieval Conference - KAIST, Daejeon, Korea, Republic of
Duration: 21 Sept 2025 → 25 Sept 2025
Conference number: 26
https://ismir2025.ismir.net/

Conference

Conference: International Society for Music Information Retrieval Conference
Abbreviated title: ISMIR
Country/Territory: Korea, Republic of
City: Daejeon
Period: 21.09.2025 → 25.09.2025

Fields of science

  • 102001 Artificial intelligence
  • 101026 Time series analysis
  • 102013 Human-computer interaction
  • 102019 Machine learning
  • 102018 Artificial neural networks
  • 202037 Signal processing

JKU Focus areas

  • Digital Transformation
