Abstract
Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 128-320 ms. However, most real-time musical applications require latencies below 30 ms. In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription. Specifically, we eliminate all non-causal processing and reduce computational load through shared computations across core model components and variations in model size. Additionally, we explore different pre- and postprocessing strategies and related label encoding schemes, and discuss their suitability for real-time transcription. Evaluating the adaptations on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing, as well as a tradeoff between preprocessing latency and prediction accuracy. We release our system as a baseline to support researchers in designing models for minimum-latency real-time transcription.
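To make the latency figures above concrete, the following minimal sketch estimates how much future audio a frame-level prediction depends on when the spectrogram window is centered and the network uses symmetric temporal convolutions. The function name `lookahead_ms` and all parameter values (16 kHz sample rate, 2048-sample window, 512-sample hop, three kernels of size 3) are illustrative assumptions and are not taken from the paper.

```python
def lookahead_ms(sample_rate, window, hop, conv_kernels, centered=True):
    """Estimate the look-ahead (in ms) needed before a frame prediction can be emitted.

    Hypothetical helper for illustration only; it is not the authors' method.
    """
    # A centered analysis window places half of its samples in the future.
    window_future = window // 2 if centered else 0
    # Each symmetric temporal convolution with kernel size k reads
    # (k - 1) // 2 future frames; causal layers are simply left out of the list.
    future_frames = sum((k - 1) // 2 for k in conv_kernels)
    return 1000.0 * (window_future + future_frames * hop) / sample_rate


# Assumed non-causal setup: centered window plus three symmetric size-3 convolutions.
noncausal = lookahead_ms(16000, 2048, 512, conv_kernels=[3, 3, 3])

# Assumed strictly causal setup: no centered window, no symmetric convolutions.
causal = lookahead_ms(16000, 2048, 512, conv_kernels=[], centered=False)

print(f"non-causal look-ahead: {noncausal:.1f} ms")
print(f"causal look-ahead: {causal:.1f} ms")
```

Under these assumed settings the non-causal configuration depends on well over 30 ms of future audio, while the causal one depends on none. The sketch counts look-ahead only; compute time and buffering, which also contribute to end-to-end latency, are not modeled.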
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 26th International Society for Music Information Retrieval Conference (ISMIR) |
| Subtitle of host publication | Daejeon, South Korea |
| Number of pages | 8 |
| Edition | 1 |
| Publication status | Published - 2025 |
| Event | International Society for Music Information Retrieval Conference - KAIST, Daejeon, Korea, Republic of; Duration: 21 Sept 2025 → 25 Sept 2025; Conference number: 26; https://ismir2025.ismir.net/ |
Conference
| Conference | International Society for Music Information Retrieval Conference |
|---|---|
| Abbreviated title | ISMIR |
| Country/Territory | Korea, Republic of |
| City | Daejeon |
| Period | 21.09.2025 → 25.09.2025 |
| Internet address | https://ismir2025.ismir.net/ |
Fields of science
- 102001 Artificial intelligence
- 101026 Time series analysis
- 102013 Human-computer interaction
- 102019 Machine learning
- 102018 Artificial neural networks
- 202037 Signal processing
JKU Focus areas
- Digital Transformation