Abstract
We propose a system for tracking beats and downbeats with two objectives: generality across a diverse range of music, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with strong tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which imposes constraints on meter and tempo. For high accuracy, among other improvements, we develop a loss function that is tolerant to small time shifts in the annotations, and an architecture that alternates convolutions with transformer blocks operating over either frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially for difficult and underrepresented genres, and performs worse on continuity metrics, so we publish our model, code, and preprocessed datasets, and invite others to beat this.
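The idea of a loss tolerant to small time shifts can be illustrated with a minimal sketch. This is not the paper's exact formulation; the tolerance radius `tol`, the sliding-maximum construction, and the function name are assumptions for illustration. The sketch scores annotated beats by the best prediction within a small window around them, and only penalizes false positives on frames far from any annotation, so a prediction one or two frames off is not punished twice:

```python
import numpy as np

def shift_tolerant_bce(pred, target, tol=3, eps=1e-7):
    """Illustrative shift-tolerant binary cross-entropy (hypothetical
    formulation, not the paper's exact loss).

    pred:   per-frame beat probabilities in [0, 1]
    target: binary per-frame beat annotations
    tol:    tolerance radius in frames
    """
    pred = np.clip(pred, eps, 1 - eps)
    # Sliding maximum of predictions over a +/- tol window, so an
    # annotated frame is matched by the best nearby prediction.
    # (np.roll wraps at the edges; acceptable for a sketch.)
    pred_max = pred.copy()
    for s in range(-tol, tol + 1):
        pred_max = np.maximum(pred_max, np.roll(pred, s))
    # Widen annotations the same way, to know which frames lie
    # within the tolerance window of some beat.
    widened = np.zeros_like(target)
    for s in range(-tol, tol + 1):
        widened = np.maximum(widened, np.roll(target, s))
    # Positive term: each annotated beat should have a confident
    # prediction somewhere within its tolerance window.
    pos = -(target * np.log(pred_max)).sum()
    # Negative term: only frames outside every tolerance window
    # are penalized for predicting a beat.
    neg = -((1 - widened) * np.log(1 - pred)).sum()
    n = target.sum() + (1 - widened).sum()
    return (pos + neg) / n
```

With `tol=0` this reduces to plain frame-wise cross-entropy, so a prediction shifted by one frame incurs both a missed beat and a false positive; with a small positive `tol` the same prediction is scored as correct.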
| Original language | English |
|---|---|
| Title of host publication | International Society for Music Information Retrieval Conference (ISMIR) |
| Number of pages | 6 |
| Publication status | Published - 2024 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 101016 Optimisation
- 101026 Time series analysis
- 102032 Computational intelligence
- 102019 Machine learning
- 106007 Biostatistics
- 102018 Artificial neural networks
- 202037 Signal processing
JKU Focus areas
- Digital Transformation