Abstract
Due to the rapidly growing amount of available music in recent years, media collections
can no longer be managed manually, which makes automatic audio analysis crucial
for content-based search, organisation, and processing of the data.
This thesis focuses on the automatic extraction of a metrical grid, determined by
beats, downbeats, and time signature, from a piece of music. I propose several algorithms
to tackle this problem, all comprising three stages: First, (low-level) features are extracted
from the audio signal. Second, an acoustic model maps these features to
probabilities in the music domain. Third, a probabilistic sequence model finds the most
probable sequence of labels under the model assumptions.
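To make the three stages concrete, the following is a minimal, self-contained sketch of such a pipeline, not the implementation developed in this thesis: spectral flux serves as a stand-in low-level feature, a hypothetical logistic squashing replaces a learned acoustic model, and Viterbi decoding of a simple two-state (beat / no beat) hidden Markov model with illustrative parameters plays the role of the probabilistic sequence model.

```python
import numpy as np

# Stage 1: low-level feature extraction -- spectral flux from an STFT.
def spectral_flux(signal, frame_size=2048, hop_size=441):
    """Half-wave-rectified spectral difference, one value per frame."""
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    window = np.hanning(frame_size)
    spec = np.array([np.abs(np.fft.rfft(window *
                     signal[i * hop_size:i * hop_size + frame_size]))
                     for i in range(n_frames)])
    diff = np.maximum(np.diff(spec, axis=0), 0).sum(axis=1)
    return np.concatenate(([0.0], diff))

# Stage 2: acoustic model -- a hypothetical logistic squashing stands in for
# a learned model (e.g. a neural network) that outputs beat probabilities.
def beat_activation(flux):
    z = (flux - flux.mean()) / (flux.std() + 1e-9)
    return 1.0 / (1.0 + np.exp(-z))

# Stage 3: probabilistic sequence model -- Viterbi decoding of a two-state
# HMM (0 = no beat, 1 = beat) yields the most probable label sequence.
def viterbi(act, trans=((0.9, 0.1), (0.5, 0.5)), prior=(0.5, 0.5)):
    log_a = np.log(np.asarray(trans))                   # trans[from, to]
    log_b = np.log(np.clip(np.stack([1.0 - act, act], axis=1), 1e-12, None))
    n, k = log_b.shape
    delta = np.empty((n, k))                            # best log-probability
    psi = np.zeros((n, k), dtype=int)                   # backpointers
    delta[0] = np.log(prior) + log_b[0]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_a
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_b[t]
    path = np.empty(n, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n - 2, -1, -1):                      # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy usage: clicks every 0.5 s in noise should come out as 'beat' frames.
fs = 44100
signal = 0.01 * np.random.randn(4 * fs)
signal[::fs // 2] += 1.0
act = beat_activation(spectral_flux(signal))
print("beat frames:", np.flatnonzero(viterbi(act) == 1))
```

Real systems model tempo and bar position in a much richer state space; the two-state model here only illustrates how the three stages connect.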
This thesis provides contributions to the second and third stages. I (i) explore acoustic
models based on machine learning methods, and (ii) develop models and algorithms
for efficient probabilistic inference in both online and offline scenarios. Further, I
design applications such as an automatic drummer that listens to and accompanies a
musician in a live setting.
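To illustrate the online/offline distinction with the same toy two-state model (again an assumption-laden sketch, not the algorithms of this thesis): an offline system can wait for the whole recording and decode it at once, e.g. with Viterbi as above, whereas an online system such as a live accompanist must update its belief after every incoming frame. The standard tool for the latter is the HMM forward (filtering) recursion:

```python
import numpy as np

# Online (filtering) counterpart to offline Viterbi: the HMM forward
# recursion updates P(state | observations so far) one frame at a time,
# so a live system can act on each frame as it arrives.
class OnlineBeatFilter:
    def __init__(self, trans=((0.9, 0.1), (0.5, 0.5)), prior=(0.5, 0.5)):
        self.trans = np.asarray(trans)      # trans[from, to], illustrative
        self.belief = np.asarray(prior, dtype=float)

    def update(self, activation):
        """Fold in one acoustic-model output; return P(beat | frames so far)."""
        obs = np.array([1.0 - activation, activation])   # emission likelihoods
        self.belief = obs * (self.belief @ self.trans)   # predict, then correct
        self.belief /= self.belief.sum()                 # normalise
        return self.belief[1]

# Example: the probability of 'beat' jumps when a strong activation arrives.
f = OnlineBeatFilter()
for a in (0.3, 0.2, 0.95, 0.4):
    print(round(f.update(a), 3))
```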
The most recent algorithms developed in this thesis achieve state-of-the-art
performance and clearly demonstrate the superiority of systems that incorporate
machine learning over the hand-designed systems that were prevalent when this
thesis was started. All algorithms developed in this thesis are publicly available
as open-source software. I also publish beat and downbeat annotations for the
Ballroom dataset to foster further research in this area.
| Original language | English |
|---|---|
| Publication status | Published - Dec 2016 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)