Abstract
This thesis is about the automatic detection and classification of sound
events (e.g. notes or percussive sounds) in musical audio. It addresses
four sub-aspects: (i) the detection of the timing of these events
(onset detection), (ii) the localisation of their position within the
metrical grid (beat and downbeat tracking), (iii) the estimation of the
dominant periodicity (tempo estimation), and (iv) the identification of
the frequencies of the played notes (note onset transcription).
Historically, beat tracking, tempo estimation, and note transcription
systems were built on top of onset detection algorithms. Most of them
relied on hand-crafted features, designed specifically for a given
task, certain sounds, or particular music styles. Unlike previous
approaches, we avoid hand-crafted features almost entirely and instead
learn them directly from the audio signal. We present several
algorithms addressing the aforementioned tasks of detecting and
classifying sound events. All proposed methods achieve state-of-the-art
performance in their respective fields over a wide range of sounds and
music styles, and demonstrate the superiority of learned features both
in terms of overall performance and generalisation capability.
Reference implementations of the algorithms developed in this
thesis are released as an open-source audio processing and music
information retrieval (MIR) library written in Python. Additionally,
we make the data used to develop and train the algorithms publicly
available to stimulate further research and development in this area.
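As an illustration only: assuming the released Python library is madmom (the abstract does not name it), onset detection and beat tracking as described above might be invoked roughly as in the following minimal sketch; the processor names follow madmom's published API, but their use here is an assumption, not part of the thesis text.

```python
# A minimal sketch, assuming the released library is madmom
# (not named in the abstract); 'example.wav' is a hypothetical input file.
from madmom.features.onsets import RNNOnsetProcessor, OnsetPeakPickingProcessor
from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor

audio_file = 'example.wav'

# Onset detection: a neural network computes a frame-wise onset
# activation function, which is then peak-picked to obtain onset times.
onset_activations = RNNOnsetProcessor()(audio_file)
onsets = OnsetPeakPickingProcessor(fps=100)(onset_activations)

# Beat tracking: a beat activation function is decoded by a dynamic
# Bayesian network to obtain beat times (in seconds).
beat_activations = RNNBeatProcessor()(audio_file)
beats = DBNBeatTrackingProcessor(fps=100)(beat_activations)

print(onsets[:5], beats[:5])
```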
| Original language | English |
| --- | --- |
| Publication status | Published - Dec 2016 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)