Abstract
Musical onset detection is one of the most elementary tasks in music analysis, yet it is still only imperfectly solved for polyphonic music signals. When the task is interpreted as a computer vision problem on spectrograms, Convolutional Neural Networks (CNNs) seem to be an ideal fit. On a dataset of about 100 minutes of music with 26k annotated onsets, we show that CNNs outperform the previous state of the art while requiring less manual preprocessing. Investigating their inner workings, we find two key advantages over hand-designed methods: using separate detectors for percussive and harmonic onsets, and combining results from many minor variations of the same scheme. The results suggest that even for well-understood signal processing tasks, machine learning can be superior to knowledge engineering.
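To make the setup concrete, here is a minimal sketch of the general idea described in the abstract: a small CNN classifies short spectrogram excerpts as onset vs. non-onset, and sliding it across a recording yields a frame-wise activation curve that can be peak-picked into onset times. The `OnsetCNN` class, the layer sizes, and the 80-band / 15-frame input shape are illustrative assumptions for this sketch, not necessarily the authors' exact architecture.

```python
# Sketch only: a small CNN that maps a log-magnitude spectrogram excerpt
# to an onset probability. Shapes and layer sizes are assumptions.
import torch
import torch.nn as nn

class OnsetCNN(nn.Module):
    def __init__(self, n_bands=80, n_frames=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=(3, 7)),   # (frequency, time) receptive field
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 1)),       # pool over frequency only
            nn.Conv2d(10, 20, kernel_size=(3, 3)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 1)),
        )
        # Infer the flattened feature size from a dummy input.
        with torch.no_grad():
            n_feats = self.features(torch.zeros(1, 1, n_bands, n_frames)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_feats, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),                           # onset probability per excerpt
        )

    def forward(self, x):
        # x: (batch, 1, n_bands, n_frames) spectrogram excerpt centered on a frame
        return self.classifier(self.features(x))

model = OnsetCNN()
excerpts = torch.randn(32, 1, 80, 15)   # random batch, for a shape check only
print(model(excerpts).shape)            # -> torch.Size([32, 1])
```

Evaluating such a network on every frame of a recording produces an onset activation curve; in line with the abstract's second finding, averaging the curves of several networks trained as minor variations of the same scheme can further smooth out individual errors before peak picking.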
| Field | Value |
| --- | --- |
| Original language | English |
| Title of host publication | Proceedings of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
| Number of pages | 6 |
| Publication status | Published - May 2014 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)