Abstract
In this paper, we propose a system that extracts the downbeat
times from a beat-synchronous audio feature stream
of a music piece. Two recurrent neural networks are used
as a front-end: the first one models rhythmic content on
multiple frequency bands, while the second one models the
harmonic content of the signal. The output activations are
then combined and fed into a dynamic Bayesian network
which acts as a rhythmical language model. We show on
seven commonly used datasets of Western music that the
system achieves state-of-the-art results.
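The abstract describes the processing chain only at a high level. The sketch below is a minimal, hypothetical illustration of the final two steps (fusing the two networks' per-beat downbeat activations and selecting downbeat positions), assuming a fixed 4/4 meter and a simple bar-phase search in place of the paper's dynamic Bayesian network; the function names, the convex weighting, and the toy activation values are assumptions, not the authors' implementation.

```python
import numpy as np

def combine_activations(rhythmic_act, harmonic_act, weight=0.5):
    # Convex combination of the two network outputs; the actual fusion
    # used in the paper may differ (weight is an assumed parameter).
    return weight * np.asarray(rhythmic_act) + (1.0 - weight) * np.asarray(harmonic_act)

def decode_downbeats(activations, beats_per_bar=4):
    # Toy stand-in for the DBN: pick the bar phase whose candidate
    # downbeat positions collect the most activation mass, assuming
    # a single, known meter throughout the piece.
    activations = np.asarray(activations)
    scores = [activations[phase::beats_per_bar].sum()
              for phase in range(beats_per_bar)]
    best_phase = int(np.argmax(scores))
    # Return the beat indices labelled as downbeats.
    return np.arange(best_phase, len(activations), beats_per_bar)

# Usage with made-up per-beat downbeat probabilities from the two networks:
rhythm = np.array([0.9, 0.1, 0.2, 0.1, 0.8, 0.2, 0.1, 0.1])
harmony = np.array([0.7, 0.2, 0.3, 0.2, 0.9, 0.1, 0.2, 0.1])
print(decode_downbeats(combine_activations(rhythm, harmony)))  # -> [0 4]
```

The paper's dynamic Bayesian network is considerably more expressive than this fixed-meter phase search; the snippet only illustrates why a rhythmical language model over bar positions is useful on top of the raw network activations.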
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR) |
| Number of pages | 7 |
| Publication status | Published - Aug 2016 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)