Abstract
The recognition of boundaries, e.g., between chorus and verse, is an important task in music structure analysis. The goal is to automatically detect such boundaries in audio signals so that the results are close to human annotation. In this work, we apply Convolutional Neural Networks to the task, trained directly on mel-scaled magnitude spectrograms. On a representative subset of the SALAMI structural annotation dataset, our method outperforms current techniques in terms of boundary retrieval F-measure at different temporal tolerances: We advance the state-of-the-art from 0.33 to 0.46 for tolerances of ±0.5 seconds, and from 0.52 to 0.62 for tolerances of ±3 seconds. As the algorithm is trained on annotated audio data without the need for expert knowledge, we expect it to be easily adaptable to changed annotation guidelines and also to related tasks such as the detection of song transitions.
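The boundary retrieval F-measure at a temporal tolerance, as used in the abstract, can be sketched as follows. This is a minimal illustration in plain Python, not the paper's evaluation code: it uses a simplified greedy one-to-one matching (standard MIR evaluation toolkits such as mir_eval use optimal bipartite matching), and the function name and example timestamps are made up for demonstration.

```python
def boundary_f_measure(reference, estimated, tolerance=0.5):
    """Precision, recall, and F-measure for boundary retrieval.

    A reference boundary counts as retrieved if some estimated
    boundary lies within `tolerance` seconds of it; each estimate
    may be matched at most once (greedy matching, illustrative only).
    """
    matched = 0
    used = set()
    for ref in reference:
        for i, est in enumerate(estimated):
            if i not in used and abs(est - ref) <= tolerance:
                used.add(i)
                matched += 1
                break
    precision = matched / len(estimated) if estimated else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f = 2 * precision * recall / (precision + recall)
    return f, precision, recall


# Hypothetical annotations (seconds): three of four boundaries
# are matched within the ±0.5 s tolerance window.
f, p, r = boundary_f_measure([0.0, 10.2, 25.7, 40.1],
                             [0.3, 10.0, 26.0, 33.0])
# f == p == r == 0.75
```

Reporting the same metric at both ±0.5 s and ±3 s, as the abstract does, amounts to calling this function twice with different `tolerance` values.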
| Original language | German (Austria) |
|---|---|
| Title of host publication | Proceedings of the 15th International Society for Music Information Retrieval Conference, ISMIR 2014 |
| Pages | 417-422 |
| Publication status | Published - 2014 |
| Externally published | Yes |
Fields of science
- 102018 Artificial neural networks
- 202037 Signal processing