Abstract
The recognition of boundaries, e.g., between chorus and verse, is an important task in music structure analysis. The goal is to automatically detect such boundaries in audio signals so that the results are close to human annotation. In this work, we apply Convolutional Neural Networks to the task, trained directly on mel-scaled magnitude spectrograms. On a representative subset of the SALAMI structural annotation dataset, our method outperforms current techniques in terms of boundary retrieval F-measure at different temporal tolerances: We advance the state-of-the-art from 0.33 to 0.46 for tolerances of ±0.5 seconds, and from 0.52 to 0.62 for tolerances of ±3 seconds. As the algorithm is trained on annotated audio data without the need of expert knowledge, we expect it to be easily adaptable to changed annotation guidelines and also to related tasks such as the detection of song transitions.
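The boundary retrieval F-measure used in the abstract counts an estimated boundary as a hit if it falls within a tolerance window (e.g., ±0.5 s) of a reference boundary, with one-to-one matching. A minimal sketch of such a metric is given below; it uses a simple greedy matching rather than the maximum bipartite matching used by standard evaluation toolkits such as mir_eval, so it is an illustration, not the paper's exact evaluation code.

```python
def boundary_f_measure(reference, estimated, tolerance=0.5):
    """Boundary retrieval F-measure: an estimated boundary is a hit if it
    lies within `tolerance` seconds of a still-unmatched reference
    boundary (greedy one-to-one matching; toolkits like mir_eval use
    optimal bipartite matching instead)."""
    ref = sorted(reference)
    est = sorted(estimated)
    matched = set()
    hits = 0
    for e in est:
        # find the closest reference boundary not yet matched
        best, best_dist = None, None
        for i, r in enumerate(ref):
            if i in matched:
                continue
            d = abs(e - r)
            if best_dist is None or d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= tolerance:
            matched.add(best)
            hits += 1
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with reference boundaries at 10.0, 25.0, and 40.0 seconds and estimates at 10.3, 24.6, and 55.0 seconds, two of three estimates are hits at ±0.5 s tolerance, giving precision = recall = 2/3 and F ≈ 0.67.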
| Original language | German (Austria) |
|---|---|
| Title | Proceedings of the 15th International Society for Music Information Retrieval Conference, ISMIR 2014 |
| Pages | 417-422 |
| Publication status | Published - 2014 |
| Published externally | Yes |
Fields of science
- 102018 Artificial neural networks
- 202037 Signal processing