Boundary detection in music structure analysis using convolutional neural networks

Karen Ullrich, Jan Schlüter, Thomas Grill

Research output: Chapter in Book/Report/Conference proceeding; Conference proceedings; peer-review

Abstract

The recognition of boundaries, e.g., between chorus and verse, is an important task in music structure analysis. The goal is to automatically detect such boundaries in audio signals so that the results closely match human annotations. In this work, we apply Convolutional Neural Networks to the task, trained directly on mel-scaled magnitude spectrograms. On a representative subset of the SALAMI structural annotation dataset, our method outperforms current techniques in terms of boundary retrieval F-measure at different temporal tolerances: we advance the state of the art from 0.33 to 0.46 for tolerances of ±0.5 seconds, and from 0.52 to 0.62 for tolerances of ±3 seconds. As the algorithm is trained on annotated audio data without the need for expert knowledge, we expect it to be easily adaptable to changed annotation guidelines and also to related tasks such as the detection of song transitions.
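The results above are reported as boundary retrieval F-measure at tolerances of ±0.5 s and ±3 s. The following is a minimal sketch of how such a tolerance-windowed metric can be computed, assuming boundaries are given as times in seconds and each reference boundary may be matched by at most one estimated boundary (and vice versa). The function names `match_boundaries` and `boundary_f_measure` are hypothetical, and the exact evaluation protocol used in the paper (e.g. treatment of the first and last boundaries) is not stated here, so this is an illustration of the metric rather than the authors' evaluation code.

```python
from typing import Sequence, Tuple


def match_boundaries(reference: Sequence[float], estimated: Sequence[float],
                     window: float) -> int:
    """Count one-to-one hits: pairs with |ref - est| <= window (seconds).

    Hypothetical helper. On sorted 1-D boundary lists with a symmetric
    tolerance, this greedy two-pointer pass yields a maximum matching.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if est[j] < ref[i] - window:
            j += 1  # estimate too early for this and every later reference
        elif est[j] > ref[i] + window:
            i += 1  # reference too early for this and every later estimate
        else:
            hits += 1  # within tolerance: match and consume both boundaries
            i += 1
            j += 1
    return hits


def boundary_f_measure(reference: Sequence[float], estimated: Sequence[float],
                       window: float) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) at a +/- window tolerance."""
    hits = match_boundaries(reference, estimated, window)
    precision = hits / len(estimated) if len(estimated) > 0 else 0.0
    recall = hits / len(reference) if len(reference) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    # Made-up boundary times (seconds), for illustration only.
    ref = [10.0, 25.3, 48.1, 70.0]
    est = [10.4, 26.9, 47.9, 60.0, 71.2]
    for w in (0.5, 3.0):
        p, r, f = boundary_f_measure(ref, est, w)
        print(f"+/-{w} s: P={p:.2f} R={r:.2f} F={f:.2f}")
```

With these made-up times, the ±0.5 s window accepts only 2 of the 5 estimates (P = 0.40, R = 0.50, F ≈ 0.44), while the ±3 s window accepts 4 (P = 0.80, R = 1.00, F ≈ 0.89), which illustrates why scores at the wider tolerance are consistently higher. For reported numbers, an established evaluation toolkit such as mir_eval is preferable to a hand-rolled version.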
Original language: German (Austria)
Title of host publication: Proceedings of the 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Pages: 417-422
Publication status: Published, 2014
Externally published: Yes

Fields of science

  • 102018 Artificial neural networks
  • 202037 Signal processing
