
Receptive-Field Regularized CNNs for Music Classification and Tagging

Publication: Preprints, Working Papers and Research Reports › Preprint

Abstract

Convolutional Neural Networks (CNNs) have been successfully used in various Music Information Retrieval (MIR) tasks, both as end-to-end models and as feature extractors for more complex systems. However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on large datasets. Deeper models such as ResNet – which surpassed VGG by a large margin in other domains – are rarely used in MIR. One of the main reasons for this, as we will show, is the lack of generalization of deeper CNNs in the music domain. In this paper, we present a principled way to make deep architectures like ResNet competitive for music-related tasks, based on well-designed regularization strategies. In particular, we analyze the recently introduced Receptive-Field Regularization and Shake-Shake, and show that they significantly improve the generalization of deep CNNs on music-related tasks, and that the resulting deep CNNs can outperform current more complex models such as CNNs augmented with pre-training and attention. We demonstrate this on two different MIR tasks and two corresponding datasets, thus offering our deep regularized CNNs as a new baseline for these datasets, which can also be used as a feature-extracting module in future, more complex approaches.
Original language: English
Number of pages: 8
DOIs
Publication status: Published - 2020

Publication series

Name: arXiv.org
ISSN (Print): 2331-8422

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial Intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU focus areas

  • Digital Transformation
