
Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks

Publication: Preprints, working papers, and research reports (preprint)

Abstract

Music is characterized by aspects related to different modalities, such as the audio signal, the lyrics, or the music video clips. This has motivated the development of multimodal datasets and methods for Music Information Retrieval (MIR) tasks such as genre classification or autotagging. Music can be described at different levels of granularity, for instance by defining genres at the level of artists or music albums. However, most datasets for multimodal MIR neglect this aspect and provide data only at the level of individual music tracks. We aim to fill this gap by providing Music4All Artist and Album (Music4All A+A), a dataset for multimodal MIR tasks based on music artists and albums. Music4All A+A is built on top of the Music4All-Onion dataset, an existing track-level dataset for MIR tasks. Music4All A+A provides metadata, genre labels, image representations, and textual descriptors for 6,741 artists and 19,511 albums. Furthermore, since Music4All A+A is built on top of Music4All-Onion, it allows access to other multimodal data at the track level, including user-item interaction data. This renders Music4All A+A suitable for a broad range of MIR tasks, including multimodal music recommendation, at several levels of granularity. To showcase the use of Music4All A+A, we carry out experiments on multimodal genre classification of artists and albums, including an analysis of missing-modality scenarios and a quantitative comparison with genre classification in the movie domain. Our experiments show that images are more informative for classifying the genres of artists and albums, and that several multimodal models for genre classification struggle to generalize across domains. We provide the code to reproduce our experiments at https://github.com/hcai-mms/Music4All-A-A. The dataset is linked in the repository and provided open-source under a CC BY-NC-SA 4.0 license.
Original language: English
Number of pages: 7
Publication status: Published - 18 Sep 2025

Publication series

Name: arXiv.org
No.: 2509.14891

Fields of science

  • 102003 Image processing
  • 101019 Stochastics
  • 103029 Statistical physics
  • 101018 Statistics
  • 102001 Artificial intelligence
  • 101017 Game theory
  • 202017 Embedded systems
  • 101016 Optimisation
  • 101015 Operations research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational intelligence
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 202002 Audiovisual media
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data mining
  • 102 Computer science
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 102015 Information systems
  • 202036 Sensor systems
  • 202035 Robotics

JKU focus areas

  • Digital Transformation
