LFM-2b: A Dataset of Enriched Music Listening > Events for Recommender Systems Research and Fairness Analysis

Markus Schedl, Stefan Brandl, Oleg Lesota, Emilia Parada-Cabaleiro, David Penz, Navid Rekabsaz

Research output: Chapter in Book/Report/Conference proceedingConference proceedingspeer-review

Abstract

We present the LFM-2b dataset containing the listening records of over 120,000 users of the music platform Last.fm. These users provide a total of more than two billion individual listening events that span a time range of over 15 years, from February 2005 until March 2020. These listening events refer to a total of 50 million distinct tracks of 5 million distinct artists. Beside the common metadata (i. e., artist and track name), LFM-2b contains additional information both regarding the users and items. This includes the demographic information of users, namely country, gender, and age, and the fine-grained genre and style of items together with the vector embeddings of their lyrics. LFM-2b is a rich dataset that enables research on a variety of recommender system algorithms, such as the ones based on collaborative filtering (e.g., leveraging the user–item interactions in the form of listening events), but also content-based approaches (e.g., exploiting genres and lyrics), or hybrid combinations thereof. Users’ demographic information furthermore enable experimentation on identifying and mitigating various data and algorithmic biases of recommender systems, and investigating fairness aspects of such systems, e.g., according to gender.
Original languageEnglish
Title of host publicationProceedings of the 7th ACM SIGIR Conference on Human Information > Interaction and Retrieval (CHIIR 2022)
Number of pages5
Publication statusPublished - Mar 2022

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU Focus areas

  • Digital Transformation

Cite this