Abstract
We present the LFM-2b dataset, which contains the listening records of more than 120,000 users of the music platform Last.fm. These users provide a total of more than two billion individual listening events that span more than 15 years, from February 2005 to March 2020. The listening events refer to 50 million distinct tracks by 5 million distinct artists. Besides the common metadata (i.e., artist and track name), LFM-2b contains additional information about both users and items. This includes the demographic information of users, namely country, gender, and age, as well as the fine-grained genre and style of items, together with vector embeddings of their lyrics. LFM-2b is a rich dataset that enables research on a variety of recommender system algorithms, such as those based on collaborative filtering (e.g., leveraging user–item interactions in the form of listening events), content-based approaches (e.g., exploiting genres and lyrics), or hybrid combinations thereof. Users’ demographic information furthermore enables experimentation on identifying and mitigating various data and algorithmic biases of recommender systems, and on investigating fairness aspects of such systems, e.g., with respect to gender.
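To illustrate the collaborative-filtering use case mentioned above, the following minimal sketch aggregates individual listening events into per-user play counts and compares tracks by cosine similarity over the users who played them. It assumes listening events are available as hypothetical `(user_id, track_id, timestamp)` tuples with made-up identifiers; the actual LFM-2b file layout and column names may differ, and the helper functions `play_counts`, `item_vectors`, and `cosine` are illustrative, not part of the dataset's tooling.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical listening events: (user_id, track_id, timestamp).
# The real LFM-2b schema may differ; these toy values only illustrate the idea.
events = [
    ("u1", "t1", 1), ("u1", "t1", 2), ("u1", "t2", 3),
    ("u2", "t1", 4), ("u2", "t3", 5),
    ("u3", "t2", 6), ("u3", "t3", 7), ("u3", "t3", 8),
]

def play_counts(events):
    """Aggregate individual listening events into user -> {track: count}."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, track, _ts in events:
        counts[user][track] += 1
    return {user: dict(tracks) for user, tracks in counts.items()}

def item_vectors(counts):
    """Transpose user -> {track: count} into track -> {user: count}."""
    items = defaultdict(dict)
    for user, tracks in counts.items():
        for track, c in tracks.items():
            items[track][user] = c
    return dict(items)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

counts = play_counts(events)
items = item_vectors(counts)
print(round(cosine(items["t1"], items["t2"]), 3))  # 0.632
```

Using raw play counts as interaction strengths is only one design choice; implicit-feedback recommenders often binarize or log-scale counts instead, and at the scale of two billion events a sparse-matrix library would replace these plain dictionaries.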
Original language | English |
---|---|
Title of host publication | Proceedings of the 7th ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2022) |
Number of pages | 5 |
Publication status | Published - Mar 2022 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Digital Transformation