How Significant is Statistically Significant? The Case of Audio Music Similarity and Retrieval.

J. Urbano, J.S. Downie, B. McFee, Markus Schedl

Research output: Chapter in Book/Report/Conference proceeding; Conference proceedings; peer-review

Abstract

The principal goal of the annual Music Information Retrieval Evaluation eXchange (MIREX) experiments is to determine which systems perform well and which systems perform poorly on a range of MIR tasks. However, there has been no systematic analysis regarding how well these evaluation results translate into real-world user satisfaction. For most researchers, reaching statistical significance in the evaluation results is usually the most important goal, but in this paper we show that indicators of statistical significance (i.e., small p-value) are eventually of secondary importance. Researchers who want to predict the real-world implications of formal evaluations should properly report upon practical significance (i.e., large effect-size). Using data from the 18 systems submitted to the MIREX 2011 Audio Music Similarity and Retrieval task, we ran an experiment with 100 real-world users that allows us to explicitly map system performance onto user satisfaction. Based upon 2,200 judgments, the results show that absolute system performance needs to be quite large for users to be satisfied, and differences between systems have to be very large for users to actually prefer the supposedly better system. The results also suggest a practical upper bound of 80% on user satisfaction with the current definition of the task. Reflecting upon these findings, we make some recommendations for future evaluation experiments and the reporting and interpretation of results in peer-reviewing.
Original language: English
Title of host publication: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012)
Number of pages: 6
Publication status: Published - Oct 2012

Fields of science

  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Engineering and Natural Sciences (in general)
