Frame-level Audio Similarity - A Codebook Approach

Gerhard Widmer, Peter Knees, Klaus Seyerlehner

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

Modeling audio signals by the long-term statistical distribution of their local spectral features, often referred to as the bag-of-frames (BOF) approach, is a popular and powerful method for describing audio content. While modeling the distribution of local spectral features with semi-parametric distributions (e.g., Gaussian Mixture Models) has been studied intensively, in this paper we investigate a non-parametric variant based on vector quantization (VQ). The essential advantage of the proposed VQ approach over state-of-the-art similarity measures is that the resulting audio similarity metric forms a normed vector space. This allows for more powerful search strategies, e.g., KD-trees or Locality-Sensitive Hashing (LSH), making content-based audio similarity feasible even for very large music archives. Standard VQ approaches are known to be computationally very expensive; to counter this problem, we propose a multi-level clustering architecture. Additionally, we show that the multi-level vector quantization approach (ML-VQ), in contrast to standard VQ approaches, is comparable in quality to state-of-the-art frame-level similarity measures. Another important finding with respect to the ML-VQ approach is that, in contrast to GMM models of songs, it does not seem to suffer from the recently discovered hub problem.
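
The abstract only sketches the pipeline, so the following Python snippet illustrates one plausible reading of it: a global codebook is built by two-level k-means clustering (per-song centers first, then a clustering of the pooled centers), each song is then described by its normalized codeword histogram, and because these histograms live in an ordinary normed vector space they can be indexed directly with a KD-tree. All function names, parameter values, and the synthetic frame data are illustrative assumptions, not the authors' implementation or the paper's actual codebook sizes.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import KMeans

def song_level_codewords(frames, n_local=20, seed=0):
    """First level: summarize one song's spectral frames by a small
    set of local cluster centers (far fewer points than raw frames).
    n_local is a hypothetical choice, not the paper's setting."""
    k = min(n_local, len(frames))
    km = KMeans(n_clusters=k, n_init=4, random_state=seed).fit(frames)
    return km.cluster_centers_

def build_global_codebook(all_songs_frames, n_codewords=50, seed=0):
    """Second level: cluster the pooled song-level centers into a
    global codebook, instead of clustering all raw frames at once."""
    pooled = np.vstack([song_level_codewords(f, seed=seed)
                        for f in all_songs_frames])
    k = min(n_codewords, len(pooled))
    return KMeans(n_clusters=k, n_init=4, random_state=seed).fit(pooled)

def song_histogram(frames, codebook):
    """Quantize each frame to its nearest codeword and return the
    normalized codeword histogram describing the whole song."""
    idx = codebook.predict(frames)
    hist = np.bincount(idx, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# Toy usage with random "MFCC-like" 20-dimensional frames.
rng = np.random.default_rng(0)
songs = [rng.normal(size=(500, 20)) for _ in range(5)]
codebook = build_global_codebook(songs, n_codewords=50)
hists = [song_histogram(s, codebook) for s in songs]

# The histograms form ordinary vectors, so any L_p norm applies ...
d01 = np.linalg.norm(hists[0] - hists[1], ord=1)
print(f"L1 distance between song 0 and song 1: {d01:.3f}")

# ... and index structures such as KD-trees can be built on them.
tree = cKDTree(np.vstack(hists))
dist, idx = tree.query(hists[0], k=2)  # nearest neighbours of song 0
print("nearest neighbours of song 0:", idx)
```

Note that the second-level clustering only ever sees the pooled song-level centers rather than every raw frame, which is what would make such a multi-level variant far cheaper than a single large k-means over the entire archive.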
Original language: English
Title of host publication: Proceedings of the 11th International Conference on Digital Audio Effects (DAFx 2008)
Pages: 349-356
Number of pages: 8
Publication status: Published - 2008

Publication series

Name: Proceedings of the International Conference on Digital Audio Effects, DAFx
ISSN (Print): 2413-6700
ISSN (Electronic): 2413-6689

Fields of science

  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems
  • 202002 Audiovisual media
