Abstract
Detecting speech and music is an elementary step in extracting information
from radio broadcasts. Existing solutions either rely on
general-purpose audio features, or build on features specifically
engineered for the task. Interpreting spectrograms as images, we
can apply unsupervised feature learning methods from computer
vision instead. In this work, we show that features learned by a
mean-covariance Restricted Boltzmann Machine partly resemble
engineered features, but outperform three hand-crafted feature sets
in speech and music detection on a large corpus of radio recordings.
Our results demonstrate that unsupervised learning is a powerful
alternative to knowledge engineering.
| Original language | English |
|---|---|
| Title | Proceedings of the 15th Int. Conference on Digital Audio Effects (DAFx-12) |
| Number of pages | 8 |
| Publication status | Published - Sep 2012 |
Fields of Science
- 102 Computer Sciences
- 102001 Artificial Intelligence
- 102003 Image Processing
JKU Focus Areas
- Computation in Informatics and Mathematics
- TNF General