Abstract
In this paper, we evaluate the usefulness of several monaural
blind source separation (BSS) algorithms in the context
of vocal detection (VD). BSS is the problem of recovering
several sources, given only a mixture. VD is the problem of
automatically identifying the parts of a mixed audio signal
in which at least one person is singing. We compare the results
of three different strategies for utilising the estimated
singing voice signals from four state-of-the-art source separation
algorithms. To assess the performance of
these strategies on an internal data set, we use two different
feature sets, each fed to two different classifiers. After
selecting the most promising approach, the results on two
publicly available data sets are presented. In an additional
experiment, we use the improved VD for a simple post-processing
technique: for the final estimate of each source
signal, we use either silence, the mixed signal, or the
separated signal, according to the VD. The results of traditionally
used BSS evaluation methods suggest that this is
useful for both the estimated background signals and
the estimated vocals.
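
The post-processing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `postprocess_with_vd`, the frame-wise boolean VD output, and the specific mapping (silence for the vocal estimate and the unprocessed mixture for the background estimate wherever no singing is detected, the separated signals elsewhere) are assumptions made for illustration only.

```python
import numpy as np


def postprocess_with_vd(mixture, est_vocals, est_background, vocal_frames, frame_len):
    """Choose between silence, mixture, and separated signals per frame.

    Hypothetical helper; the selection rule below is an assumed reading
    of the abstract, not the published method.
    """
    mixture = np.asarray(mixture, dtype=float)
    est_vocals = np.asarray(est_vocals, dtype=float)
    est_background = np.asarray(est_background, dtype=float)

    # Expand the frame-wise VD decisions to one boolean per sample.
    mask = np.repeat(np.asarray(vocal_frames, dtype=bool), frame_len)
    if len(mask) < len(mixture):
        raise ValueError("VD decisions do not cover the whole signal")
    mask = mask[: len(mixture)]

    # Assumed rule: without detected vocals, the vocal estimate is silence
    # and the background estimate is the unprocessed mixture.
    vocals_out = np.where(mask, est_vocals, 0.0)
    background_out = np.where(mask, est_background, mixture)
    return vocals_out, background_out
```

For example, with two frames of two samples each and singing detected only in the first frame, the second half of the vocal output is zeroed and the second half of the background output is copied from the mixture.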
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 16th International Society for Music Information Retrieval Conference |
| Editors | Meinard Müller, Frans Wiering |
| Pages | 309-315 |
| Number of pages | 7 |
| ISBN (Electronic) | 9788460688532 |
| Publication status | Published - Oct 2015 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)