Abstract
We measure the effect of small amounts of systematic and random label noise caused by slightly misaligned ground truth labels in a fine-grained audio signal labeling task. The task we use to demonstrate these effects is known as framewise polyphonic transcription or note-quantized multi-f0 estimation, and transforms a monaural audio signal into a sequence of note indicator labels. We show that even slight misalignments have clearly apparent effects, demonstrating that convolutional neural networks are highly sensitive to label noise. The implication is that, when using convolutional neural networks for fine-grained audio signal labeling tasks, great care must be taken to ensure that the annotations have precise timing and are as free as possible from systematic and random error, since even small misalignments have a noticeable impact.
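
To make the two kinds of label noise concrete, the following is a minimal sketch of how framewise note indicator labels could be perturbed by a systematic (constant) shift or by random per-note jitter. It is not the procedure used in the paper: the function names, the 100 frames-per-second rate, the 88-key pitch range and the Gaussian jitter model are all assumptions chosen for illustration.

```python
import numpy as np


def notes_to_frames(notes, n_frames, fps, n_pitches=88, lowest_pitch=21):
    """Render (onset_sec, offset_sec, midi_pitch) tuples as a binary
    frames-by-pitches indicator matrix (hypothetical helper)."""
    roll = np.zeros((n_frames, n_pitches), dtype=np.uint8)
    for onset, offset, pitch in notes:
        start = max(0, int(round(onset * fps)))
        end = min(n_frames, int(round(offset * fps)))
        if end > start:
            roll[start:end, pitch - lowest_pitch] = 1
    return roll


def misalign(notes, systematic_shift=0.0, random_std=0.0, rng=None):
    """Shift every note by a constant offset (systematic error) plus an
    optional Gaussian jitter drawn per note (random error). Both noise
    models are assumptions, not taken from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    shifted = []
    for onset, offset, pitch in notes:
        jitter = rng.normal(0.0, random_std) if random_std > 0 else 0.0
        delta = systematic_shift + jitter
        shifted.append((onset + delta, offset + delta, pitch))
    return shifted


def frame_disagreement(clean, noisy):
    """Fraction of frame/pitch cells whose label differs."""
    return float(np.mean(clean != noisy))


# Illustrative data: two overlapping notes in one second of audio.
fps, n_frames = 100, 100  # assumed frame rate and clip length
notes = [(0.10, 0.50, 60), (0.30, 0.80, 64)]

clean = notes_to_frames(notes, n_frames, fps)
late = notes_to_frames(misalign(notes, systematic_shift=0.02), n_frames, fps)
jittered = notes_to_frames(
    misalign(notes, random_std=0.02, rng=np.random.default_rng(0)),
    n_frames, fps,
)

print("systematic shift:", frame_disagreement(clean, late))
print("random jitter:  ", frame_disagreement(clean, jittered))
```

In this sketch a misalignment of 20 ms already flips labels in the frames around every note onset and offset, which are the frames where a framewise model has to decide whether a note is active.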
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2018 IEEE International Conference on Acoustics, |
| Number of pages | 5 |
| Publication status | Published - Apr 2018 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)