Abstract
Detecting music notation symbols is the most immediate
unsolved subproblem in Optical Music Recognition for
musical manuscripts. We show that a U-Net architecture
for semantic segmentation combined with a trivial detector
already establishes a high baseline for this task, and
we propose tricks that further improve detection performance:
training against convex hulls of symbol masks,
and multichannel output models that enable feature sharing
for semantically related symbols. The latter is especially
helpful for clefs, which have a severe impact on the
overall OMR result. We then integrate the networks into an
OMR pipeline by applying a subsequent notation assembly
stage, establishing a new baseline result for pitch inference
in handwritten music at an f-score of 0.81. Given the automatically
inferred pitches, we run retrieval experiments
on handwritten scores, providing the first empirical evidence
that utilizing these powerful image-processing models brings
content-based search in large musical manuscript archives
within reach.
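Since only the abstract is available here, the following is a minimal sketch (not the authors' code) of the "trivial detector" idea it describes: given a per-pixel probability map, such as one channel of a U-Net's semantic-segmentation output, threshold it and report each connected component as one symbol detection. The probability map, the threshold value, and the function name are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage


def trivial_detector(prob_map, threshold=0.5):
    """Turn a segmentation probability map into bounding-box detections.

    prob_map: 2D array of per-pixel symbol probabilities (e.g. one
    output channel of a segmentation network). Returns a list of
    (top, left, bottom, right) boxes, one per connected component
    of pixels above the threshold.
    """
    mask = prob_map > threshold
    labeled, n_components = ndimage.label(mask)  # 4-connected components
    boxes = []
    for sl in ndimage.find_objects(labeled):
        boxes.append((sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return boxes


# Toy example: two well-separated "symbols" in a 10x10 probability map.
pm = np.zeros((10, 10))
pm[1:3, 1:4] = 0.9   # symbol 1
pm[6:9, 5:8] = 0.8   # symbol 2
print(trivial_detector(pm))  # two bounding boxes
```

In a multichannel output model as proposed in the paper, this detector would simply be applied per channel, so that semantically related symbol classes share the network's features but still yield separate detections.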
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR) |
| Editors | Emilia Gomez, Xiao Hu, Eric Humphrey, Emmanouil Benetos |
| Pages | 225-232 |
| Number of pages | 8 |
| ISBN (Electronic) | 9782954035123 |
| Publication status | Published - 2018 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)