Abstract
Performance brings music to life. It imbues an abstract notated score with expressiveness and character. Listeners perceive music through performance first and foremost: it is what draws us in and what occupies professional performers. Many questions in music computing place the performance at the center of interest: How can machines keep track of, and entrain to, the passing of musical time? How can they match a performance to its score? How can they discern and characterize performances? In this thesis, we address these questions for MIDI performances: fine-grained recordings that encode individual notes and their attributes as played by expert pianists.
As a prerequisite for most quantitative performance analyses, the performance first needs to be connected to its musical score — to differentiate what is being played from how it is being played. In practice, this means aligning the performance to the score, the technical solution of which is the focus of the first part of this thesis. We present both classical and machine learning-based models for offline symbolic music alignment. The three models presented surpass the technical state of the art in offline alignment, each with strengths in specific special cases. One of those cases is alignment in the presence of anchor points, which we use to produce the largest note-aligned performance and score dataset to date. We further present two symbolic tracking methods, again one classical and one based on machine learning. They set a new benchmark for symbolic music tracking and were successfully combined with real-time transcription for audio tracking.
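To make the alignment task concrete: at its simplest, offline symbolic alignment can be framed as finding a minimum-cost correspondence between the score's note sequence and the performed note sequence. The following is a minimal illustrative sketch using edit-distance-style dynamic programming over pitch sequences only; it is not the thesis's method — real aligners handle polyphony, onsets, ornaments, and performance errors.

```python
def align_notes(score, perf):
    """Toy symbolic alignment: match two MIDI pitch sequences via
    edit-distance dynamic programming, returning matched index pairs.
    Illustrative only; real alignment uses full note attributes."""
    n, m = len(score), len(perf)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)
    for j in range(1, m + 1):
        cost[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if score[i - 1] == perf[j - 1] else 2.0
            cost[i][j] = min(cost[i - 1][j - 1] + sub,   # (mis)match
                             cost[i - 1][j] + 1.0,        # score note omitted
                             cost[i][j - 1] + 1.0)        # extra performed note
    # Backtrack to recover the matched (score_idx, perf_idx) pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        sub = 0.0 if score[i - 1] == perf[j - 1] else 2.0
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1.0:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))

# A performer plays one extra note (66) between the 2nd and 3rd score notes.
pairs = align_notes([60, 64, 67, 72], [60, 64, 66, 67, 72])
```

Anchor points, as used in the thesis, would constrain such a correspondence at known positions, decomposing one long alignment into smaller, more reliable segments.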
The second part considers, in three analyses, what can, could, and cannot be derived from such performances once connected to scores. First, we connect audio and performance features to listeners' judgments of the character of performances. Second, we assess and compare computational and expert semantic similarity ratings of terms used to describe piano performances. Lastly, we investigate listeners' capacity to discern slight changes in expressive features in otherwise real recordings. Our emphasis here lies on ecologically valid concert and studio recording data. Specifically, we design listening tests of modified human performances instead of artificial stimuli, and analyze free-text annotations and term associations instead of predefined categorical ratings.
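Once performance and score are note-aligned, expressive features follow directly from the aligned data. As a hedged sketch (the triples and function names are hypothetical, not the thesis's code), two common features are a local tempo curve and a dynamics curve:

```python
# Sketch: deriving expressive features from score-aligned MIDI notes.
# Each aligned note is a hypothetical (score_beat, perf_onset_sec, velocity)
# triple; real data would come from a note-aligned performance dataset.

def tempo_curve(aligned):
    """Local tempo (BPM) between successive aligned note onsets."""
    bpm = []
    for (b0, t0, _), (b1, t1, _) in zip(aligned, aligned[1:]):
        if t1 > t0 and b1 > b0:
            bpm.append(60.0 * (b1 - b0) / (t1 - t0))
    return bpm

def dynamics_curve(aligned):
    """MIDI velocities in score order, a crude proxy for loudness."""
    return [vel for _, _, vel in aligned]

notes = [(0.0, 0.00, 60), (1.0, 0.50, 64), (2.0, 1.10, 70), (3.0, 1.80, 58)]
tempi = tempo_curve(notes)          # one BPM value per onset interval
velocities = dynamics_curve(notes)  # [60, 64, 70, 58]
```

Slightly perturbing such curves and re-rendering the MIDI is one way to produce the "modified human performances" used as listening-test stimuli, as opposed to fully synthetic ones.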
| Original language | English |
|---|---|
| Supervisors/Reviewers | |
| Publication status | Published - 2025 |
Fields of science
- 102003 Image processing
- 202002 Audiovisual media
- 102001 Artificial intelligence
- 102015 Information systems
- 102 Computer Sciences
JKU Focus areas
- Digital Transformation