Abstract
The rapid advance of AI-based music generation tools presents new opportunities for the music industry but also poses significant challenges, necessitating reliable methods for detecting AI-generated content. Existing detectors, however, face key practical limitations: audio-based approaches struggle to generalize to unseen generators and are not robust to common audio perturbations, while lyrics-based methods depend on cleanly formatted lyrics that are unavailable in real-world settings. To address this gap, this thesis proposes and evaluates a novel, practically grounded approach that leverages lyrical content extracted directly from the audio signal. Our method first transcribes sung lyrics using a general-purpose automatic speech recognition (ASR) model, allowing established AI-generated text detection methods to be applied to the transcripts. To further improve performance, we introduce DE-detect, a multi-view late-fusion method that also incorporates audio-derived speech features capturing paralinguistic information. By focusing on lyrical and speech-related information rather than low-level audio artifacts, our method is designed for improved robustness and generalization. Experiments on a diverse dataset show that DE-detect achieves strong detection performance relative to text-only detectors and, crucially, outperforms audio-based approaches, especially under various audio perturbations and on unseen music generators. This work thus presents an effective, robust, and practical solution for detecting AI-generated music.
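The late-fusion idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how DE-detect combines its views, so the weighted average below is only an assumption chosen for simplicity, and the names `late_fusion`, `classify`, `text_score`, and `speech_score` are hypothetical. Each view (a text detector applied to the ASR transcript, and a classifier over speech features) is assumed to output a probability that the song is AI-generated.

```python
from typing import Optional, Sequence


def late_fusion(view_scores: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-view probabilities with a weighted average.

    Assumption: each score is a probability in [0, 1] that the song is
    AI-generated, produced by an independently trained per-view model.
    """
    if not view_scores:
        raise ValueError("at least one view score is required")
    if weights is None:
        # Default to equal weighting of all views.
        weights = [1.0] * len(view_scores)
    if len(weights) != len(view_scores):
        raise ValueError("weights and view_scores must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, view_scores)) / total


def classify(view_scores: Sequence[float],
             weights: Optional[Sequence[float]] = None,
             threshold: float = 0.5) -> bool:
    """Return True if the fused score reaches the decision threshold."""
    return late_fusion(view_scores, weights) >= threshold


# Hypothetical per-view outputs for a single song (illustrative values only).
text_score = 0.8    # text detector applied to the ASR transcript
speech_score = 0.6  # classifier over audio-derived speech features

fused = late_fusion([text_score, speech_score])
is_ai_generated = classify([text_score, speech_score])
```

In a late-fusion design, each view is modelled separately and only the outputs are combined, so one view can be retrained or replaced without touching the other. In practice the fusion weights or a small meta-classifier would be fitted on held-out data rather than fixed by hand.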
| Original language | English |
|---|---|
| Supervisors/Reviewers | |
| Publication status | Published - 2025 |
Fields of science
- 102 Computer Sciences
- 102003 Image processing
- 202002 Audiovisual media
- 102001 Artificial intelligence
- 102015 Information systems
- 101019 Stochastics
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 202017 Embedded systems
- 101016 Optimisation
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101028 Mathematical modelling
- 101026 Time series analysis
- 101024 Probability theory
- 102032 Computational intelligence
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 101031 Approximation theory
- 102033 Data mining
- 305901 Computer-aided diagnosis and therapy
- 102019 Machine learning
- 106007 Biostatistics
- 102018 Artificial neural networks
- 106005 Bioinformatics
- 202037 Signal processing
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation