Abstract
Epilepsy affects approximately 1% of the global population, with seizures that often occur without warning and vary widely between individuals. Reliable detection and early prediction remain critical challenges for improving patient safety, clinical response, and quality of life. Recent advances in deep learning, particularly the development of transformer-based foundation models, offer new opportunities to model complex biosignals and contextual information in a unified framework. However, existing approaches often focus on a single modality or on task-specific architectures, which limits their generalizability and clinical adaptability.
This thesis proposes a modular, multimodal model pretrained via self-supervised learning on a large clinical dataset of EEG and accompanying physiological signals, enriched with temporally aligned clinical markers and textual annotations. The model adopts a causal sequence transformer architecture designed to integrate EEG, ECG, SpO2, segment metadata, and contextual text into a shared tokenized input space. Training is guided by next-token prediction over a continuous embedding space. Auxiliary multitask objectives, such as time-to-seizure regression, segment classification, and marker reconstruction, regularize and enrich the learned representations.
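As a rough illustration of the training signal described above, the following minimal sketch combines a causal next-embedding loss with weighted auxiliary losses. It is not the thesis implementation: the use of mean squared error as the distance in embedding space, the names `next_embedding_loss`, `total_loss`, and `last_token_predictor`, and all loss weights and toy values are assumptions made for this example only.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def next_embedding_loss(
    embeddings: Sequence[Vector],
    predict: Callable[[Sequence[Vector]], Vector],
) -> float:
    """Average error between the prediction made from tokens 0..t and the
    true embedding at position t + 1 (MSE is an assumed choice)."""
    losses = []
    for t in range(len(embeddings) - 1):
        prefix = embeddings[: t + 1]  # causal: only past and current tokens
        losses.append(mse(predict(prefix), embeddings[t + 1]))
    return sum(losses) / len(losses)


def total_loss(main: float, aux: Dict[str, float], weights: Dict[str, float]) -> float:
    """Main objective plus weighted auxiliary objectives."""
    return main + sum(weights[name] * value for name, value in aux.items())


def last_token_predictor(prefix: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for the transformer: repeat the last embedding."""
    return list(prefix[-1])


# Toy sequence of three 2-dimensional token embeddings (made-up values).
tokens = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
main = next_embedding_loss(tokens, last_token_predictor)

# Auxiliary loss values and weights are placeholders, not reported results.
aux_losses = {"time_to_seizure": 0.4, "segment_class": 0.2}
aux_weights = {"time_to_seizure": 0.5, "segment_class": 0.5}
combined = total_loss(main, aux_losses, aux_weights)
```

In a real model the stand-in predictor would be replaced by the transformer output at each position, and the auxiliary terms would come from dedicated prediction heads.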
Experimental results demonstrate that the pretrained model captures generalizable and semantically rich features, achieving strong performance on both seizure detection and seizure prediction tasks under linear probing and downstream evaluation. Input ablation confirms that contextual signals, especially time-past-seizure and clinical markers, considerably enhance performance. Although it is not the primary focus, marker inference from biosignals proves feasible through similarity-based decoding in sentence embedding space, which indicates representational depth and potential for generating annotations.
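Similarity-based decoding of this kind can be sketched as a nearest-neighbour lookup: a predicted embedding is compared against the sentence embeddings of candidate marker texts, and the closest candidates are returned. The example below is an illustration under stated assumptions, not the thesis code. Cosine similarity is an assumed choice of measure, and the marker labels, the three-dimensional toy vectors, and the function names `cosine` and `decode_marker` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def decode_marker(
    predicted: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate marker texts by similarity to the predicted embedding."""
    scored = [(label, cosine(predicted, emb)) for label, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical marker texts with toy embeddings; a real system would obtain
# these vectors from a sentence embedding model.
marker_bank = {
    "seizure onset": [1.0, 0.0, 0.0],
    "eyes closed": [0.0, 1.0, 0.0],
    "movement artifact": [0.0, 0.0, 1.0],
}

# Made-up embedding standing in for the model output at a marker position.
ranking = decode_marker([0.1, 0.9, 0.2], marker_bank)
best_label, best_score = ranking[0]
```

Returning the full ranking rather than a single label makes it possible to inspect near misses, which is useful when several marker texts are semantically close.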
In conclusion, this work provides evidence that multimodal causal pretraining can yield transferable representations for seizure-related applications, supporting both interpretability and downstream adaptation. The approach bridges biosignals and clinical context, paving the way for generalizable seizure monitoring systems. Future research will need to extend this work to additional datasets, acquisition settings, and modalities, and to refine event-based evaluation strategies.
| Original language | English |
|---|---|
| Qualification | Master |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Publication status | Published - Aug 2025 |
Fields of science
- 101019 Stochastics
- 102003 Image processing
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 102001 Artificial intelligence
- 202017 Embedded systems
- 101016 Optimisation
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101028 Mathematical modelling
- 101026 Time series analysis
- 101024 Probability theory
- 102032 Computational intelligence
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 101031 Approximation theory
- 102033 Data mining
- 102 Computer Sciences
- 305901 Computer-aided diagnosis and therapy
- 102019 Machine learning
- 106007 Biostatistics
- 102018 Artificial neural networks
- 106005 Bioinformatics
- 202037 Signal processing
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation