Abstract
Deep learning has significantly advanced machine learning in key domains such as computer vision and natural language processing, enabled by its ability to transform raw input data into meaningful, abstract representations. This shift from hand-crafted feature extraction to deep representation learning has allowed models to scale more effectively with increasing amounts of training data.
During this transition, research was constrained by limited computational resources and therefore focused on parameter-efficient, task-specific solutions. However, the rapid growth in computational power propelled a new direction: leveraging consistent performance gains from model scaling and embracing more flexible architectures, such as the Transformer, which have unified representation learning approaches across domains.
Despite these advances, a persistent challenge remains: scaling laws demand an exponential increase in training data, a trend unlikely to be matched by the availability of high-quality datasets. Furthermore, models with fewer inductive biases, while offering greater scaling potential, also require more data to learn robust representations. This imbalance underscores the urgent need for data- and label-efficient learning methods.
This dissertation addresses this challenge by investigating approaches to improve the efficiency of deep representation learning. Specifically, it explores training paradigms such as masked language modeling, contrastive learning, multimodal learning, and masked image modeling, with a focus on how these methods utilize carefully designed pretext tasks and inductive biases to leverage unlabeled data effectively.
Ultimately, this research aims to deepen our understanding of what constitutes effective representations and how to learn them robustly in data- and label-scarce environments, relative to the growing capacity of modern models.
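To make the "pretext task" idea mentioned in the abstract concrete, the following is a minimal sketch of masked language modeling in PyTorch. It is illustrative only and not taken from the dissertation; the tiny model, vocabulary size, mask token id, and masking ratio are assumptions chosen for readability.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; real values depend on the tokenizer and model.
VOCAB_SIZE, HIDDEN, MASK_ID, MASK_PROB = 30522, 256, 103, 0.15

class TinyMaskedLM(nn.Module):
    """A deliberately small Transformer encoder that predicts masked tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, token_ids):
        return self.head(self.encoder(self.embed(token_ids)))

def mask_tokens(token_ids):
    """Replace roughly 15% of tokens with [MASK]; only those positions are scored."""
    mask = torch.rand(token_ids.shape) < MASK_PROB
    corrupted = token_ids.clone()
    corrupted[mask] = MASK_ID
    labels = token_ids.clone()
    labels[~mask] = -100  # ignored by the cross-entropy loss below
    return corrupted, labels

# Unlabeled text (here random token ids standing in for a tokenized corpus)
# supplies its own supervision: the model must reconstruct what was masked.
tokens = torch.randint(0, VOCAB_SIZE, (8, 32))   # batch of 8 sequences, length 32
corrupted, labels = mask_tokens(tokens)
model = TinyMaskedLM()
logits = model(corrupted)                        # shape: (8, 32, VOCAB_SIZE)
loss = nn.functional.cross_entropy(
    logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100
)
loss.backward()
```

The same recipe (corrupt the input, predict the missing part) underlies masked image modeling as well; what changes is the modality and the inductive biases built into the corruption and the architecture.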
| Original language | English |
|---|---|
| Qualification | PhD |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Publication status | Published - Jul 2025 |
Fields of science
- 101019 Stochastics
- 102003 Image processing
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 102001 Artificial intelligence
- 202017 Embedded systems
- 101016 Optimisation
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101028 Mathematical modelling
- 101026 Time series analysis
- 101024 Probability theory
- 102032 Computational intelligence
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 101031 Approximation theory
- 102033 Data mining
- 102 Computer Sciences
- 305901 Computer-aided diagnosis and therapy
- 102019 Machine learning
- 106007 Biostatistics
- 102018 Artificial neural networks
- 106005 Bioinformatics
- 202037 Signal processing
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation