TY - CONF
T1 - Improving Clinical Predictions with Multi-Modal Pre-training in Retinal Imaging
AU - Sükei, Emese
AU - Rumetshofer, Elisabeth
AU - Schmidinger, Niklas
AU - Mayr, Andreas
AU - Schmidt-Erfurth, Ursula
AU - Klambauer, Günter
AU - Bogunović, Hrvoje
PY - 2024
AB - Self-supervised learning has emerged as a foundational approach for creating robust and adaptable artificial intelligence (AI) systems within medical imaging. Specifically, contrastive representation learning methods, trained on extensive multi-modal datasets, have showcased remarkable proficiency in generating highly adaptable representations suitable for a multitude of downstream tasks. In the field of ophthalmology, modern retinal imaging devices capture both 2D fundus images and 3D optical coherence tomography (OCT) scans. As a result, large multi-modal imaging datasets are readily available and allow us to explore uni-modal versus multi-modal contrastive pre-training. After pre-training on 153,306 scan pairs, we showcase the transferability and efficacy of these acquired representations via fine-tuning on multiple external datasets, explicitly focusing on several clinically pertinent prediction tasks derived from OCT data. Additionally, we illustrate how multi-modal pre-training enhances the exchange of information between OCT, a richer modality, and the more cost-effective fundus imaging, ultimately amplifying the predictive capacity of fundus-based models.
UR - https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10635447
UR - https://www.scopus.com/pages/publications/85203401337
DO - 10.1109/ISBI56570.2024.10635447
M3 - Conference proceedings
SN - 979-8-3503-1333-8
T3 - Proceedings - International Symposium on Biomedical Imaging
BT - 2024 IEEE International Symposium on Biomedical Imaging (ISBI)
A2 - IEEE
ER -