Zero-Shot Conditional Molecule Generation with Latent Diffusion Models from Contrastive Pre-Trained Embeddings

Stefan Hangler

Research output: Thesis (Master's / Diploma thesis)

Abstract

Accurately predicting the biological activity of molecules is crucial in drug discovery: it helps identify compounds likely to bind to specific targets, thereby accelerating development and reducing experimental costs. Although recent advances have enabled models to predict molecular activity from textual assay descriptions and molecular structures, existing approaches focus primarily on classification and regression. Generating novel, biologically active molecules tailored to a given bioassay remains a clear gap.
To address this limitation, this thesis introduces a generative approach that combines contrastively learned molecular embeddings from CLAMP, a state-of-the-art activity prediction model, with a latent diffusion framework. The diffusion model is trained exclusively on molecular embeddings, without explicit assay-specific conditioning, which enables zero-shot inference guided solely by textual assay descriptions. By building on CLAMP's pre-trained embeddings, our method efficiently generates chemically valid, diverse, and biologically relevant molecules tailored to novel assays.
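The abstract does not spell out how a diffusion model trained without assay labels is steered toward an assay at inference time. One common option is classifier-guidance-style sampling, where the gradient of a similarity score between the current latent and an assay embedding shifts each denoising step. The following is a minimal sketch under that assumption, not the thesis's implementation: it assumes a deterministic DDIM sampler, a linear beta schedule, and cosine similarity as the guidance score, and `eps_model`, `toy_eps`, `guidance_scale`, and the random `assay` vector are hypothetical stand-ins for the trained denoiser and a CLAMP assay-text embedding.

```python
import numpy as np


def linear_alpha_bars(n_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a DDPM-style linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.cumprod(1.0 - betas)


def cosine_similarity_grad(x, a):
    """Gradient of cos(x, a) with respect to x; it is orthogonal to x."""
    nx, na = np.linalg.norm(x), np.linalg.norm(a)
    cos = x @ a / (nx * na)
    return a / (nx * na) - cos * x / nx**2


def guided_ddim_sample(eps_model, assay_emb, dim, n_steps=1000,
                       guidance_scale=50.0, seed=0):
    """Deterministic DDIM sampling (eta = 0) in embedding space, with an
    assumed guidance term that raises cosine similarity to assay_emb.
    eps_model(x, t) is a hypothetical noise predictor trained without assay labels."""
    rng = np.random.default_rng(seed)
    alpha_bars = linear_alpha_bars(n_steps)
    x = rng.standard_normal(dim)  # x_T ~ N(0, I)
    for t in reversed(range(n_steps)):
        ab = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Shift the noise estimate against the score gradient (guidance);
        # guidance_scale = 0 recovers plain unconditional sampling.
        grad = cosine_similarity_grad(x, assay_emb)
        eps = eps_model(x, t) - np.sqrt(1.0 - ab) * guidance_scale * grad
        # Predict the clean embedding, then take the DDIM step to t - 1.
        x0_pred = (x - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x


# Toy stand-ins (hypothetical): the exact noise predictor for N(0, I) data,
# and a random vector in place of a CLAMP assay-text embedding.
ALPHA_BARS = linear_alpha_bars(1000)


def toy_eps(x, t):
    return np.sqrt(1.0 - ALPHA_BARS[t]) * x


def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))


dim = 32
assay = np.random.default_rng(1).standard_normal(dim)
guided = guided_ddim_sample(toy_eps, assay, dim, guidance_scale=50.0, seed=0)
unguided = guided_ddim_sample(toy_eps, assay, dim, guidance_scale=0.0, seed=0)
print(f"cos(unguided, assay) = {cosine(unguided, assay):.3f}")
print(f"cos(guided, assay)   = {cosine(guided, assay):.3f}")
```

With this toy denoiser, the unguided run only rescales the initial noise vector, so its direction never changes; any increase in cosine similarity to the assay vector comes entirely from the guidance term. A real setup would replace `toy_eps` with the trained latent denoiser and decode the final embedding back to a molecule, neither of which is shown here.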
Our experiments demonstrate that assay-guided generation significantly enhances the predicted bioactivity of generated molecules compared to unguided generation. Further, linear probing experiments validate the robustness and biological informativeness of the generated molecular embeddings. Our findings indicate a clear trade-off between chemical diversity and assay specificity, with guided generation improving biological relevance at the cost of broader chemical exploration. This work advances computational drug discovery by introducing a flexible and scalable framework for zero-shot, assay-informed molecule generation, paving the way for more targeted and data-efficient drug design.
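Linear probing, in general, trains only a linear classifier on top of frozen embeddings; if a simple linear model can recover a property such as activity, the embeddings encode it in an accessible form. The following is a minimal, generic sketch of that idea, assuming a binary activity label and a logistic-regression probe fitted by full-batch gradient descent. The abstract does not describe the thesis's actual probing protocol, datasets, or labels, so `train_linear_probe`, `probe_accuracy`, and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def train_linear_probe(X, y, lr=0.5, n_iter=500, l2=1e-3):
    """Logistic-regression probe on frozen embeddings X (n x d), binary labels y.
    Only the probe's weights and bias are trained; X itself is never updated."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)  # predicted probability of "active"
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def probe_accuracy(w, b, X, y):
    return np.mean((X @ w + b > 0).astype(int) == y)


# Toy data (hypothetical): random "embeddings" whose activity label is a
# linear function of a hidden direction, so a linear probe can recover it.
rng = np.random.default_rng(0)
dim = 16
w_true = rng.standard_normal(dim)
X = rng.standard_normal((1000, dim))
y = (X @ w_true > 0).astype(int)

w, b = train_linear_probe(X[:800], y[:800])
print(f"held-out probe accuracy: {probe_accuracy(w, b, X[800:], y[800:]):.3f}")
```

Keeping the probe linear is the point of the design: a nonlinear head could learn the property on its own, whereas a linear one mostly measures what the frozen embedding already makes available.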
Original language: English
Qualification: Master
Awarding Institution
  • Johannes Kepler University Linz
Supervisors/Reviewers
  • Klambauer, Günter, Supervisor
  • Seidl, Philipp, Co-supervisor
Publication status: Published (Apr 2025)

Fields of science

  • 101019 Stochastics
  • 102003 Image processing
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 102001 Artificial intelligence
  • 202017 Embedded systems
  • 101016 Optimisation
  • 101015 Operations research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational intelligence
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data mining
  • 102 Computer Sciences
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU Focus areas

  • Digital Transformation
