Context-enriched molecule representations improve few-shot drug discovery

Johannes Schimunek, Philipp Seidl, Lukas Friedrich, Daniel Kuhn, Friedrich Rippmann, Sepp Hochreiter, Günter Klambauer

Research output: Working paper and reports › Preprint

Abstract

A central task in computational drug discovery is to construct models from known active molecules to find further promising molecules for subsequent screening. However, typically only very few active molecules are known. Therefore, few-shot learning methods have the potential to improve the effectiveness of this critical phase of the drug discovery process. We introduce a new method for few-shot drug discovery. Its main idea is to enrich a molecule representation with knowledge about known context or reference molecules. Our novel concept for molecule representation enrichment is to associate molecules from both the support set and the query set with a large set of reference (context) molecules through a Modern Hopfield Network. Intuitively, this enrichment step is analogous to a human expert who would associate a given molecule with familiar molecules whose properties are known. The enrichment step reinforces and amplifies the covariance structure of the data, while simultaneously removing spurious correlations arising from the decoration of molecules. We compare our approach with other few-shot methods for drug discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach outperforms all compared methods and therefore sets a new state of the art for few-shot learning in drug discovery. An ablation study shows that the enrichment step is key to improving predictive quality. In a domain shift experiment, we further demonstrate the robustness of our method.
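The enrichment idea, associating each support or query molecule with a set of context molecules through a Modern Hopfield Network, can be illustrated with a single Hopfield retrieval step. The sketch below is a minimal, hypothetical illustration and not the authors' implementation: it assumes molecules are already embedded as plain vectors, omits any learned projections, normalization, or residual connections the actual method may use, and the function names (`hopfield_enrich`, `dot`, `softmax`) and the inverse temperature `beta` are illustrative choices. Each molecule attends over the context molecules and is replaced by a softmax-weighted average of them.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_enrich(molecules: List[Vector], context: List[Vector],
                    beta: float = 1.0) -> List[Vector]:
    """Hypothetical sketch of one Modern Hopfield retrieval step.

    Each molecule embedding acts as a query (state pattern); the context
    molecules are the stored patterns. The result for each molecule is
    sum_i softmax(beta * <c_i, x>)_i * c_i, a convex combination of the
    context embeddings weighted by similarity to the molecule.
    """
    dim = len(context[0])
    enriched = []
    for x in molecules:
        weights = softmax([beta * dot(c, x) for c in context])
        enriched.append([sum(w * c[j] for w, c in zip(weights, context))
                         for j in range(dim)])
    return enriched


# Toy usage with made-up 2-D embeddings: support and query molecules are
# enriched with the same (hypothetical) context set.
context = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
support = [[0.9, 0.1]]
query = [[0.2, 0.8]]
enriched_support = hopfield_enrich(support, context, beta=8.0)
enriched_query = hopfield_enrich(query, context, beta=8.0)
```

In this sketch, `beta` controls how sharply a molecule is pulled toward its most similar context molecules: a large value approaches nearest-neighbor retrieval, while `beta = 0` returns the mean of the context set. Applying the same context set to both support and query molecules mirrors the description above; in practice the vectors would be learned molecule embeddings and the context set a large collection of reference molecules.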
Original language: English
Number of pages: 30
DOIs
Publication status: Published - 2023

Publication series

Name: arXiv.org

Fields of science

  • 305907 Medical statistics
  • 202017 Embedded systems
  • 202036 Sensor systems
  • 101004 Biomathematics
  • 101014 Numerical mathematics
  • 101015 Operations research
  • 101016 Optimisation
  • 101017 Game theory
  • 101018 Statistics
  • 101019 Stochastics
  • 101024 Probability theory
  • 101026 Time series analysis
  • 101027 Dynamical systems
  • 101028 Mathematical modelling
  • 101029 Mathematical statistics
  • 101031 Approximation theory
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 102018 Artificial neural networks
  • 102019 Machine learning
  • 102032 Computational intelligence
  • 102033 Data mining
  • 305901 Computer-aided diagnosis and therapy
  • 305905 Medical informatics
  • 202035 Robotics
  • 202037 Signal processing
  • 103029 Statistical physics
  • 106005 Bioinformatics
  • 106007 Biostatistics

JKU Focus areas

  • Digital Transformation
