
In silico proof of principle of machine learning-based antibody design at unconstrained scale

  • Rahmad Akbar
  • Philippe A. Robert
  • Cedric R. Weber
  • Michael Widrich
  • Robert Frank
  • Milena Pavlovic
  • Lonneke Scheffer
  • Maria Chernigovskaya
  • Igor Snapkov
  • Andrei Slabodkin
  • Brij Bhushan Mehta
  • Enkelejda Miho
  • Fridtjof Lund-Johansen
  • Jan Terje Andersen
  • Sepp Hochreiter
  • Ingrid Hobæk Haff
  • Günter Klambauer
  • Geir Kjetil Sandve
  • Victor Greiff

Research output: Working paper and reports / Preprint

Abstract

Generative machine learning (ML) has been postulated to be a major driver in the computational design of antigen-specific monoclonal antibodies (mAbs). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody binding parameters. The simulation framework both enables the computation of antibody-antigen 3D structures and functions as an oracle for unrestricted prospective evaluation of the antigen specificity of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (1D) data, can be used to design native-like conformational (3D) epitope-specific antibodies, matching or exceeding the training dataset in affinity and developability variety. Furthermore, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Finally, we validated that the antibody design insight gained from simulated antibody-antigen binding data is applicable to experimental real-world data. Our work establishes the a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.
Original language: English
Number of pages: 36
DOIs
Publication status: Published - 2021

Publication series

Name: bioRxiv
ISSN (Print): 2692-8205

Fields of science

  • 305907 Medical statistics
  • 202017 Embedded systems
  • 202036 Sensor systems
  • 101004 Biomathematics
  • 101014 Numerical mathematics
  • 101015 Operations research
  • 101016 Optimisation
  • 101017 Game theory
  • 101018 Statistics
  • 101019 Stochastics
  • 101024 Probability theory
  • 101026 Time series analysis
  • 101027 Dynamical systems
  • 101028 Mathematical modelling
  • 101029 Mathematical statistics
  • 101031 Approximation theory
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 102018 Artificial neural networks
  • 102019 Machine learning
  • 102032 Computational intelligence
  • 102033 Data mining
  • 305901 Computer-aided diagnosis and therapy
  • 305905 Medical informatics
  • 202035 Robotics
  • 202037 Signal processing
  • 103029 Statistical physics
  • 106005 Bioinformatics
  • 106007 Biostatistics

JKU Focus areas

  • Digital Transformation
