Benchmarking recent Deep Learning methods on the extended Tox21 data set

Philipp Seidl, Christina Halmich, Andreas Mayr, Andreu Vall, Peter Ruch, Sepp Hochreiter, Günter Klambauer

Research output: Chapter in Book/Report/Conference proceedingConference proceedingspeer-review

Abstract

The Tox21 data set has evolved into a standard benchmark for computational QSAR methods in toxicology [1]. One limitation of the Tox21 data set is, however, that it only contains twelve toxic assays which strongly restricts its power to distinguish the strength of computational methods. We ameliorate this problem by benchmarking on the extended Tox21 dataset with 68 publicly available assays in order to allow for a better assessment and characterization. The broader range of assays also allows for multi-task approaches, which have been particularly successful as predictive models [2]. Furthermore, previous publications comparing methods on Tox21 did not include recent developments in the field of machine learning, such as graph neural and modern Hopfield networks [3]. Thus we benchmark a set of prominent machine learning methods including those new types of neural networks. The results of the benchmarking study show that the best methods are modern Hopfield networks and multi-task graph neural networks with an average area-under-ROC-curve of 0.91 ± 0.05 (standard deviation across assays), while traditional methods, such as Random Forests fall behind by a substantial margin. Our results of the full benchmark suggest that multi-task learning has a stronger effect on the predictive performance than the choice of the representation of the molecules, such as graph, descriptors, or fingerprints.
Original languageEnglish
Title of host publication19th International Workshop on (Q)SAR in Environmental and Health Sciences (QSAR2021), Poster Session, June 2021, online
Number of pages1
Publication statusPublished - 2021

Fields of science

  • 305907 Medical statistics
  • 202017 Embedded systems
  • 202036 Sensor systems
  • 101004 Biomathematics
  • 101014 Numerical mathematics
  • 101015 Operations research
  • 101016 Optimisation
  • 101017 Game theory
  • 101018 Statistics
  • 101019 Stochastics
  • 101024 Probability theory
  • 101026 Time series analysis
  • 101027 Dynamical systems
  • 101028 Mathematical modelling
  • 101029 Mathematical statistics
  • 101031 Approximation theory
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 102018 Artificial neural networks
  • 102019 Machine learning
  • 102032 Computational intelligence
  • 102033 Data mining
  • 305901 Computer-aided diagnosis and therapy
  • 305905 Medical informatics
  • 202035 Robotics
  • 202037 Signal processing
  • 103029 Statistical physics
  • 106005 Bioinformatics
  • 106007 Biostatistics

JKU Focus areas

  • Digital Transformation

Cite this