Abstract
Predicting the biological activity of a chemical compound in silico is a crucial step in the drug design process, because measuring the activity in vitro or in vivo is time-consuming and costly. In this work, we evaluate machine learning models that leverage high-throughput images to predict biological activity. In contrast to quantitative structure-activity relationship (QSAR) models, which are based on the chemical structure, image-based models include biological information and are independent of the chemical structure. This allows them to detect novel, potentially unexpectedly active chemical structures. We compare two approaches for image-based compound activity prediction: the first is a deep neural network (DNN) trained on precalculated image descriptors, whereas the second is a convolutional neural network (CNN) that is trained directly on the high-throughput images. We show that the CNN significantly outperforms the DNN, with an average area under the curve (AUC) of 0.60 (±0.26) over all assays on an external test set. A significant and high predictivity (AUC > 0.9) is achieved for 11 assays (16.92%). The descriptor-based DNNs achieve an average AUC of 0.51 (±0.14) and no significant and high predictivity for any assay. Using the highly accurate predictive models, we were able to annotate 12,000 compounds in 11 assays, which amounts to about 130,000 newly obtained data points that are almost equivalent to assay measurements. Instead of measuring the activities of a new compound directly, a high-throughput image can be taken of cells treated with that compound. This image can then be processed to predict different assays or effects of as yet unknown compounds, speeding up the drug design process.
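As a minimal sketch of how a per-assay AUC summary like the one reported above could be computed, the following example uses toy data only. It assumes that "AUC" refers to the area under the ROC curve and that the reported spread is a standard deviation across assays; neither is stated in the abstract. The function names `roc_auc` and `summarize`, the assay names, and all labels and scores are hypothetical and are not taken from the thesis.

```python
from statistics import mean, pstdev


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random active compound is scored
    higher than a random inactive one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(assays, threshold=0.9):
    """Return per-assay AUCs, their mean and population standard deviation,
    and the names of assays whose AUC exceeds the threshold."""
    aucs = {name: roc_auc(y, s) for name, (y, s) in assays.items()}
    values = list(aucs.values())
    high = [name for name, auc in aucs.items() if auc > threshold]
    return aucs, mean(values), pstdev(values), high


# Hypothetical toy data: assay name -> (binary activity labels, model scores).
toy_assays = {
    "assay_A": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # actives ranked on top
    "assay_B": ([1, 0, 1, 0], [0.6, 0.7, 0.4, 0.2]),  # ranking no better than chance
}

aucs, auc_mean, auc_std, high_assays = summarize(toy_assays)
print(aucs, auc_mean, auc_std, high_assays)
```

This sketch covers only the ranking metric. The significance assessment mentioned in the abstract would need a separate statistical test, which the abstract does not specify.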
| Original language | English |
|---|---|
| Qualification | Master |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Publication status | Published - Aug 2018 |
Fields of science
- 102019 Machine learning
- 102018 Artificial neural networks
- 102032 Computational intelligence
- 102004 Bioinformatics
- 104022 Theoretical chemistry
- 101016 Optimisation
- 101028 Mathematical modelling
- 202037 Signal processing
- 101019 Stochastics
- 102003 Image processing
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 102001 Artificial intelligence
- 202017 Embedded systems
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101026 Time series analysis
- 101024 Probability theory
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 101031 Approximation theory
- 102033 Data mining
- 102 Computer Sciences
- 305901 Computer-aided diagnosis and therapy
- 106007 Biostatistics
- 106005 Bioinformatics
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation