Abstract
Machine learning methods have a long tradition in data-driven, computational drug discovery. Drug discovery is a complex multi-stage process that requires expertise and time, both of which are costly. Data-driven approaches, such as machine learning and especially Deep Learning, can drastically reduce these costs and the time needed by utilising experimentally gathered data. Furthermore, machine learning methods can improve drug discovery at several different stages. The aim of this thesis is to explore these multiple leverage points for Deep Learning models in order to facilitate the multifaceted process of drug discovery. First, we suggest a method to predict synergistic effects of drug combinations for different cancer types. Secondly, an evaluation metric for computational methods generating novel molecules is proposed. Thirdly, we investigate how Deep Learning models in drug discovery can be interpreted by identifying indicative substructures. These pieces can be combined to create powerful generative models, which generate both promising drug candidates or drug combinations and pharmacological knowledge. In this thesis, a Deep Learning approach is presented which expands current activity prediction approaches by two dimensions: drug combinations and disease-specific characteristics. More precisely, the presented deep neural network combines information about two different molecular structures with genetic features describing cancer cells in order to predict biological activity. This work therefore creates a broad foundation for data-driven activity prediction, because it demonstrates how classical approaches can be expanded to multi-modal data sources. Highly accurate activity prediction models can in turn guide generative models through the enormous space of potential drugs towards regions of promising lead compounds.
Although many auspicious approaches for generative models have been presented so far, their evaluation was based on visual inspection and on metrics that could easily be fooled. Therefore, in the second part of this thesis, we propose the Fréchet ChemNet Distance (FCD), a metric that captures in a single value whether a generative model is able to fit the distribution of real molecules. This metric compares the distribution of generated molecules to the distribution of real-world molecules based on chemically and biologically relevant representations learned by a deep neural network. Compared to other existing metrics, the FCD is able to detect multiple flaws of generative models and is therefore a reliable performance measure. Although generating highly potent drug candidates supports the drug discovery process with promising solutions, it neither provides answers nor generates knowledge. Therefore, in the third part of this thesis, we propose a method for deep neural networks that is designed both to predict biological activities and to provide the most indicative structures for the task at hand. For a mutagenicity data set, this network was able to identify well-established toxic substructures. Hence, this method can generate knowledge important for understanding the mechanisms of toxicity. Overall, we demonstrate how Deep Learning models can inform and guide the drug discovery process. We envision that the findings presented within this thesis can support drug discovery at multiple stages and could also be combined into a comprehensive generative model capable of finding new promising candidates.
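
To make the idea behind a Fréchet-type metric such as the FCD concrete, the following is a minimal sketch, not the thesis implementation. It assumes that each set of molecules has already been mapped to fixed-length feature vectors (in the FCD these would be activations of the ChemNet network, which is not reproduced here) and that each set is summarised by a Gaussian with mean and covariance. The function name `frechet_distance` and the array arguments are hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features). In an FCD-like
    setting these would be learned molecular representations; here they are
    just placeholders supplied by the caller (assumption).
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((C_a C_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of C_a C_b, which are real and non-negative for
    # covariance matrices; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Usage with synthetic stand-in features (not real molecular data).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
shifted = real + 3.0  # same spread, mean moved by 3 in each of 4 dimensions

d_same = frechet_distance(real, real)
d_shifted = frechet_distance(real, shifted)
```

In this sketch a distance near zero indicates that the two feature distributions have matching means and covariances, while a pure shift of the mean contributes its squared Euclidean length to the distance.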
| Original language | English |
|---|---|
| Qualification | PhD |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Publication status | Published - May 2019 |
Fields of science
- 101019 Stochastics
- 102003 Image processing
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 102001 Artificial intelligence
- 202017 Embedded systems
- 101016 Optimisation
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101028 Mathematical modelling
- 101026 Time series analysis
- 101024 Probability theory
- 102032 Computational intelligence
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 101031 Approximation theory
- 102033 Data mining
- 102 Computer Sciences
- 305901 Computer-aided diagnosis and therapy
- 102019 Machine learning
- 106007 Biostatistics
- 102018 Artificial neural networks
- 106005 Bioinformatics
- 202037 Signal processing
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation