Abstract
Protein kinases have a major impact on the human body: they regulate almost all biochemical pathways, control cellular processes, and may phosphorylate up to 30% of all proteins. Alterations in their function can cause genetic damage or diseases such as cancer. Because of the costs and the limited availability of appropriate compounds, experimental genotoxicity testing is limited. Therefore, an efficient method for predicting protein kinase inhibitors is of major interest.
In this thesis, we evaluate the performance of the Maximum Common Subgraph (MCS) kernel method for protein kinase activity prediction. The method measures the similarity of chemical compounds with a maximum common subgraph kernel and uses the Potential Support Vector Machine to classify compounds for the prediction of protein kinase inhibitors. To obtain robust results, computations are performed on a high-quality, reliable data set containing a large number of compounds and activity values for over 170 protein kinases. We additionally pre-processed the data set by applying a clustering method. We compared the performance of the MCS kernel method against a number of walk-based kernel methods and a subtree kernel method. According to several statistical tests, the MCS kernel method outperforms many of these molecule kernels. On average, however, it performs worse than the walk-based Tanimoto kernel with a depth of 7. Under the strictest test routine applied, the MCS kernel method obtains the best results for two kinases, namely PRKCI and STK33. When the criteria are relaxed slightly and other test results are taken into account, it also performs best for AKT2. For these protein kinases, it is the method of choice for predicting inhibitors. For several other kinases, the MCS kernel method achieves results comparable even to those of the walk-based Tanimoto kernel with depth 7.
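The walk-based Tanimoto kernel that serves as the strongest baseline compares molecules through counts of atom-labeled walks. The following is a minimal Python sketch of that idea, not the implementation evaluated in the thesis. It assumes that molecules are given as atom-labeled graphs (hydrogens omitted), that "depth" means the maximum number of atoms in a walk, and that the Tanimoto coefficient is applied to raw walk counts; the helper names `walk_features` and `tanimoto_kernel` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical molecule representation: atom labels plus an undirected adjacency list.
Graph = Tuple[List[str], Dict[int, List[int]]]


def walk_features(graph: Graph, depth: int) -> Counter:
    """Count the label sequences of all walks containing 1..depth atoms (assumed meaning of depth)."""
    labels, adj = graph
    counts: Counter = Counter()
    # The frontier maps (current atom, label sequence so far) to the number of walks ending there.
    frontier: Counter = Counter({(v, (labels[v],)): 1 for v in range(len(labels))})
    for step in range(depth):
        next_frontier: Counter = Counter()
        for (v, seq), n in frontier.items():
            counts[seq] += n  # record every walk of the current length
            if step + 1 < depth:
                # Extend each walk by one neighboring atom (walks may revisit atoms).
                for w in adj.get(v, []):
                    next_frontier[(w, seq + (labels[w],))] += n
        frontier = next_frontier
    return counts


def tanimoto_kernel(g1: Graph, g2: Graph, depth: int = 7) -> float:
    """Tanimoto similarity <f1,f2> / (<f1,f1> + <f2,f2> - <f1,f2>) on walk-count vectors."""
    f1, f2 = walk_features(g1, depth), walk_features(g2, depth)
    dot = sum(n * f2[s] for s, n in f1.items())  # missing keys in a Counter count as 0
    n1 = sum(n * n for n in f1.values())
    n2 = sum(n * n for n in f2.values())
    denom = n1 + n2 - dot
    return dot / denom if denom else 1.0


if __name__ == "__main__":
    # Heavy-atom skeletons: ethanol (C-C-O) and propane (C-C-C).
    ethanol = (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
    propane = (["C", "C", "C"], {0: [1], 1: [0, 2], 2: [1]})
    print(tanimoto_kernel(ethanol, propane, depth=1))  # 0.75
    print(tanimoto_kernel(ethanol, propane, depth=7))
```

With depth 1 only atom labels are compared, so the similarity reflects composition alone; larger depths add increasingly specific path information. The resulting kernel matrix could then be passed to any kernel-based classifier, such as the Potential Support Vector Machine mentioned above.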
Additionally, we implemented the method as a user-friendly R package so that chemical data sets can be handled easily. In the course of the implementation, we also improved the MCS kernel method. The package makes the method available to many researchers and allows it to be applied to various biochemical prediction tasks.
| Original language | English |
|---|---|
| Qualification | Master |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Publication status | Published - Aug 2016 |
Fields of science
- 102019 Machine learning
- 102032 Computational intelligence
- 102004 Bioinformatics
- 104022 Theoretical chemistry
- 101016 Optimisation
- 101019 Stochastics
- 102003 Image processing
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 102001 Artificial intelligence
- 202017 Embedded systems
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101028 Mathematical modelling
- 101026 Time series analysis
- 101024 Probability theory
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 101031 Approximation theory
- 102033 Data mining
- 102 Computer Sciences
- 305901 Computer-aided diagnosis and therapy
- 106007 Biostatistics
- 102018 Artificial neural networks
- 106005 Bioinformatics
- 202037 Signal processing
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation