Abstract
Uncertainty quantification (UQ) plays a crucial role in drug discovery because it indicates how far the predictions of bioactivity models can be trusted. This study evaluates and compares several UQ methods, namely ensemble models, Monte Carlo (MC) dropout, and evidential models, using the Papyrus dataset, a large-scale, curated dataset for bioactivity estimation. A rigorous cheminformatics data standardization workflow was applied to ensure high data quality and consistency. A clustering-based splitting technique was then used to create Stratified and Scaffold cluster splits of the data. Together, these splits provide a robust framework for evaluating the methods both on data resembling the training set and on new, unseen data.
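Two mechanics mentioned above can be sketched briefly: how ensemble and MC dropout methods turn several outputs into one uncertainty estimate, how an evidential model reads uncertainty off its predicted parameters, and how a cluster split keeps every compound of a cluster on one side of the split. This is a minimal, hypothetical sketch, not the thesis code, and all function names are illustrative. It assumes that each ensemble member or MC dropout forward pass outputs a Gaussian mean and variance (pass zeros as variances for point predictors), that the evidential model outputs Normal-Inverse-Gamma parameters as in deep evidential regression, and that cluster labels come from a prior clustering step that is not shown.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def aggregate_samples(means, variances):
    """Combine per-member Gaussian outputs for one compound (deep ensemble
    members or MC dropout passes) into a predictive mean and variance.

    Total variance = aleatoric (mean of member variances)
                   + epistemic (variance of member means).
    """
    aleatoric = fmean(variances)
    epistemic = pvariance(means)
    return fmean(means), aleatoric + epistemic, aleatoric, epistemic


def evidential_moments(gamma, nu, alpha, beta):
    """Prediction and uncertainties from Normal-Inverse-Gamma parameters
    (deep evidential regression); requires alpha > 1."""
    aleatoric = beta / (alpha - 1)          # E[sigma^2]
    epistemic = beta / (nu * (alpha - 1))   # Var[mu]
    return gamma, aleatoric, epistemic


def cluster_split(cluster_ids, test_fraction=0.2, seed=0):
    """Assign whole clusters to train or test so no cluster spans both sets.

    cluster_ids: one precomputed cluster label per compound (assumed input).
    Returns (train_indices, test_indices).
    """
    groups = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        groups[cid].append(idx)

    labels = list(groups)
    random.Random(seed).shuffle(labels)

    target = test_fraction * len(cluster_ids)
    train, test = [], []
    for cid in labels:
        # Add whole clusters to the test set until it reaches at least the
        # target size; every remaining cluster goes to the training set.
        if len(test) < target:
            test.extend(groups[cid])
        else:
            train.extend(groups[cid])
    return sorted(train), sorted(test)
```

Because whole clusters are assigned at once, the test fraction is only approximate. In exchange, no cluster contributes compounds to both training and test data, which is the property that makes a cluster split harder than a random one.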
The results show that evidential models perform best in the Stratified experiments, achieving lower errors and higher correlation with observed values. In the Scaffold cluster scenarios, however, ensemble models outperform the others, highlighting their robustness to new, unseen data. The study underscores the importance of a balanced approach to UQ that combines several uncertainty metrics. The analysis also identifies significant correlations among these metrics and therefore recommends focusing on a smaller set of key metrics to streamline evaluation: root mean squared error (RMSE), R², and the Pearson correlation coefficient (PCC) for performance evaluation, and MCA, negative log-likelihood (NLL), continuous ranked probability score (CRPS), Interval, Sharpness, and rank-based metrics for uncertainty assessment. Furthermore, the study shows that MCA alone is of limited use for differentiating models after recalibration, which argues for a multi-faceted approach to UQ assessment. Spearman's rank correlation coefficient (ρ) and RMV-vs-RMSE plots were used to analyze rank-based metrics, revealing that ensemble and MC dropout models remain better calibrated than evidential models. This suggests that while evidential models are effective on data resembling the training set, they generalize less well to new data, which limits their practical usability in certain applications.
Future research directions include applying UQ to multi-task learning on other benchmarking datasets and exploring hybrid models that could make drug discovery models more robust. Distribution-free conformal prediction (CP) methods are another promising avenue for future investigation. In conclusion, this study provides a comprehensive workflow for evaluating UQ methods in a bioactivity assessment context and offers insights into their strengths and limitations. Advancing UQ techniques can improve the reliability and effectiveness of machine learning models in drug discovery and ultimately accelerate the development of new therapeutics.
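Conformal prediction is proposed above only as future work, so the following is a hedged illustration of the idea rather than anything evaluated in this study. The split (inductive) variant wraps any point regressor and calibrates interval widths on held-out data. The helper name and the absolute-residual nonconformity score are assumptions chosen for simplicity.

```python
import math


def split_conformal_interval(cal_y, cal_pred, test_pred, alpha=0.1):
    """Split conformal prediction intervals from absolute residuals.

    cal_y, cal_pred: observed values and model predictions on a calibration
    set that was not used for training.
    Returns one (lower, upper) interval per test prediction, targeting
    marginal coverage of at least 1 - alpha under exchangeability.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_y, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points for this alpha, the interval is unbounded.
    q = math.inf if k > n else scores[k - 1]
    return [(p - q, p + q) for p in test_pred]
```

The coverage guarantee is marginal (averaged over test points) and relies on calibration and test data being exchangeable. A scaffold-based split, where test compounds come from unseen clusters, would likely weaken that assumption.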
| Original language | English |
|---|---|
| Qualification | Master |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Publication status | Published - 2024 |
Fields of science
- 102019 Machine learning
- 102032 Computational intelligence
- 102004 Bioinformatics
- 104022 Theoretical chemistry
- 101016 Optimisation
- 101028 Mathematical modelling
- 101031 Approximation theory
- 101019 Stochastics
- 102003 Image processing
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 102001 Artificial intelligence
- 202017 Embedded systems
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101026 Time series analysis
- 101024 Probability theory
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 102033 Data mining
- 102 Computer Sciences
- 305901 Computer-aided diagnosis and therapy
- 106007 Biostatistics
- 102018 Artificial neural networks
- 106005 Bioinformatics
- 202037 Signal processing
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation