Abstract
Analytics has become indispensable in many organizations, where using historical data to determine the best course of action for current challenges is now standard practice. When an analyst is responsible for determining the reliability of an analysis result, often very little information is available about the business case, the prediction model used, or the correct interpretation of the result. In this thesis, we propose a novel approach centered on knowledge representation to systematically judge the reliability of analysis results. We introduce a reference process for reliability assessment that captures analytics-related knowledge along the entire life cycle of an analysis project. We illustrate how the reference process can be adapted to specific types of analytics, namely predictive and descriptive analytics, and for each type we present approaches for assessing the reliability of individual results. We demonstrate how the perturbation approach to reliability assessment can be applied to the real-world use case of flight delay prediction. Furthermore, we discuss how knowledge patterns can be used for reliability assessment within descriptive analytics.
The proposed reference process for enabling reliability assessment of analysis results is aligned with the Cross-Industry Standard Process for Data Mining (CRISP-DM). We describe three abstraction levels (generic, method-specific, and problem-specific) at which knowledge about an analytics process can be modeled and captured. To assess the reliability of analysis results, analytics-related knowledge is gathered along all stages of the analytics process. The representation of this knowledge uses the PROV ontology as the foundation for modeling classes and properties.
We demonstrate in detail how the perturbation approach to reliability assessment can be applied to the real-world use case of flight delay prediction. We describe how the knowledge required for reliability assessment was captured in the analytics process and which actions were performed to assess the reliability of individual flight delay predictions. In addition, we illustrate how tool support can help to apply the perturbation approach to a specific use case. Adapting the reference process for descriptive analytics, we illustrate how the knowledge pattern approach can be used for reliability assessment of descriptive analysis results in the health insurance domain. Through expert interviews, we further investigate whether the knowledge patterns found in the health insurance domain could also be used to judge the reliability of descriptive analysis results in other domains, e.g., public transport or finance.
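The abstract does not spell out the perturbation procedure, so the following is only a minimal sketch of one common reading of the idea: slightly perturb the input of an individual prediction many times and check how often the prediction stays the same. Everything named here (`toy_delay_model`, `perturbation_stability`, the feature names, the threshold, and the noise scales) is a hypothetical illustration and not taken from the thesis.

```python
import random
from typing import Callable, Dict

Features = Dict[str, float]


def toy_delay_model(x: Features) -> bool:
    """Hypothetical stand-in for a trained classifier; True means 'delayed'."""
    score = 0.8 * x["wind_speed_kt"] + 0.5 * x["inbound_delay_min"]
    return score > 30.0


def perturbation_stability(
    predict: Callable[[Features], bool],
    x: Features,
    scales: Dict[str, float],
    n_samples: int = 500,
    seed: int = 0,
) -> float:
    """Return the share of perturbed inputs whose prediction matches the original.

    Each feature gets Gaussian noise with the standard deviation given in
    `scales` (assumed values; features without an entry are left unchanged).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    baseline = predict(x)
    agree = 0
    for _ in range(n_samples):
        perturbed = {k: v + rng.gauss(0.0, scales.get(k, 0.0)) for k, v in x.items()}
        if predict(perturbed) == baseline:
            agree += 1
    return agree / n_samples


# Assumed measurement uncertainty per feature (illustrative only).
scales = {"wind_speed_kt": 2.0, "inbound_delay_min": 5.0}

clear_case = {"wind_speed_kt": 40.0, "inbound_delay_min": 60.0}
borderline_case = {"wind_speed_kt": 20.0, "inbound_delay_min": 29.0}

print("clear case stability:", perturbation_stability(toy_delay_model, clear_case, scales))
print("borderline stability:", perturbation_stability(toy_delay_model, borderline_case, scales))
```

Under these assumptions, a stability share close to 1 suggests that the individual prediction is robust to small input changes, while a share near 0.5 indicates that the input lies close to the decision boundary and the result deserves less trust. How the thesis actually defines perturbations and turns them into a reliability judgment may differ from this sketch.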
| Original language | English |
|---|---|
| Qualification | PhD |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Award date | 11 Sept 2025 |
| Publication status | Published - Sept 2025 |
Fields of science
- 102030 Semantic technologies
- 502050 Business informatics
- 102010 Database systems
- 102035 Data science
- 503008 E-learning
- 502058 Digital transformation
- 509026 Digitalisation research
- 102033 Data mining
- 102 Computer Sciences
- 102027 Web engineering
- 102028 Knowledge engineering
- 102016 IT security
- 102015 Information systems
- 102025 Distributed systems
JKU Focus areas
- Digital Transformation