DQ-MeeRKat: Automating Data Quality Monitoring with a Reference-Data-Profile-Annotated Knowledge Graph

Lisa Ehrlinger, Alexander Gindlhumer, Lisa-Marie Huber, Wolfram Wöß

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

High data quality (e.g., completeness, accuracy, non-redundancy) is essential to ensure the trustworthiness of AI applications. In such applications, huge amounts of data are integrated from different heterogeneous sources, and complete, global domain knowledge is often not available. This scenario has a number of negative effects; in particular, it is difficult to monitor data quality centrally, and manual data curation is not feasible. To overcome these problems, we developed DQ-MeeRKat, a data quality tool that implements a new method to automate data quality monitoring. DQ-MeeRKat uses a knowledge graph to represent a global, homogenized view of local data sources. This knowledge graph is annotated with reference data profiles, which serve as a quasi-gold-standard to automatically verify the quality of modified data. We evaluated DQ-MeeRKat on six real-world data streams with qualitative feedback from the data owners. In contrast to existing data quality tools, DQ-MeeRKat does not require domain experts to define rules, but can be fully automated.
Original language: English
Title of host publication: Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA
Editors: Christoph Quix, Slimane Hammoudi, Wil van der Aalst
Publisher: SciTePress
Pages: 215-222
Number of pages: 8
Volume: 1
ISBN (Print): 978-989-758-521-0
DOIs
Publication status: Published - 2021

Fields of science

  • 102001 Artificial intelligence
  • 102010 Database systems
  • 102015 Information systems
  • 102019 Machine learning
  • 102025 Distributed systems
  • 102028 Knowledge engineering
  • 102033 Data mining
  • 102035 Data science
  • 509018 Knowledge management

JKU Focus areas

  • Digital Transformation
