The Next Level of Automated Data Quality Measurement

  • Lisa Ehrlinger (Speaker)

Activity: Talk or presentation, Invited talk, science-to-science

Description

Central, automated monitoring of data quality in integrated enterprise information systems remains a major challenge. Enterprise data is usually distributed across several heterogeneous information systems and maintained by different parties (departments). In most cases, local information systems are developed autonomously, so schema development cannot be tracked reliably and complete global domain knowledge is not available. Such a heterogeneous scenario has a number of negative effects. In particular, it is difficult to monitor data quality (DQ) dimensions such as completeness, accuracy, and timeliness centrally. To overcome these problems, we have developed DQ-MeeRKat, a DQ tool that exploits the power of knowledge graphs to provide a global, homogenized view of data schemas. In this presentation, I will introduce the concept and advantages of "reference data profiles", which are annotated to the knowledge graph and serve as a quasi-gold standard for automatically and continuously verifying the quality of modified data (inserts, updates, deletes). To ensure that changes in the knowledge graph are globally visible and reliably traceable, a blockchain is used to make the knowledge graph tamper-proof. With DQ-MeeRKat, a knowledge graph for the global integrated schema, and the annotated reference data profiles, chief data officers can reach the next level in DQ measurement.
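The idea of verifying modified data against a reference data profile can be illustrated with a minimal Python sketch. This is not DQ-MeeRKat's implementation: the class ReferenceProfile, the functions build_profile and check_conformance, the chosen statistics (completeness and value range), and the 5% tolerance are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ReferenceProfile:
    """Hypothetical reference profile for one numeric column."""
    completeness: float  # fraction of non-null values in the trusted data
    minimum: float       # smallest non-null value in the trusted data
    maximum: float       # largest non-null value in the trusted data


def build_profile(values: Sequence[Optional[float]]) -> ReferenceProfile:
    """Derive a profile from data that is assumed to be of good quality."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot profile a column without non-null values")
    return ReferenceProfile(
        completeness=len(present) / len(values),
        minimum=min(present),
        maximum=max(present),
    )


def check_conformance(
    values: Sequence[Optional[float]],
    reference: ReferenceProfile,
    tolerance: float = 0.05,  # assumed allowed drop in completeness
) -> List[str]:
    """Return descriptions of deviations of a new batch from the reference."""
    violations: List[str] = []
    if not values:
        return violations  # nothing to verify in an empty batch

    present = [v for v in values if v is not None]
    completeness = len(present) / len(values)
    if completeness < reference.completeness - tolerance:
        violations.append(
            f"completeness {completeness:.2f} is below reference "
            f"{reference.completeness:.2f}"
        )

    out_of_range = [
        v for v in present if v < reference.minimum or v > reference.maximum
    ]
    if out_of_range:
        violations.append(
            f"range: {len(out_of_range)} value(s) outside "
            f"[{reference.minimum}, {reference.maximum}]"
        )
    return violations
```

In this sketch, a profile built once from trusted data would be stored alongside the schema element it describes, and check_conformance would be called on each newly inserted or updated batch; an empty result means that no deviation from the reference was detected.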
Period: 19 Aug 2020
Event title: The 14th Annual MIT Chief Data Officer and Information Quality (MITCDOIQ) Symposium
Event type: Conference
Location: Austria

Fields of science

  • 102019 Machine learning
  • 102033 Data mining
  • 102035 Data science
  • 102001 Artificial intelligence
  • 102015 Information systems
  • 102014 Information design

JKU Focus areas

  • Digital Transformation