A Survey on the Functionalities of Data Catalog Tools

Jasmin Kropshofer*, Johannes Schrott, Wolfram Wöß, Lisa Ehrlinger

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Finding all data distributed across numerous systems, understanding its meaning, and assessing its quality are major challenges for many companies and organizations. As a result, both researchers and practitioners have become increasingly interested in data catalogs, as such tools maintain a repository of technical metadata annotated with domain knowledge. Data catalog tools thus significantly improve the findability, accessibility, interoperability, and reusability (FAIR principles) of datasets. Currently, there is no generally accepted definition or interpretation regarding the required functionality of data catalog tools. This has not only led to a wide range of so-called data catalog tools but has also made it difficult for practitioners to gain an overview in order to make a targeted selection of a tool. Therefore, the main contributions of this paper are (i) an analysis and discussion of the most important data cataloging functionalities and (ii) a systematic survey that investigates the extent to which existing data catalog tools implement the core data cataloging functionalities. The detailed results of this survey (i.e., the identified features, data source connectors, support of artificial intelligence for each data catalog tool) are additionally provided in a table that can be customized, sorted, and filtered. While the evaluation table is intended primarily to support practitioners, we want to promote a common interpretation of data catalogs in the scientific community with the results compiled in this paper.
Original language: English
Article number: 3568542
Pages (from-to): 83297-83319
Number of pages: 23
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

Fields of science

  • 505002 Data protection
  • 102010 Database systems
  • 102035 Data science
  • 102033 Data mining
  • 102019 Machine learning
  • 102015 Information systems
  • 102025 Distributed systems

JKU Focus areas

  • Digital Transformation
