Abstract
Finding all data distributed across numerous systems, understanding its meaning, and assessing its quality are major challenges for many companies and organizations. As a result, both researchers and practitioners have become increasingly interested in data catalogs, as such tools maintain a repository of technical metadata annotated with domain knowledge. Data catalog tools thus significantly improve the findability, accessibility, interoperability, and reusability (FAIR principles) of datasets. Currently, there is no generally accepted definition or interpretation of the functionality a data catalog tool is required to provide. This has not only led to a wide range of so-called data catalog tools but has also made it difficult for practitioners to gain an overview and make a targeted tool selection. Therefore, the main contributions of this paper are (i) an analysis and discussion of the most important data cataloging functionalities and (ii) a systematic survey that investigates the extent to which existing data catalog tools implement the core data cataloging functionalities. The detailed results of this survey (i.e., the identified features, data source connectors, and artificial intelligence support of each data catalog tool) are additionally provided in a table that can be customized, sorted, and filtered. While the evaluation table is intended primarily to support practitioners, we also want to promote a common interpretation of data catalogs in the scientific community with the results compiled in this paper.
| Original language | English |
|---|---|
| Article number | 3568542 |
| Pages (from-to) | 83297-83319 |
| Number of pages | 23 |
| Journal | IEEE Access |
| Volume | 13 |
| Publication status | Published - 2025 |
Fields of science
- 505002 Data protection
- 102010 Database systems
- 102035 Data science
- 102033 Data mining
- 102019 Machine learning
- 102015 Information systems
- 102025 Distributed systems
JKU Focus areas
- Digital Transformation