Implementing Query Operations for Knowledge Graph OLAP in Apache Spark

Jennifer Klar

Research output: Thesis › Master's / Diploma thesis

Abstract

Knowledge Graph OLAP (KG-OLAP) combines the concept of knowledge graphs (KG) with a multidimensional view on data as employed in online analytical processing (OLAP). KG-OLAP cubes contain context-dependent knowledge in the form of RDF triples, where the contexts are defined through hierarchically structured dimensions, creating contextualized knowledge graphs. The model enables contextual and graph operations on the data for various kinds of analyses. A SPARQL-based implementation has proven not to be applicable to large volumes of data, accentuating the need for a scalable implementation. This thesis therefore aims at providing an implementation that is scalable for large amounts of data within the KG-OLAP setting and can perform the required graph operations on contextualized knowledge in the form of RDF data. To this end, a prototypical implementation using the distributed processing framework Apache Spark is proposed that executes KG-OLAP graph operations on RDF quadruples. More specifically, the graph processing framework GraphX, built on top of Spark, is used, and RDF quadruples are mapped to the GraphX graph representation. The Java implementation allows for the construction of the initial graph from the RDF source data as well as for performing the following KG-OLAP graph operations on the base graph: individual-generating abstraction, triple-generating abstraction, value-generating abstraction, reification and pivot. The functionality and applicability of the Spark-based prototype are demonstrated in experiments on a provided large benchmark dataset containing data regarding air traffic management.
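The mapping from RDF quadruples to a property graph mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation and deliberately avoids any Spark dependency: it only mirrors the general shape of a GraphX graph, in which vertices carry numeric (long) identifiers and edges carry an attribute. The class name QuadGraphSketch, the choice to store the predicate and context on each edge, and the example IRIs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps RDF quadruples to vertices and attributed edges.
// Plain Java only; a Spark-based version would hold these in distributed collections.
public class QuadGraphSketch {

    /** One RDF quadruple: subject, predicate, object and the context (named graph). */
    static final class Quad {
        final String subject, predicate, object, context;
        Quad(String subject, String predicate, String object, String context) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.context = context;
        }
    }

    /** Directed edge between two vertex IDs, labelled with predicate and context. */
    static final class Edge {
        final long srcId, dstId;
        final String predicate, context;
        Edge(long srcId, long dstId, String predicate, String context) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.predicate = predicate;
            this.context = context;
        }
    }

    // RDF term -> vertex ID, in order of first appearance.
    final Map<String, Long> vertexIds = new LinkedHashMap<>();
    final List<Edge> edges = new ArrayList<>();

    /** Returns the ID already assigned to a term, or assigns the next free one. */
    private long idFor(String term) {
        Long id = vertexIds.get(term);
        if (id == null) {
            id = (long) vertexIds.size();
            vertexIds.put(term, id);
        }
        return id;
    }

    /** Builds the graph: subjects and objects become vertices, each quad one edge. */
    static QuadGraphSketch fromQuads(List<Quad> quads) {
        QuadGraphSketch graph = new QuadGraphSketch();
        for (Quad q : quads) {
            long src = graph.idFor(q.subject);
            long dst = graph.idFor(q.object);
            graph.edges.add(new Edge(src, dst, q.predicate, q.context));
        }
        return graph;
    }

    public static void main(String[] args) {
        // Invented example data, loosely themed on air traffic management.
        List<Quad> quads = new ArrayList<>();
        quads.add(new Quad("ex:runway16", "ex:partOf", "ex:airportA", "ex:ctx1"));
        quads.add(new Quad("ex:runway16", "ex:status", "closed", "ex:ctx2"));
        QuadGraphSketch graph = fromQuads(quads);
        System.out.println(graph.vertexIds.size() + " vertices, "
                + graph.edges.size() + " edges");
    }
}
```

Keeping the context on the edge is only one possible design. It lets operations that work per context filter edges directly, at the cost of repeating the context string on every edge.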
Original language: English
Supervisors/Reviewers
  • Schütz, Christoph Georg, Supervisor
  • Ahmad, Bashar, Co-supervisor
Publication status: Published - Feb 2023

Fields of science

  • 102 Computer Sciences
  • 102010 Database systems
  • 102015 Information systems
  • 102016 IT security
  • 102025 Distributed systems
  • 102027 Web engineering
  • 102028 Knowledge engineering
  • 102030 Semantic technologies
  • 102033 Data mining
  • 102035 Data science
  • 509026 Digitalisation research
  • 502050 Business informatics
  • 502058 Digital transformation
  • 503008 E-learning

JKU Focus areas

  • Digital Transformation
