A Flexible Architecture for Virtual Information Integration based on Semantic Web Concepts

Andreas Langegger

Research output: Thesis (Doctoral thesis)

Abstract

In this dissertation, a novel approach to virtual information integration based on Semantic Web technologies is presented. Unlike traditional approaches based on the relational model, the system integrates distributed, heterogeneous data sources by means of ontologies. This strategy enables concept-based integration, where data is described by its meaning rather than by a functional data model. The system is based on the mediator-wrapper architecture. Wrappers translate source data to the Resource Description Framework (RDF), which is the core data model of the Semantic Web. In order to accurately represent all the integrated information from different kinds of information systems (relational databases, XML files and databases, spreadsheets, CSV files, web services, etc.), the global metamodel requires a high level of expressiveness, which RDF provides. The proposed approach is very flexible, since no explicit global schema needs to be maintained and data sources can easily be added and removed. A major contribution is the federation approach based on RDF graph statistics, which are generated by a sub-component called RDFStats. Based on histograms, query pattern cardinalities can be estimated offline, which enables scalable query federation and optimization at the mediator. Two other contributions are the optimization of D2R-Server, an RDF wrapper for relational database systems, and XLWrap, which is currently the only spreadsheet-to-RDF wrapper able to map any spreadsheet layout (including multidimensional cross tables) to arbitrary RDF target graphs. Combined with the latest research on new graphical user interfaces for linked data on the Semantic Web, the proposed approach is well suited for large-scale collaborative knowledge sharing in research as well as in industry.
Original language: English
Publication status: Published - 2009

Fields of science

  • 102001 Artificial intelligence
  • 102006 Computer supported cooperative work (CSCW)
  • 102010 Database systems
  • 102014 Information design
  • 102015 Information systems
  • 102016 IT security
  • 102028 Knowledge engineering
  • 102019 Machine learning
  • 102022 Software development
  • 102025 Distributed systems
  • 502007 E-commerce
  • 505002 Data protection
  • 506002 E-government
  • 509018 Knowledge management
  • 202007 Computer integrated manufacturing (CIM)
  • 102033 Data mining
  • 102035 Data science
