Effizientes Clustering von horizontal verteilten Daten (Efficient Clustering of Horizontally Distributed Data)

Stefan Schaubschläger

Research output: Thesis, Master's / Diploma thesis

Abstract

The increasing use of information technology in science, business, and administration has produced massive amounts of data. Data mining techniques such as cluster analysis are applied in many fields to discover relationships and patterns in the data and to derive information from those patterns. With the rise of global networking, data is increasingly distributed across different sites. This distribution complicates access for algorithms that need to analyze the data as a whole, as clustering methods do. This diploma thesis addresses the field of distributed clustering and presents suitable strategies for clustering horizontally distributed data. It also examines how clustering methods can be adapted to distributed environments in such a way that efficient and effective clustering is guaranteed. Moreover, it shows that distributing massive amounts of data over several sites can increase the scalability of clustering methods. A prototype for distributed clustering has been implemented; its implementation aspects and test results are presented.
Original language: German (Austria)
Supervisors/Reviewers
  • Schrefl, Michael, Supervisor
  • Goller, Mathias, Co-supervisor
Publication status: Published - Oct 2005

Fields of science

  • 102 Computer Sciences
  • 102015 Information systems
