An Elastic Approximate Similarity Search in Very Large Datasets with MapReduce

Josef Küng, Tran Khanh Dang, Trong Nhan Phan

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

The explosion of data has ushered in the era of big data and poses more challenges than ever before to traditional similarity search, which is used in a wide range of applications. Furthermore, processing data at such an unprecedented scale may be infeasible, or may paralyse systems because of slow performance and high overheads. Dealing with such relentless data growth paves the way not only to consolidating similarity search but also to new trends in data-intensive applications. Aiming at scalability, we propose an elastic approximate similarity search that works efficiently on very large datasets. Moreover, our scheme adapts to the well-known similarity searches over pairwise documents and against a pivot document, as well as range queries and k-nearest neighbour queries. Finally, these methods, together with our filtering strategies, are implemented in Hadoop and verified by experiments on large real-world data collections, which show promising effectiveness and efficiency.
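The abstract names four query types (pairwise documents, pivot document, range query, k-nearest neighbour query) executed with MapReduce. The following is a minimal single-process sketch of how such queries can be split into map and reduce steps. It assumes Jaccard similarity over lowercase word sets and simulates map tasks as plain Python functions over in-memory "splits". The function names (`range_map`, `knn_map`, `knn_reduce`, etc.) are hypothetical. This is not the authors' Hadoop implementation, and it omits their filtering strategies.

```python
import heapq
from itertools import combinations

# Assumed similarity measure: Jaccard over word sets (the abstract does not
# say which measure the paper uses).
def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def tokens(text: str) -> set:
    return set(text.lower().split())

# Map step of a range query: each hypothetical "mapper" scans one split and
# emits (doc_id, score) only for documents at or above the threshold.
def range_map(split, pivot, threshold):
    for doc_id, text in split:
        score = jaccard(tokens(text), pivot)
        if score >= threshold:
            yield doc_id, score

# Map step of a k-NN query: each mapper keeps only its local top-k, which
# bounds how much data is shuffled to the reducer.
def knn_map(split, pivot, k):
    scored = ((jaccard(tokens(text), pivot), doc_id) for doc_id, text in split)
    return heapq.nlargest(k, scored)

# Reduce step of a k-NN query: merge the local top-k lists into a global top-k.
def knn_reduce(local_lists, k):
    return heapq.nlargest(k, (item for lst in local_lists for item in lst))

# Range query against a pivot (query) document.
def range_query(splits, query, threshold):
    pivot = tokens(query)
    results = {}
    for split in splits:  # each iteration stands in for one map task
        for doc_id, score in range_map(split, pivot, threshold):
            results[doc_id] = score
    return dict(sorted(results.items()))

# k-NN query against a pivot (query) document.
def knn_query(splits, query, k):
    pivot = tokens(query)
    local = [knn_map(split, pivot, k) for split in splits]
    return [(doc_id, score) for score, doc_id in knn_reduce(local, k)]

# Pairwise similarity over all document pairs (quadratic; small inputs only).
def pairwise(docs, threshold):
    toks = {doc_id: tokens(text) for doc_id, text in docs}
    out = []
    for a, b in combinations(sorted(toks), 2):
        score = jaccard(toks[a], toks[b])
        if score >= threshold:
            out.append((a, b, score))
    return out

splits = [
    [("d1", "big data similarity search"), ("d2", "mapreduce in hadoop")],
    [("d3", "similarity search with mapreduce"), ("d4", "cooking recipes")],
]
print(range_query(splits, "similarity search mapreduce", 0.4))
print(knn_query(splits, "similarity search mapreduce", 2))
print(pairwise([d for s in splits for d in s], 0.3))
```

With the toy splits above, the range query returns `d1` (score 0.4) and `d3` (score 0.75), and the 2-NN query returns `d3` followed by `d1`. Keeping a local top-k in each mapper is a common way to reduce shuffle volume. Elastic, approximate behaviour and filtering, as described in the paper, would be layered on top of this basic decomposition.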
Original language: English
Title of host publication: Data Management in Cloud, Grid and P2P Systems
Place of publication: Berlin, Heidelberg
Publisher: Springer
Pages: 44-57
Number of pages: 14
Volume: 8648
ISBN (Print): 978-3-319-10066-1
Publication status: Published - Nov 2014

Publication series

Name: Lecture Notes in Computer Science (LNCS)

Fields of science

  • 202007 Computer integrated manufacturing (CIM)
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102006 Computer supported cooperative work (CSCW)
  • 102010 Database systems
  • 102014 Information design
  • 102015 Information systems
  • 102016 IT security
  • 102022 Software development
  • 102025 Distributed systems
  • 502007 E-commerce
  • 505002 Data protection
  • 506002 E-government
  • 509018 Knowledge management

JKU Focus areas

  • Computation in Informatics and Mathematics
