eHSim: An Efficient Hybrid Similarity Search with MapReduce

Josef Küng, Trong Nhan Phan, Khanh Tran Dang

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

In this paper, we study the problems of scalability and performance for similarity search by proposing eHSim, an efficient hybrid similarity search with MapReduce. More specifically, we introduce clustering schemes that partition objects into groups by their length. We equip these schemes with pruning strategies that quickly discard irrelevant objects before their similarity is actually computed. We also design a hybrid MapReduce architecture that addresses the challenges of big data, and we implement the proposed methods with MapReduce so that they are compatible with this architecture. Finally, we evaluate the proposed methods on real datasets. Empirical experiments show that our approach is considerably more efficient than state-of-the-art methods in terms of query processing, batch processing, and data storage.
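The abstract does not spell out the clustering schemes or pruning rules, so the following is only a generic illustration of the idea of grouping objects by length and pruning before computing similarity. It assumes objects are token sets compared with Jaccard similarity (an assumption, not stated in the abstract) and uses the standard length filter: if Jaccard(q, x) >= t, then t * |q| <= |x| <= |q| / t. The function names `group_by_length` and `search` are hypothetical, and the sketch runs on a single machine rather than on MapReduce.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_length(objects):
    """Partition object ids into groups keyed by the size of their token set."""
    groups = defaultdict(list)
    for obj_id, tokens in objects.items():
        groups[len(tokens)].append(obj_id)
    return groups


def search(query, objects, threshold):
    """Return sorted (id, similarity) pairs with Jaccard >= threshold.

    Assumes 0 < threshold <= 1. Whole length groups are skipped when the
    length filter proves that no member can reach the threshold.
    """
    eps = 1e-9  # tolerance so rounding never prunes a valid group
    q_len = len(query)
    results = []
    for length, ids in group_by_length(objects).items():
        too_short = length + eps < threshold * q_len
        too_long = length * threshold > q_len + eps
        if too_short or too_long:
            continue  # pruned without computing any similarity
        for obj_id in ids:
            sim = jaccard(query, objects[obj_id])
            if sim >= threshold:
                results.append((obj_id, sim))
    return sorted(results)


objects = {
    "a": {1, 2, 3},
    "b": {1, 2, 3, 4},
    "c": {1},
    "d": set(range(1, 11)),
}
matches = search({1, 2, 3}, objects, threshold=0.7)
```

With a query of three tokens and a threshold of 0.7, only groups of length 3 and 4 survive the filter, so objects "c" and "d" are discarded without a similarity computation. In a MapReduce setting the length could plausibly serve as a partitioning key, but that mapping is a guess and is not taken from the paper.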
Original language: English
Title of host publication: Advanced Information Networking and Applications (AINA), 2016 IEEE 30th International Conference on Advanced Information Networking and Applications
Publisher: IEEE
Pages: 422-429
Number of pages: 8
Publication status: Published - 2016

Fields of science

  • 202007 Computer integrated manufacturing (CIM)
  • 102001 Artificial intelligence
  • 102006 Computer supported cooperative work (CSCW)
  • 102010 Database systems
  • 102014 Information design
  • 102015 Information systems
  • 102016 IT security
  • 102022 Software development
  • 102025 Distributed systems
  • 102033 Data mining
  • 502007 E-commerce
  • 505002 Data protection
  • 506002 E-government
  • 509018 Knowledge management

JKU Focus areas

  • Computation in Informatics and Mathematics
