Abstract
In this paper, we study the problems of scalability and performance for similarity search by proposing eHSim, an efficient hybrid similarity search with MapReduce. More specifically, we introduce clustering schemes that partition objects into different groups by their length. Additionally, we equip our proposed schemes with pruning strategies that quickly discard irrelevant objects before actually computing their similarity. Moreover, we design a hybrid MapReduce architecture that deals with the challenges of big data. Furthermore, we implement our proposed methods with MapReduce and make them compatible with the hybrid MapReduce architecture. Last but not least, we evaluate the proposed methods with real datasets. Empirical experiments show that our approach is considerably more efficient than state-of-the-art approaches in terms of query processing, batch processing, and data storage.
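The general idea of grouping objects by length and pruning before computing similarity can be illustrated with a minimal single-machine sketch. It is not the eHSim implementation: the abstract does not name the similarity measure or the pruning rules, so the sketch assumes Jaccard similarity over token sets and uses the standard length filter (if J(q, x) >= t, then t·|q| <= |x| <= |q|/t). The function names `group_by_length` and `search` are hypothetical, and the MapReduce layer is omitted.

```python
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_length(objects):
    """Partition object ids into groups keyed by token-set size.

    `objects` maps an id to a set of tokens (an assumed representation).
    """
    groups = defaultdict(list)
    for obj_id, tokens in objects.items():
        groups[len(tokens)].append(obj_id)
    return groups


def search(query, objects, groups, threshold):
    """Return (id, similarity) pairs with Jaccard similarity >= threshold.

    Only length groups that can possibly reach the threshold are visited;
    every other group is discarded without computing any similarity.
    `threshold` is assumed to lie in (0, 1].
    """
    q = set(query)
    eps = 1e-9  # guards the bounds against floating-point rounding
    lo = math.ceil(threshold * len(q) - eps)
    hi = math.floor(len(q) / threshold + eps)
    results = []
    for length in range(lo, hi + 1):
        for obj_id in groups.get(length, []):
            sim = jaccard(q, objects[obj_id])
            if sim >= threshold:
                results.append((obj_id, sim))
    return sorted(results)
```

With a threshold of 0.5 and a query of four tokens, only groups of length 2 through 8 are inspected, so very short and very long objects are skipped entirely. In a distributed setting, the length key would be a natural candidate for assigning groups to workers, although the abstract does not say how eHSim does this.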
Original language | English |
---|---|
Title of host publication | Advanced Information Networking and Applications (AINA), 2016 IEEE 30th International Conference on Advanced Information Networking and Applications |
Publisher | IEEE |
Pages | 422-429 |
Number of pages | 8 |
Publication status | Published - 2016 |
Fields of science
- 202007 Computer integrated manufacturing (CIM)
- 102001 Artificial intelligence
- 102006 Computer supported cooperative work (CSCW)
- 102010 Database systems
- 102014 Information design
- 102015 Information systems
- 102016 IT security
- 102022 Software development
- 102025 Distributed systems
- 102033 Data mining
- 502007 E-commerce
- 505002 Data protection
- 506002 E-government
- 509018 Knowledge management
JKU Focus areas
- Computation in Informatics and Mathematics