Optimizing Hash Join with MapReduce on Multi-Core CPUs

Tong YUAN  Zhijing LIU  Hui LIU  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E99-D   No.5   pp.1316-1325
Publication Date: 2016/05/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015EDP7306
Type of Manuscript: PAPER
Category: Data Engineering, Web Information Systems
Keyword: 
hash join, database system, MapReduce, multi-core CPU, cuckoo hashing

Summary: 
In this paper, we exploit the MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, covering both the no-partition hash join and the partition hash join. We first implement the hash join algorithms, including the partition, build, and probe phases, with a shared-memory MapReduce model on multi-core CPUs. We then design an improved cuckoo hash table for our hash join, which combines a cuckoo hash table with a chained hash table. Based on this implementation, we propose two further optimizations: one uses SIMD instructions, and the other targets the partition phase. Experimental results and analysis show that the partition hash join often outperforms the no-partition hash join, and that our hash join algorithm is on average 30% faster than previous work.
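
To make the partition, build, and probe phases and the combined cuckoo/chained table concrete, the following is a minimal single-threaded sketch in Python. It is not the authors' implementation: the class name HybridCuckooTable, the two hash functions, the max_kicks limit, and the table sizing are all assumptions chosen for illustration, and the MapReduce parallelism and SIMD optimizations from the summary are left out.

```python
from collections import defaultdict


class HybridCuckooTable:
    """Illustrative cuckoo table with two arrays; entries that cannot be
    placed within max_kicks displacements spill into a chained table."""

    def __init__(self, capacity=16, max_kicks=8):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.slots = [[None] * capacity, [None] * capacity]
        self.overflow = defaultdict(list)  # chained fallback: key -> payloads

    def _h(self, which, key):
        # Placeholder hash functions (assumed, not taken from the paper).
        if which == 0:
            return hash(key) % self.capacity
        return (hash(key) // self.capacity + 1) % self.capacity

    def insert(self, key, payload):
        entry = (key, payload)
        which = 0
        for _ in range(self.max_kicks):
            idx = self._h(which, entry[0])
            # Place the entry and pick up whatever occupied the slot.
            entry, self.slots[which][idx] = self.slots[which][idx], entry
            if entry is None:
                return
            which = 1 - which  # retry the evicted entry in the other array
        # Too many displacements: keep the last evicted entry in the chain.
        self.overflow[entry[0]].append(entry[1])

    def lookup(self, key):
        matches = []
        for which in (0, 1):
            e = self.slots[which][self._h(which, key)]
            if e is not None and e[0] == key:
                matches.append(e[1])
        matches.extend(self.overflow.get(key, []))
        return matches


def partition(rows, num_parts):
    """Partition phase: route each (key, payload) row by its key hash."""
    parts = [[] for _ in range(num_parts)]
    for key, payload in rows:
        parts[hash(key) % num_parts].append((key, payload))
    return parts


def partition_hash_join(r, s, num_parts=4):
    """Join r and s on key, one partition pair at a time."""
    out = []
    for r_part, s_part in zip(partition(r, num_parts), partition(s, num_parts)):
        table = HybridCuckooTable(capacity=max(4, 2 * len(r_part)))
        for key, payload in r_part:          # build phase
            table.insert(key, payload)
        for key, s_payload in s_part:        # probe phase
            for r_payload in table.lookup(key):
                out.append((key, r_payload, s_payload))
    return out
```

Calling partition_hash_join with num_parts=1 skips any real partitioning and so behaves like a no-partition hash join over a single shared table. In a multi-core setting, each partition pair could be handed to a separate worker, which is roughly where a shared-memory MapReduce model would fit in.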