For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Reducing I/O Cost in OLAP Query Processing with MapReduce
Woo-Lam KANG Hyeon-Gyu KIM Yoon-Joon LEE
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2015/02/01
Online ISSN: 1745-1361
Type of Manuscript: LETTER
Category: Data Engineering, Web Information Systems
MapReduce, Hadoop, OLAP, data warehouse, TPC-H benchmark,
Full Text: PDF(353.2KB)
>>Buy this Article
This paper presents a method to reduce I/O cost in MapReduce when online analytical processing (OLAP) queries are used for data analysis. The proposed method consists of two basic ideas. First, to reduce network transmission cost, mappers are organized to receive only data necessary to perform a map task, not an entire set of input data. Second, to reduce storage consumption, only record IDs are stored for checkpointing, not the raw records. Experiments conducted with TPC-H benchmark show that the proposed method is about 40% faster than Hive, the well-known data warehouse solution for MapReduce, while reducing the size of data stored for checkpoining to about 80%.