Eager Memory Management for In-Memory Data Analytics

Hakbeom JANG  Jonghyun BAE  Tae Jun HAM  Jae W. LEE  

IEICE TRANSACTIONS on Information and Systems   Vol.E102-D   No.3   pp.632-636
Publication Date: 2019/03/01
Publicized: 2018/12/11
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018EDL8199
Type of Manuscript: LETTER
Category: Computer System
Keywords: in-memory computing, Spark, garbage collection, data spill

Errata [Uploaded on April 1, 2019]

This paper introduces e-spill, an eager spill mechanism that dynamically finds the optimal spill threshold by monitoring GC time at runtime and thereby prevents expensive GC overhead. e-spill adopts a slow-start model that gradually increases the spill threshold until it reaches the optimal point, that is, the largest threshold that does not trigger substantial GCs. We prototype e-spill as an extension to Spark and evaluate it using six workloads on three different parallel platforms. Our evaluations show that e-spill improves performance by up to 3.80× and reduces the cost of cluster operation on the Amazon EC2 cloud by up to 51% over a baseline system that follows the Spark Tuning Guidelines.
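
The slow-start idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the doubling growth factor, the 10% GC-time limit, and the megabyte units are all assumptions chosen for illustration, since the abstract does not specify them. The sketch grows a spill threshold after each monitoring interval in which GC time stays low, and reverts to the last threshold known to be safe once GC time becomes substantial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlowStartSpillController:
    """Hypothetical slow-start controller for a spill threshold.

    All names and default constants are illustrative assumptions,
    not values taken from the paper.
    """

    threshold_mb: float = 64.0         # assumed initial spill threshold
    max_threshold_mb: float = 4096.0   # assumed upper bound on the threshold
    growth_factor: float = 2.0         # assumed multiplicative growth per interval
    gc_ratio_limit: float = 0.10       # assumed cutoff for "substantial" GC time
    last_good_mb: Optional[float] = None
    converged: bool = False

    def update(self, gc_time_s: float, interval_s: float) -> float:
        """Consume one monitoring interval and return the next threshold.

        gc_time_s is the GC time observed while running with the current
        threshold; interval_s is the wall-clock length of that interval.
        """
        if interval_s <= 0:
            raise ValueError("interval_s must be positive")
        if self.converged:
            # The threshold has settled; keep using it.
            return self.threshold_mb

        gc_ratio = gc_time_s / interval_s
        if gc_ratio > self.gc_ratio_limit:
            # The current threshold caused substantial GC: revert to the
            # last threshold that ran cleanly (or shrink if none exists).
            if self.last_good_mb is not None:
                self.threshold_mb = self.last_good_mb
            else:
                self.threshold_mb = self.threshold_mb / self.growth_factor
            self.converged = True
        else:
            # The current threshold ran cleanly: remember it and grow.
            self.last_good_mb = self.threshold_mb
            self.threshold_mb = min(
                self.threshold_mb * self.growth_factor,
                self.max_threshold_mb,
            )
        return self.threshold_mb
```

In such a sketch, a caller would invoke `update` once per monitoring interval with the measured GC time and would spill data whenever in-memory usage exceeds the returned threshold. How GC time is actually measured and how the real e-spill backs off are details the abstract leaves open.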