For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
A Tree-Based Checkpointing Architecture for the Dependability of FPGA Computing
Hoang-Gia VU Shinya TAKAMAEDA-YAMAZAKI Takashi NAKADA Yasuhiko NAKASHIMA
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2018/02/01
Online ISSN: 1745-1361
Type of Manuscript: Special Section PAPER (Special Section on Reconfigurable Systems)
Category: Device and Architecture
checkpointing, FPGA, dependability, tree-based,
Full Text: PDF(2.1MB)>>
Modern FPGAs have been integrated in computing systems as accelerators for long running applications. This integration puts more pressure on the fault tolerance of computing systems, and the requirement for dependability becomes essential. As in the case of CPU-based system, checkpoint/restart techniques are also expected to improve the dependability of FPGA-based computing. Three issues arise in this situation: how to checkpoint and restart FPGAs, how well this checkpoint/restart model works with the checkpoint/restart model of the whole computing system, and how to build the model by a software tool. In this paper, we first present a new checkpoint/restart architecture along with a checkpointing mechanism on FPGAs. We then propose a method to capture consistent snapshots of FPGA and the rest of the computing system. Third, we provide “fine-grained” management for checkpointing to reduce performance degradation. For the host CPU, we also provide a stack which includes API functions to manage checkpoint/restart procedures on FPGAs. Fourth, we present a Python-based tool to insert checkpointing infrastructure. Experimental results show that the checkpointing architecture causes less than 10% maximum clock frequency degradation, low checkpointing latencies, small memory footprints, and small increases in power consumption, while the LUT overhead varies from 17.98% (Dijkstra) to 160.67% (Matrix Multiplication).