An FPGA Realization of a Random Forest with k-Means Clustering Using a High-Level Synthesis Design

Akira JINGUJI  Shimpei SATO  Hiroki NAKAHARA  

IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.2   pp.354-362
Publication Date: 2018/02/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017RCP0006
Type of Manuscript: Special Section PAPER (Special Section on Reconfigurable Systems)
Category: Emerging Applications
machine learning,  random forest,  k-means clustering,  FPGA,  

Full Text: PDF>>
Buy this Article

A random forest (RF) is a kind of ensemble machine learning algorithm used for a classification and a regression. It consists of multiple decision trees that are built from randomly sampled data. The RF has a simple, fast learning, and identification capability compared with other machine learning algorithms. It is widely used for application to various recognition systems. Since it is necessary to un-balanced trace for each tree and requires communication for all the ones, the random forest is not suitable in SIMD architectures such as GPUs. Although the accelerators using the FPGA have been proposed, such implementations were based on HDL design. Thus, they required longer design time than the soft-ware based realizations. In the previous work, we showed the high-level synthesis design of the RF including the fully pipelined architecture and the all-to-all communication. In this paper, to further reduce the amount of hardware, we use k-means clustering to share comparators of the branch nodes on the decision tree. Also, we develop the krange tool flow, which generates the bitstream with a few number of hyper parameters. Since the proposed tool flow is based on the high-level synthesis design, we can obtain the high performance RF with short design time compared with the conventional HDL design. We implemented the RF on the Xilinx Inc. ZC702 evaluation board. Compared with the CPU (Intel Xeon (R) E5607 Processor) and the GPU (NVidia Geforce Titan) implementations, as for the performance, the FPGA realization was 8.4 times faster than the CPU one, and it was 62.8 times faster than the GPU one. As for the power consumption efficiency, the FPGA realization was 7.8 times better than the CPU one, and it was 385.9 times better than the GPU one.