Reconfigurable Out-of-Order System for Fluid Dynamics Computation Using Unstructured Mesh

Takayuki AKAMINE  Mohamad Sofian ABU TALIP  Yasunori OSANA  Naoyuki FUJITA  Hideharu AMANO  

IEICE TRANSACTIONS on Information and Systems   Vol.E97-D   No.5   pp.1225-1234
Publication Date: 2014/05/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E97.D.1225
Type of Manuscript: PAPER
Category: Computer System
computational fluid dynamics (CFD),  field programmable gate array (FPGA),  scientific computations,  reconfigurable hardware,  out-of-order system,  

Full Text: PDF(1MB)>>
Buy this Article

Computational fluid dynamics (CFD) is an important tool for designing aircraft components. FaSTAR (Fast Aerodynamics Routines) is one of the most recent CFD packages and has various subroutines. However, its irregular and complicated data structure makes it difficult to execute FaSTAR on parallel machines due to memory access problem. The use of a reconfigurable platform based on field programmable gate arrays (FPGAs) is a promising approach to accelerating memory-bottlenecked applications like FaSTAR. However, even with hardware execution, a large number of pipeline stalls can occur due to read-after-write (RAW) data hazards. Moreover, it is difficult to predict when such stalls will occur because of the unstructured mesh used in FaSTAR. To eliminate this problem, we developed an out-of-order mechanism for permuting the data order so as to prevent RAW hazards. It uses an execution monitor and a wait buffer. The former identifies the state of the computation units, and the latter temporarily stores data to be processed in the computation units. This out-of-order mechanism can be applied to various types of computations with data dependency by changing the number of execution monitors and wait buffers in accordance with the equations used in the target computation. An out-of-order system can be reconfigured by automatic changing of the parameters. Application of the proposed mechanism to five subroutines in FaSTAR showed that its use reduces the number of stalls to less than 1% compared to without the mechanism. In-order execution was speeded up 2.6-fold and software execution was speeded up 2.9-fold using an Intel Core 2 Duo processor with a reasonable amount of overhead.