Interconnect-Aware Pipeline Synthesis for Array-Based Architectures

Shanghua GAO  Hiroaki YOSHIDA  Kenshu SETO  Satoshi KOMATSU  Masahiro FUJITA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E92-A   No.6   pp.1464-1475
Publication Date: 2009/06/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E92.A.1464
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: VLSI Design Technology and CAD
software pipelining,  interconnect delay,  high level synthesis,  scheduling,  performance,  

Full Text: PDF>>
Buy this Article

In the deep-submicron era, interconnect delays are becoming one of the most important factors that can affect performance in the VLSI design. Many state-of-the-art research in high level synthesis try to consider the effect of interconnect delays. These research indeed achieve better performance compared with traditional ones which ignore interconnect delays. When applications contain large loops, however, there is still much room to improve the performance by exploiting the parallelism. In this paper, we, for the first time, propose a method to utilize pipelining techniques and take interconnect delays into account together so as to improve the quality of high level synthesis. The proposed method has the following two characteristics: 1) it separates the consideration of interconnect delay from computation delay, and allows concurrent data transfer and computation; 2) it belongs to modulo scheduling framework, in the sense that all iterations have identical schedules, and are initiated periodically. We evaluate our method from two different points of view. Firstly, we compare our method with an existing interconnect-aware high level synthesis that does not utilize pipelining techniques, and the experimental results show that our method can obtain about 3.4 times performance improvement on average. Secondly, we compare our method with an existing pipeline synthesis that does not consider interconnect delays, and the results show that our method can obtain about 1.5 times performance improvement on average. In addition, we also evaluate our proposed architecture and the experimental results demonstrate that it is better than existing architecture in [1].