Interconnect-Aware Pipeline Synthesis for Array-Based Architectures
Shanghua GAO, Hiroaki YOSHIDA, Kenshu SETO, Satoshi KOMATSU, Masahiro FUJITA
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Publication Date: 2009/06/01
Online ISSN: 1745-1337
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: VLSI Design Technology and CAD
software pipelining, interconnect delay, high level synthesis, scheduling, performance
In the deep-submicron era, interconnect delay is becoming one of the most important factors affecting performance in VLSI design. Many state-of-the-art high level synthesis methods try to account for the effect of interconnect delays, and they do achieve better performance than traditional methods that ignore them. When applications contain large loops, however, there is still much room to improve performance by exploiting parallelism. In this paper, we propose, for the first time, a method that combines pipelining techniques with the consideration of interconnect delays to improve the quality of high level synthesis. The proposed method has two characteristics: 1) it separates the consideration of interconnect delay from computation delay, and allows data transfer and computation to proceed concurrently; 2) it belongs to the modulo scheduling framework, in the sense that all iterations have identical schedules and are initiated periodically. We evaluate our method from two points of view. First, we compare it with an existing interconnect-aware high level synthesis method that does not use pipelining, and the experimental results show about a 3.4 times performance improvement on average. Second, we compare it with an existing pipeline synthesis method that does not consider interconnect delays, and the results show about a 1.5 times performance improvement on average. In addition, we evaluate our proposed architecture, and the experimental results demonstrate that it is better than the existing architecture in .
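As a rough illustration of the modulo scheduling property mentioned in the abstract (identical per-iteration schedules, initiated periodically), the following minimal sketch compares cycle counts with and without pipelining. It is not the authors' algorithm: the operation names, the start cycles, the initiation interval of 2, and the helper functions are all hypothetical and chosen only to show the idea, with a data transfer modeled as its own scheduled step separate from computation.

```python
# Illustrative sketch only: a toy model of modulo scheduling.
# All names and numbers here are assumptions, not taken from the paper.

def sequential_cycles(num_iterations, schedule_length):
    """Cycles when each iteration starts only after the previous one ends."""
    return num_iterations * schedule_length


def pipelined_cycles(num_iterations, schedule_length, initiation_interval):
    """Cycles when a new iteration is initiated every `initiation_interval` cycles."""
    if num_iterations == 0:
        return 0
    return (num_iterations - 1) * initiation_interval + schedule_length


def start_cycles(schedule, iteration, initiation_interval):
    """Every iteration reuses the same schedule, shifted by iteration * II."""
    return {op: t + iteration * initiation_interval for op, t in schedule.items()}


# Hypothetical one-iteration schedule: operation -> start cycle.
# The "transfer" step stands in for an interconnect delay that is
# scheduled separately from the computation steps.
schedule = {"load": 0, "transfer": 1, "mul": 2, "store": 3}
schedule_length = max(schedule.values()) + 1  # 4 cycles per iteration

seq = sequential_cycles(10, schedule_length)
pipe = pipelined_cycles(10, schedule_length, initiation_interval=2)
third = start_cycles(schedule, iteration=3, initiation_interval=2)
```

With these assumed numbers, ten iterations take 40 cycles sequentially and 22 cycles when pipelined. The speedups reported in the abstract come from the authors' experiments and are unrelated to this toy model.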