Pipeline-Based Partition Exploration for Heterogeneous Multiprocessor Synthesis
Kang ZHAO Jinian BIAN Sheqin DONG Yang SONG Satoshi GOTO
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Publication Date: 2009/09/01
Online ISSN: 1745-1337
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: VLSI Design Technology and CAD
Keywords: application partitioning, CAD algorithm, MPSoC, ASIP, synthesis
Partitioning and mapping sequential programs onto multiple parallel processors is one of the most difficult challenges in automating the implementation of application-specific heterogeneous multiprocessor systems-on-chip (MPSoCs). Existing parallelizing techniques do not solve MPSoC-related problems effectively, so designers must still extract the concurrency in a program manually. Removing this bottleneck requires an automated application partitioning technique. Because fully automatic parallelization is ineffective, a promising alternative is to explore concurrency in certain special structures that occur in practice. This paper proposes a template-based algorithm that automatically partitions one such structure, the load-compute-store (LCS) loop. Since instruction customization for application-specific instruction-set processors (ASIPs) interacts with task partitioning, the proposed algorithm integrates dynamic pipelining and ASIP techniques in an iterative improvement strategy: first, an initial pipelining scheme is generated to obtain the maximum parallelism; second, based on the initial partition, specific instructions are customized for each subprogram; third, the program is repartitioned via pipelining under the resulting instruction configurations. The method has been implemented within a commercial extensible multiprocessor design flow, using the Xtensa-based XTMP platform from Tensilica Inc. In a case study of the Fast Fourier Transform (FFT), the programs partitioned by the proposed method achieve an average speedup of 10 over the original unpartitioned sequential programs running on a uniprocessor system.
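The three-step strategy (partition, customize, repartition) can be illustrated with a toy model. The sketch below is not the paper's algorithm. It assumes that the loop body is a chain of tasks with known cycle costs, that a pipeline stage is a contiguous group of tasks, that throughput is limited by the slowest stage, and that instruction customization simply lowers the cost of one task per stage. All function names and cost values are hypothetical.

```python
from itertools import combinations


def partition_chain(costs, stages):
    """Split a chain of task costs into `stages` contiguous groups,
    minimizing the cost of the slowest group (the pipeline bottleneck).
    Exhaustive search; only suitable for short chains."""
    n = len(costs)
    best_bounds, best_bottleneck = None, float("inf")
    for cuts in combinations(range(1, n), stages - 1):
        bounds = (0, *cuts, n)
        bottleneck = max(sum(costs[a:b]) for a, b in zip(bounds, bounds[1:]))
        if bottleneck < best_bottleneck:
            best_bounds, best_bottleneck = bounds, bottleneck
    return best_bounds, best_bottleneck


def customize(costs, accel_costs, bounds):
    """Assumed stand-in for ASIP customization: in each stage, replace the
    cost of the task with the largest saving by its accelerated cost."""
    tuned = list(costs)
    for a, b in zip(bounds, bounds[1:]):
        i = max(range(a, b), key=lambda t: costs[t] - accel_costs[t])
        tuned[i] = accel_costs[i]
    return tuned


def explore(costs, accel_costs, stages):
    bounds, initial = partition_chain(costs, stages)     # step 1: initial pipeline
    tuned = customize(costs, accel_costs, bounds)        # step 2: per-stage customization
    new_bounds, final = partition_chain(tuned, stages)   # step 3: repartition
    return bounds, initial, new_bounds, final


if __name__ == "__main__":
    costs = [4, 6, 2, 8, 3, 5]        # made-up cycle counts per task
    accel_costs = [4, 2, 2, 3, 3, 4]  # made-up costs with a custom instruction
    print(explore(costs, accel_costs, stages=3))
```

On these made-up numbers the bottleneck stage drops from 10 to 7 cycles, and the stage boundaries move from (0, 2, 4, 6) to (0, 1, 4, 6). This shows why repartitioning after customization can pay off: the accelerated tasks change where the balanced cut points lie.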