Pipeline-Based Partition Exploration for Heterogeneous Multiprocessor Synthesis

Kang ZHAO  Jinian BIAN  Sheqin DONG  Yang SONG  Satoshi GOTO  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E92-A   No.9   pp.2283-2294
Publication Date: 2009/09/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E92.A.2283
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: VLSI Design Technology and CAD
application partitioning,  CAD algorithm,  MPSoC,  ASIP,  synthesis,  

Full Text: PDF>>
Buy this Article

To achieve an automated implementation for the application-specific heterogeneous multiprocessor systems-on-chip (MPSoC), partitioning and mapping the sequential programs onto multiple parallel processors is one of the most difficult challenges. However, the existing traditional parallelizing techniques cannot solve the MPSoC-related problems effectively, so designers are still required to manually extract the concurrency potentials in the program. To solve this bottleneck, an automated application partition technique is needed. However, completely automatic parallelism is ineffective, so it is promising to explore concurrency for certain practical special structures. To settle those issues, this paper proposes a template-based algorithm to automatically partition a special load-compute-store (LCS) loop structure. Since specific-instruction customization for the application specific instruction-set processors (ASIPs) has interactions with task partitioning, the proposed algorithm integrates the dynamic pipelining and ASIP techniques using an iterative improvement strategy: first, an initial pipelining scheme is generated to obtain the maximum parallelism; second, under the primary partition results specific instructions are customized respectively for each subprogram; third, the program is repartitioned via pipelining under the specific instruction configurations. The proposed method has been implemented in the context of a commercial extensible multiprocessor design flow, using the Xtensa-based XTMP platform from Tensilica Inc. Based on a case study of Fast Fourier Transform (FFT), the experimental results indicate that the partitioned programs by the proposed method demonstrate an average speedup of 10 compared to the original sequential programs which have not been partitioned and run on the uniprocessor system.