For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
A Reconfigurable Functional Unit with Conditional Execution for Multi-Exit Custom Instructions
Hamid NOORI Farhad MEHDIPOUR Koji INOUE Kazuaki MURAKAMI
IEICE TRANSACTIONS on Electronics
Publication Date: 2008/04/01
Online ISSN: 1745-1353
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Section on Advanced Technologies in Digital LSIs and Memories)
custom instructions, extensible processor, reconfigurable functional unit, conditional execution,
Full Text: PDF(1.5MB)>>
Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of these custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market, significant non-recurring engineering and design costs are issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip-fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a matrix of functional units which is multi-cycle with the capability of conditional execution. A quantitative approach is utilized to propose an efficient architecture for the RFU and fix its constraints. To generate more effective custom instructions, they are extended over basic blocks and hence, multiple exits custom instructions are proposed. Conditional execution has been added to the RFU to support the multi-exit feature of custom instructions. Experimental results show that multi-exit custom instructions enhance the performance by an average of 67% compared to custom instructions limited to one basic block. A maximum speedup of 4.7, compared to a general embedded processor, and an average speedup of 1.85 was achieved on MiBench benchmark suite.