A Reconfigurable Functional Unit with Conditional Execution for Multi-Exit Custom Instructions

Hamid NOORI  Farhad MEHDIPOUR  Koji INOUE  Kazuaki MURAKAMI  

Publication
IEICE TRANSACTIONS on Electronics   Vol.E91-C   No.4   pp.497-508
Publication Date: 2008/04/01
Online ISSN: 1745-1353
DOI: 10.1093/ietele/e91-c.4.497
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Section on Advanced Technologies in Digital LSIs and Memories)
Keyword: custom instructions, extensible processor, reconfigurable functional unit, conditional execution

Summary: 
Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique for enhancing the performance of embedded processors. However, supporting the execution of these custom instructions requires adding custom functional units to the base processor. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market and significant non-recurring engineering and design costs remain issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is a multi-cycle matrix of functional units with the capability of conditional execution. A quantitative approach is used to propose an efficient architecture for the RFU and fix its constraints. To generate more effective custom instructions, they are extended to span multiple basic blocks, and hence multi-exit custom instructions are proposed. Conditional execution has been added to the RFU to support the multi-exit feature of custom instructions. Experimental results show that multi-exit custom instructions enhance performance by an average of 67% compared to custom instructions limited to one basic block. A maximum speedup of 4.7 and an average speedup of 1.85, compared to a general embedded processor, were achieved on the MiBench benchmark suite.
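
The idea of a multi-exit custom instruction with conditional execution can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the RFU architecture from the paper: the names `Node`, `Exit`, and `run_custom_instruction`, the predicate encoding, and the example subgraph are all hypothetical. It shows a subgraph that spans two basic blocks, where the operation from the second block is guarded by a predicate and the exit taken depends on that predicate.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (predicate register name, value the predicate must have)
Guard = Tuple[str, bool]

@dataclass
class Node:
    """One operation in the custom instruction (hypothetical model)."""
    dest: str
    op: Callable[[int, int], int]
    srcs: Tuple[str, str]
    guard: Optional[Guard] = None  # None means unconditional

@dataclass
class Exit:
    """One exit point of the custom instruction (hypothetical model)."""
    guard: Optional[Guard]
    next_pc: str

def guard_holds(guard: Optional[Guard], vals: Dict[str, int]) -> bool:
    return guard is None or bool(vals[guard[0]]) == guard[1]

def run_custom_instruction(nodes: List[Node], exits: List[Exit],
                           regs: Dict[str, int]) -> Tuple[Dict[str, int], str]:
    vals = dict(regs)
    for n in nodes:  # nodes are assumed to be listed in dependency order
        if not guard_holds(n.guard, vals):
            continue  # conditional execution: this operation is squashed
        a, b = (vals[s] for s in n.srcs)
        vals[n.dest] = n.op(a, b)
    for e in exits:  # the first exit whose guard holds is taken
        if guard_holds(e.guard, vals):
            return vals, e.next_pc
    raise ValueError("no exit condition satisfied")

# Block 1: t = a + b; p = (t < c); branch to "taken_target" if p.
# Block 2 (fall-through): r = t - c.
nodes = [
    Node("t", operator.add, ("a", "b")),
    Node("p", lambda x, y: int(x < y), ("t", "c")),
    Node("r", operator.sub, ("t", "c"), guard=("p", False)),
]
exits = [Exit(("p", True), "taken_target"), Exit(None, "fallthrough")]

taken_vals, taken_pc = run_custom_instruction(nodes, exits, {"a": 1, "b": 2, "c": 10})
fall_vals, fall_pc = run_custom_instruction(nodes, exits, {"a": 5, "b": 7, "c": 10})
```

In the first call the branch is taken, so the guarded subtraction is squashed and the instruction leaves through the first exit; in the second call the branch falls through, the subtraction executes, and the instruction leaves through the second exit.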