A Scalable and Reconfigurable Fault-Tolerant Distributed Routing Algorithm for NoCs

Zewen SHI
Xiaoyang ZENG
Zhiyi YU

IEICE TRANSACTIONS on Information and Systems, Vol.E94-D, No.7, pp.1386-1397
Publication Date: 2011/07/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E94.D.1386
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Computer System
Keywords: fault-tolerant routing, network-on-chip (NoC), deadlock-free, divide-and-conquer, system partition

Manufacturing defects in deep sub-micron VLSI processes and aging-related device failures during the product lifecycle are inevitable, so fault-tolerant routing algorithms are important for providing the required communication in NoCs in spite of failures. The proposed algorithm, referred to as scalable and reconfigurable fault-tolerant distributed routing (RFDR), partitions the system into nine regions using the concept of divide-and-conquer. It is a distributed algorithm: each router guarantees fault tolerance within its own region, and the system can still be sustained with multiple fault areas. RFDR has excellent scalability, with hardware cost remaining constant regardless of system size. It is also completely reconfigurable when new nodes fail. Simulations under various synthetic traffic patterns show better performance than the Extended-XY routing algorithm. Moreover, RFDR has almost no hardware overhead compared to Logic-Based Distributed Routing (LBDR) while offering enhanced fault-tolerance capacity. Hardware cost is reduced by 37% compared to Reconfigurable Distributed Scalable Predictable Interconnect Network (R-DSPIN), which supports only a single fault region.
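
The abstract does not say how the nine regions are defined. As an illustration only, the sketch below assumes a 2D mesh and one natural nine-way split: the destination is classified relative to the current router as the router itself, one of the four axis directions, or one of the four quadrants. The names `Region` and `classify`, the (x, y) coordinate convention, and the partition itself are assumptions made for this example and are not taken from the paper.

```python
from enum import Enum


class Region(Enum):
    """Nine hypothetical regions of a 2D mesh, seen from one router."""
    LOCAL = "local"
    N = "N"
    S = "S"
    E = "E"
    W = "W"
    NE = "NE"
    NW = "NW"
    SE = "SE"
    SW = "SW"


# Lookup keyed by (sign of dx, sign of dy). Assumption: x grows eastward,
# y grows northward.
_BY_SIGN = {
    (0, 0): Region.LOCAL,
    (0, 1): Region.N,
    (0, -1): Region.S,
    (1, 0): Region.E,
    (-1, 0): Region.W,
    (1, 1): Region.NE,
    (-1, 1): Region.NW,
    (1, -1): Region.SE,
    (-1, -1): Region.SW,
}


def _sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def classify(cur, dst):
    """Classify destination dst relative to the current router cur.

    Both arguments are (x, y) tuples. Only two coordinate comparisons are
    needed, so the per-router work does not depend on the mesh size.
    """
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    return _BY_SIGN[(_sign(dx), _sign(dy))]
```

Under this assumed split, each router decides using only the signs of two coordinate differences, which is one way a per-router cost could stay constant as the mesh grows. The fault-handling and reconfiguration logic of RFDR is not modeled here.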
