A 98 GMACs/W 32-Core Vector Processor in 65 nm CMOS

Xun HE  Xin JIN  Minghui WANG  Dajiang ZHOU  Satoshi GOTO  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E94-A   No.12   pp.2609-2618
Publication Date: 2011/12/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E94.A.2609
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on VLSI Design and CAD Algorithms)
Category: High-Level Synthesis and System-Level Design
SIMD,  cache coherence,  NoC,  GMACs,  multicore processor,  

Full Text: PDF(3.6MB)>>
Buy this Article

This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16 bits SIMD MAC instructions, and vertical vector access. Eight cores with a 4-ports L2 cache are connected by CIB bus as a cluster. Four clusters are connected by mesh network. This hierarchical network can provide more than 192 GB/s low latency inter-core BW in average. The 4-ports L2 cache architecture is also designed to provide 192 GB/s L2 cache BW. To reduce coherence operation in large-scale SMP, an application specified protocol is proposed. Compared with MOESI, 67.8% of L1 cache energy can be saved in 32 cores case. The whole system including 32 vector cores, 256 KB L2 cache, 64-bit DDRII PHY and two PLL units, occupy 25 mm2 in 65 nm CMOS. It can achieve a peak performance of 375 GMACs and 98 GMACs/W at 1.2 V.