Architecture and Evaluation of Low Power Many-Core SoC with Two 32-Core Clusters

Takashi MIYAMORI  Hui XU  Hiroyuki USUI  Soichiro HOSODA  Toru SANO  Kazumasa YAMAMOTO  Takeshi KODAKA  Nobuhiro NONOGAKI  Nau OZAKI  Jun TANABE  

IEICE TRANSACTIONS on Electronics   Vol.E97-C   No.4   pp.360-368
Publication Date: 2014/04/01
Online ISSN: 1745-1353
DOI: 10.1587/transele.E97.C.360
Type of Manuscript: Special Section PAPER (Special Section on Solid-State Circuit Design: Architecture, Circuit, Device and Design Methodology)
Keyword: many-core, network-on-chip, VLIW, low power, face detection, H.264, super resolution

New media processing applications such as image recognition and AR (Augmented Reality) have become practical on embedded systems for automotive, digital-consumer, and mobile products. Many-core processors have been proposed to realize much higher performance than multi-core processors. We have developed a low-power many-core SoC for multimedia applications in 40nm CMOS technology. Within a 210mm² die, two 32-core clusters are integrated with dynamically reconfigurable processors, hardware accelerators, 2-channel DDR3 I/Fs, and other peripherals. The processor cores in a cluster share a 2MB L2 cache connected through a tree-based Network-on-Chip (NoC). The total peak performance exceeds 1.5TOPS (Tera Operations Per Second). High scalability and low power consumption are achieved with parallelized software for multimedia applications. For face detection, the performance scales up to 64 cores and the SoC consumes only 2.21W. Moreover, it can execute 1080p 48fps H.264 decoding at about 520mW using 28 cores, and 4K2K 15fps super resolution at about 770mW using 32 cores in one cluster. By exploiting parallelism with low-power processor cores, the many-core SoC provides several tens of times better energy efficiency than a high-performance desktop quad-core processor.
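
As a rough illustration of the energy figures quoted above, the power and frame-rate numbers can be turned into energy per frame. This is only a back-of-the-envelope sketch: it assumes the quoted power values are averages over sustained processing, treats the "about" figures as exact, and the function name `energy_per_frame_mj` is a hypothetical helper, not something taken from the paper.

```python
# Energy per frame = average power / frame rate.
# With power in mW (= mJ/s) and rate in frames/s, the result is in mJ/frame.
# ASSUMPTION: the abstract's power figures are sustained averages.

def energy_per_frame_mj(power_mw: float, fps: float) -> float:
    """Return energy per frame in millijoules (hypothetical helper)."""
    return power_mw / fps

# Power and frame-rate figures as stated in the abstract.
workloads = {
    "H.264 decoding, 1080p 48fps, 28 cores": (520.0, 48.0),
    "Super resolution, 4K2K 15fps, 32 cores": (770.0, 15.0),
}

for name, (power_mw, fps) in workloads.items():
    print(f"{name}: {energy_per_frame_mj(power_mw, fps):.1f} mJ/frame")
```

Under these assumptions, H.264 decoding comes to roughly 11 mJ per frame and super resolution to roughly 51 mJ per frame.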