Performance Evaluation of a Processing Element for an On-Chip Multiprocessor

Masafumi TAKAHASHI  Hiroshige FUJII  Emi KANEKO  Takeshi YOSHIDA  Toshinori SATO  Hiroyuki TAKANO  Haruyuki TAGO  Seigo SUZUKI  Nobuyuki GOTO  

IEICE TRANSACTIONS on Electronics   Vol.E77-C   No.7   pp.1092-1100
Publication Date: 1994/07/25
Online ISSN: 
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Issue on Super Chip for Intelligent Integrated Systems)
multiprocessor,  shared FPU,  on-chip cache,  prefetch,  

Full Text: PDF>>
Buy this Article

A 250-MIPS, 125-MFLOPS peak performance processing element (PE), which is being developed for an on-chip multiprocessor, has been modeled and evaluated. The PE includes the following new architecture components: an FPU shared by several IUs in order to increase the efficiency of the FPU pipelines, an on-chip data cache with a prefetch mechanism to reduce clock cycles waiting for memory, and an interface to high speed DRAM, such as Rambus DRAM and Synchronous DRAM. As a result, a PE model with an FPU shared by four or eight IUs causes only 10% performance reduction compared to a model with an un-shared FPU model while saving the cost of three FPUs. Furthermore, a PE model with prefetch operates 1.2 to 1.8 times faster than a model without prefetch at 250-MHz clock rate when the Rambus DRAM is connected. It becomes clear that this PE architecture can bring a high effective performance at over 250-MHz, and is cost-effective for the on-chip multiprocessor.