For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Performance Evaluation of a Processing Element for an On-Chip Multiprocessor
Masafumi TAKAHASHI Hiroshige FUJII Emi KANEKO Takeshi YOSHIDA Toshinori SATO Hiroyuki TAKANO Haruyuki TAGO Seigo SUZUKI Nobuyuki GOTO
IEICE TRANSACTIONS on Electronics
Publication Date: 1994/07/25
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Issue on Super Chip for Intelligent Integrated Systems)
multiprocessor, shared FPU, on-chip cache, prefetch,
Full Text: PDF>>
A 250-MIPS, 125-MFLOPS peak performance processing element (PE), which is being developed for an on-chip multiprocessor, has been modeled and evaluated. The PE includes the following new architecture components: an FPU shared by several IUs in order to increase the efficiency of the FPU pipelines, an on-chip data cache with a prefetch mechanism to reduce clock cycles waiting for memory, and an interface to high speed DRAM, such as Rambus DRAM and Synchronous DRAM. As a result, a PE model with an FPU shared by four or eight IUs causes only 10% performance reduction compared to a model with an un-shared FPU model while saving the cost of three FPUs. Furthermore, a PE model with prefetch operates 1.2 to 1.8 times faster than a model without prefetch at 250-MHz clock rate when the Rambus DRAM is connected. It becomes clear that this PE architecture can bring a high effective performance at over 250-MHz, and is cost-effective for the on-chip multiprocessor.