Design of a Register Cache System with an Open Source Process Design Kit for 45nm Technology

Junji YAMADA  Ushio JIMBO  Ryota SHIOYA  Masahiro GOSHIMA  Shuichi SAKAI  

IEICE TRANSACTIONS on Electronics   Vol.E100-C   No.3   pp.232-244
Publication Date: 2017/03/01
Online ISSN: 1745-1353
DOI: 10.1587/transele.E100.C.232
Type of Manuscript: Special Section PAPER (Special Section on Low-Power and High-Speed Chips)
register file,  register cache,  digital design,  freePDK,  

Full Text: PDF>>
Buy this Article

An 8-issue superscalar core generally requires a 24-port RAM for the register file. The area and energy consumption of a multiported RAM increase in proportional to the square of the number of ports. A register cache can reduce the area and energy consumption of the register file. However, earlier register cache systems suffer from lower IPC caused by register cache misses. Thus, we proposed the Non-Latency-Oriented Register Cache System (NORCS) to solve the IPC problem with a modified pipeline. We evaluated NORCS mainly from the viewpoint of microarchitecture in the original article, and showed that NORCS maintains almost the same IPC as conventional register files. Researchers in NVIDIA adopted the same idea for their GPUs. However, the evaluation was not sufficient from the viewpoint of LSI design. In the original article, we used CACTI to evaluate the area and energy consumption. CACTI is a design space exploration tool for cache design, and adopts some rough approximations. Therefore, this paper shows design of NORCS with FreePDK45, an open source process design kit for 45nm technology. We performed manual layout of the memory cells and arrays of NORCS, and executed SPICE simulation with RC parasitics extracted from the layout. The results show that, from a full-port register file, an 8-entry NORCS achieves a 75.2% and 48.2% reduction in area and energy consumption, respectively. The results also include the latency which we did not present in our original article. The latencies of critical path is 307ps and 318ps for an 8-entry NORCS and a conventional multiported register file, respectively, when the same two cycles are allocated to register file read.