A Next-Generation Enterprise Server System with Advanced Cache Coherence Chips

Mariko SAKAMOTO  Akira KATSUNO  Go SUGIZAKI  Toshio YOSHIDA  Aiichiro INOUE  Koji INOUE  Kazuaki MURAKAMI  

Publication
IEICE TRANSACTIONS on Electronics   Vol.E90-C   No.10   pp.1972-1982
Publication Date: 2007/10/01
Online ISSN: 1745-1353
DOI: 10.1093/ietele/e90-c.10.1972
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Section on VLSI Technology toward Frontiers of New Market)
Category: VLSI Architecture for Communication/Server Systems
Keyword: 
enterprise server system,  cache coherence,  distributed shared memory,  online transaction processing,  performance evaluation,  

Full Text: PDF(4.9MB)>>
Buy this Article




Summary: 
Broadcast and synchronization techniques are used for cache coherence control in conventional larger scale snoop-based SMP systems. The penalty for synchronization is directly proportional to system size. Meanwhile, advances in LSI technology now enable placing a memory controller on a CPU die. The latency to access directly linked memory is drastically reduced by an on-die controller. Developing an enterprise server system with these CPUs allows us an opportunity to achieve higher performance. Though the penalty of synchronization is counted whenever a cache miss occurs, it is necessary to improve the coherence method to receive the full benefit of this effect. In this paper, we demonstrate a coherence directory organization that fits into DSM enterprise server systems. Originally, a directory-based method was adopted in high performance computing systems because of its huge scalability in comparison with snoop-based method. Though directory capacity miss and long directory access latency are the major problems of this method, the relaxed scalability requirement of enterprise servers is advantageous to us to solve these problems along with an advanced LSI technology. Our proposed directory solves both problems by implementing a full bit vector level map of the coherence directory on an LSI chip. Our experimental results validate that a system controlled by our proposed directory can surpass a snoop-based system in performance even without applying data localization optimization to an online transaction processing (OLTP) workload.