A New Recovery Mechanism in Superscalar Microprocessors by Recovering Critical Misprediction

Jiongyao YE  Yu WAN  Takahiro WATANABE  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E94-A    No.12    pp.2639-2648
Publication Date: 2011/12/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E94.A.2639
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on VLSI Design and CAD Algorithms)
Category: High-Level Synthesis and System-Level Design
misprediction recovery,  critical path prediction,  trace cache,  

Full Text: PDF>>
Buy this Article

Current trends in modern out-of-order processors involve implementing deeper pipelines and a large instruction window to achieve high performance, which lead to the penalty of the branch misprediction recovery being a critical factor in overall processor performance. Multi path execution is proposed to reduce this penalty by executing both paths following a branch, simultaneously. However, there are some drawbacks in this mechanism, such as design complexity caused by processing both paths after a branch and performance degradation due to hardware resource competition between two paths. In this paper, we propose a new recovery mechanism, called Recovery Critical Misprediction (RCM), to reduce the penalty of branch misprediction recovery. The mechanism uses a small trace cache to save the decoded instructions from the alternative path following a branch. Then, during the subsequent predictions, the trace cache is accessed. If there is a hit, the processor forks the second path of this branch at the renamed stage so that the design complexity in the fetch stage and decode stage is alleviated. The most contribution of this paper is that our proposed mechanism employs critical path prediction to identify the branches that will be most harmful if mispredicted. Only the critical branch can save its alternative path into the trace cache, which not only increases the usefulness of a limited size of trace cache but also avoids the performance degradation caused by the forked non-critical branch. Experimental results employing SPECint 2000 benchmark show that a processor with our proposed RCM improves IPC value by 10.05% compared with a conventional processor.