Automatic Multi-Stage Clock Gating Optimization Using ILP Formulation

Xin MAN  Takashi HORIYAMA  Shinji KIMURA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E95-A   No.8   pp.1347-1358
Publication Date: 2012/08/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E95.A.1347
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: VLSI Design Technology and CAD
actual power reduction,  automatic multi-stage clock gating optimization,  ILP formulation,  switching activity,  BDD,  MIP optimizer,  

Full Text: PDF(2.8MB)>>
Buy this Article

Clock gating is supported by commercial tools as a power optimization feature based on the guard signal described in HDL (structural method). However, the identification of control signals for gated registers is hard and designer-intensive work. Besides, since the clock gating cells also consume power, it is imperative to minimize the number of inserted clock gating cells and their switching activities for power optimization. In this paper, we propose an automatic multi-stage clock gating algorithm with ILP (Integer Linear Programming) formulation, including clock gating control candidate extraction, constraints construction and optimum control signal selection. By multi-stage clock gating, unnecessary clock pulses to clock gating cells can be avoided by other clock gating cells, so that the switching activity of clock gating cells can be reduced. We find that any multi-stage control signals are also single-stage control signals, and any combination of signals can be selected from single-stage candidates. The proposed method can be applied to 3 or more cascaded stages. The multi-stage clock gating optimization problem is formulated as constraints in LP format for the selection of cascaded clock-gating order of multi-stage candidate combinations, and a commercial ILP solver (IBM CPLEX) is applied to obtain the control signals for each register with minimum switching activity. Those signals are used to generate a gate level description with guarded registers from original design, and a commercial synthesis and layout tools are applied to obtain the circuit with multi-stage clock gating. For a set of benchmark circuits and a Low Density Parity Check (LDPC) Decoder (6.6k gates, 212 F.F.s), the proposed method is applied and actual power consumption is estimated using Synopsys NanoSim after layout. On average, 31% actual power reduction has been obtained compared with original designs with structural clock gating, and more than 10% improvement has been achieved for some circuits compared with single-stage optimization method. CPU time for optimum multi-stage control selection is several seconds for up to 25k variables in LP format. By applying the proposed clock gating, area can also be reduced since the multiplexors controlling register inputs are eliminated.