4-Cycle-Start-Up Reference-Clock-Less Digital CDR Utilizing TDC-Based Initial Frequency Error Detection with Frequency Tracking Loop

SUMMARY This paper proposes a reference-clock-less quick-start-up CDR that resumes from a stand-by state with only a 4-bit preamble by utilizing a phase generator with an embedded Time-to-Digital Converter (TDC). The phase generator detects the 1-UI time interval by using its internal TDC and works as a self-tunable digitally-controlled delay line. Once the phase generator has coarsely tuned the recovered clock period, the residual time difference is finely tuned by a fine Digital-to-Time Converter (DTC). Since the tuning resolution of the fine DTC is matched by design with the time resolution of the TDC that is used as a phase detector, the fine tuning completes instantaneously. After the initial coarse and fine delay tuning, the feedback loop for frequency tracking is activated in order to improve the Consecutive Identical Digits (CID) tolerance of the CDR. By applying the frequency tracking architecture, the proposed CDR achieves more than 100 bits of CID tolerance. A prototype implemented in a 65 nm bulk CMOS process operates at a 0.9–2.15 Gbps continuous rate. It consumes 5.1–8.4 mA in its active state and 42 µA leakage current in its stand-by state from a 1.0 V supply.

key words: Clock-and-data recovery (CDR), time-to-digital converter (TDC), serial communication, quick start-up, internet-of-things

1. Introduction

A Clock-and-Data Recovery (CDR) circuit, which recovers the clock signal from incoming random data sequences to retime the data, is one of the crucial building blocks in data communication devices [1, 2]. Although a Phase-Locked Loop (PLL) based architecture is often used for CDR [3–11], it usually requires a long settling time to recover the target clock signal. In applications that keep communicating for a long period of time, the performance overhead of this start-up is not a critical concern. However, in applications that work intermittently, such as mobile and Internet-of-Things (IoT) sensor node applications [12, 13], effective use of a low-power stand-by state is as important as dynamic power efficiency in the active state. Thus, in such cases, the data link has to resume communication immediately after it is activated so as not to waste power on state transitions.

Fig. 1: Conceptual comparison of frequency-locking behavior among three CDR architectures.

To realize quick start-up, several different CDR architectures have been proposed [14–20]. Among them, the burst-mode CDR (BMCDR) has been used in point-to-multipoint fiber serial communications such as passive optical networks (PON), in which quick phase locking is crucial [20–26]. As a BMCDR can effectively utilize a low-power stand-by state thanks to its quick start-up feature, it is a useful choice for realizing energy-efficient serial communication systems. Figure 1 conceptually compares the start-up features of the CDR techniques. The conventional reference-less PLL-based CDR locks to the input signal precisely but slowly, while the BMCDR reduces the wait time for phase locking and saves power during the start-up. However, most conventional BMCDRs use an external reference clock and need to lock to a predetermined frequency prior to the actual random data input. Thus, these BMCDR techniques still require some dynamic power consumption even in the stand-by state. In order to eliminate the external reference and a priori frequency locking, a reference-clock-less BMCDR scheme utilizing a time-to-digital converter (TDC) based architecture has been proposed [27]. With the recent progress in CMOS process scaling, time resolution is becoming increasingly superior to voltage resolution due to high-speed transistors and reduced supply voltages [28–32]. By fully utilizing the time-domain information, a power-efficient CDR architecture can be realized. This scheme uses a Cycle-Lock Gated ring Oscillator (CLGO) that quickly locks to the input frequency by detecting the 1-UI cycle time of the incoming data stream utilizing Self-Tunable Digitally-Controlled Delay Lines (ST-DCDLs). By employing coarse-fine hierar-
2. Architecture of the Proposed CDR

A block diagram of the proposed CDR is illustrated in Fig. 2, which is similar to the one in [33]. It consists of a gating circuit, a Vernier Phase Detector (PD), a digital controller, and an ST-DCDL with a fine DTC. This CDR recovers a half-rate clock, and the incoming data are latched on both the rising and falling edges of the recovered clock. Thus, the serial input data appear at DATAOUT1 and DATAOUT2 alternately. A detailed block diagram of the CDR circuit is shown in Fig. 3(a). Though the actual circuit is implemented with a differential configuration, this figure shows a single-ended version for simplicity. The coarse-fine hierarchical structure is chosen to obtain a wide frequency tuning range as well as fine resolution. The gating circuit in Fig. 2 is composed of an edge detector and a polarity adaptor, which find the data transition and then align its rise/fall polarity to the present transition of the recovered clock [19, 27].

The proposed CDR has two operation phases as in [33], i.e., quick start-up and frequency tracking. Figures 3(b) and (c) respectively highlight the active blocks in these two operation phases. During the initial quick start-up phase, the 1-UI cycle time is detected with the coarse ST-DCDL and the Vernier TDC by using a 4-bit “1010” preamble. Then, in the frequency tracking phase, the Vernier PD and the digital controller are activated to enable the feedback loop.

A schematic diagram of the trigger generator is illustrated in Fig. 4. During the quick start-up phase, the trigger generator sends trigger signals to the coarse ST-DCDL and the Vernier TDC by detecting the 1st and 2nd falling transitions in the 4-bit “1010” preamble signal, respectively. Simulated waveforms during the quick start-up phase are shown in Fig. 5. The coarse trigger signal rises at the first falling edge of the positive input (red line of the top waveform). The time difference between the fine DTC output and the coarse trigger is measured by the ST-DCDL, so that the total delay through the MUX, the fine DTC and the coarse ST-DCDL in Fig. 3 is coarsely tuned to a 1-UI time interval. After that, at the second falling edge of the positive input, the fine trigger signal rises to measure the residual time difference between the output of the coarse ST-DCDL and the fine trigger by using the Vernier TDC. Then the fine DTC tunes the delay according to the code from the Vernier TDC to generate a finely tuned half-rate recovered clock waveform.
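
The two-step tuning described above can be illustrated with a minimal behavioral sketch in Python. It is not the circuit implementation: the function name `quick_start_tune`, the integer-picosecond units, and the step sizes used in the example are assumptions chosen for illustration, and the fixed delays of the MUX and the fine DTC are ignored.

```python
def quick_start_tune(ui_ps, coarse_step_ps, fine_step_ps, fine_stages):
    """Behavioral model of coarse-then-fine delay tuning toward a 1-UI target.

    All arguments are hypothetical integers (picoseconds or stage counts);
    they are not taken from the measured prototype.
    """
    # Coarse phase: count how many whole coarse steps fit within 1 UI.
    coarse_taps = ui_ps // coarse_step_ps
    coarse_delay_ps = coarse_taps * coarse_step_ps

    # Fine phase: quantize the residual with the fine step (Vernier TDC role),
    # then apply the same code to the fine DTC, whose step is assumed matched.
    residual_ps = ui_ps - coarse_delay_ps
    fine_code = min(residual_ps // fine_step_ps, fine_stages)

    tuned_delay_ps = coarse_delay_ps + fine_code * fine_step_ps
    return coarse_taps, fine_code, tuned_delay_ps
```

For example, with an assumed 40 ps coarse step, 5 ps fine step and 8 fine stages, `quick_start_tune(500, 40, 5, 8)` returns `(12, 4, 500)`. As long as the fine range covers one coarse step, the residual error after both phases stays below one fine step in this model.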

To make this circuit recover the half-rate clock, the signal path delays must be designed properly so that the internal TDCs can detect the correct delays. However, process, voltage and temperature (PVT) variations cause delay mismatch between different signal paths, which may lead to erroneous TDC outputs. Thus, the circuits that need to be delay-matched are designed with sufficiently large transistors and, as much as possible, with the same circuit structures, in order to keep the impact of the PVT variations within the range that can be handled by the frequency tracking feedback loop, which is activated after the quick start-up phase.

Fig. 6(a) shows a block diagram of the ST-DCDL, which is simply composed of a delay-line-based TDC. Its input-to-output delay is tunable by selecting one of the outputs of the delay line through the internal MUX. As it shares a single delay line to both measure and generate the delay, the ST-DCDL tunes the delay by itself using the output code from the internal TDC. As shown in Fig. 6(b), the buffer circuit for the coarse delay line is implemented as a pseudo-differential buffer with switches to disable the outputs. Once the coarse 1-UI time interval detection is finished and the ST-DCDL selects the proper output, the buffers beyond the selected output, which would only add more delay, are disabled by turning off these switches to save power.
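
The idea of sharing one delay line for both measurement and generation can be sketched with a toy Python model. The class name `SelfTunableDelayLine`, its methods, and the stage delays in the example are hypothetical; the sketch only illustrates that the selected tap reproduces the measured interval to within one stage delay, even when the stages are mismatched.

```python
from itertools import accumulate


class SelfTunableDelayLine:
    """Toy model of a delay line that both measures and generates a delay."""

    def __init__(self, stage_delays_ps):
        # Cumulative delay at each tap output; stages may be mismatched.
        self.tap_delays_ps = list(accumulate(stage_delays_ps))
        self.selected_tap = 0  # 0 means no stage has been selected yet

    def measure_and_tune(self, interval_ps):
        """TDC role: count the taps the edge has passed within the interval,
        then select that tap (MUX role)."""
        self.selected_tap = sum(1 for d in self.tap_delays_ps if d <= interval_ps)
        return self.selected_tap

    def output_delay_ps(self):
        """DCDL role: delay from the input to the selected tap."""
        if self.selected_tap == 0:
            return 0
        return self.tap_delays_ps[self.selected_tap - 1]

    def disabled_stages(self):
        """Stages after the selected tap, which could be switched off."""
        return len(self.tap_delays_ps) - self.selected_tap
```

With assumed stage delays of `[38, 42, 40, 41, 39, 40, 43, 37]` ps, measuring a 250 ps interval selects tap 6, gives an output delay of 240 ps, and leaves 2 stages that could be disabled.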

Fig. 7 illustrates the schematic diagram of the fine DTC, which is composed of a series connection of pseudo-differential buffers with 1-bit delay tunability. By turning on and off one of the two current sources/sinks, each buffer stage can finely change its delay between fast and slow settings. With N stages of this buffer tuned by a thermometer-coded delay tuning word as shown in Fig. 7, a digitally tunable delay with fine resolution is achieved while its monotonicity is guaranteed.
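
The monotonicity argument can be checked with a short Python sketch. The function names and the per-stage delay values are assumptions for illustration; the only property relied upon is that each stage is slower in its slow setting than in its fast setting.

```python
def thermometer_code(code, n_stages):
    """Return the thermometer word: `code` ones followed by zeros."""
    if not 0 <= code <= n_stages:
        raise ValueError("code must be between 0 and n_stages")
    return [1] * code + [0] * (n_stages - code)


def fine_dtc_delay_ps(code, fast_ps, slow_ps):
    """Total delay of a chain of 1-bit tunable buffers.

    fast_ps and slow_ps are hypothetical per-stage delays (same length);
    stage i uses slow_ps[i] when its thermometer bit is 1, else fast_ps[i].
    """
    bits = thermometer_code(code, len(fast_ps))
    return sum(slow if bit else fast
               for bit, fast, slow in zip(bits, fast_ps, slow_ps))
```

With assumed mismatched stages `fast_ps = [20, 21, 19, 20]` and `slow_ps = [25, 25, 25, 26]`, codes 0 to 4 give 80, 85, 89, 95 and 101 ps. The step size varies with mismatch, but each increment of the code switches exactly one more stage to its slow setting, so the delay never decreases.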

A schematic diagram of the Vernier TDC used to detect the residual time difference after coarse tuning is illustrated in Fig. 8. In a Vernier TDC, two delay lines with a delay difference of Δ are used to achieve a fine time resolution of Δ [29]. In this implementation, the same buffer circuit used in the fine DTC in Fig. 7 is employed to realize the two different delays. The slower delay line (the upper one in Fig. 8) uses the slow setting by turning off the internal cur-

Fig. 9: Delay step simulation results of (a) coarse and (b) fine delay lines.

Fig. 10: Conceptual waveforms of the proposed CDR operation.

Fig. 11: A block diagram of the digital controller.

To tune these coarse and fine delays, the proposed CDR requires a 4-bit “1010” preamble at the start-up. As shown in Fig. 10, at the 1st falling edge in the preamble the coarse trigger is asserted and the ST-DCDL coarsely tunes itself by measuring the 1-UI pulse width using its internal TDC. Then at the 2nd falling edge, the fine trigger is asserted to measure the residual delay difference with the Vernier TDC.
In the quick start-up phase, the output code from the Vernier TDC is directly fed to the fine DTC to tune the delay instantaneously thanks to the shared buffer structure with the same time resolution. At this point, a finely tuned recovered clock is available. However, these initially tuned delays might contain a certain amount of error due to PVT variations and jitter, as well as input frequency drift. This frequency error accumulates as phase error and results in poor CID tolerance. Thus, as in [33], the feedback loop for frequency tracking shown in Figs. 2 and 3 is activated after the initial clock recovery process. After the feedback loop is activated, the Vernier PD, which uses the same Vernier TDCs, detects the phase error between the incoming data and the recovered clock as illustrated in Fig. 10(b). The PD output is processed by the digital controller, whose block diagram is shown in Fig. 11. The digital controller realizes a low-pass characteristic with output dithering by the $\Delta\Sigma$ modulator to effectively improve the frequency resolution. The output of the digital controller is fed to the fine DTC to achieve frequency tracking.
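
How output dithering improves the effective resolution can be illustrated with a small Python sketch. It assumes a first-order delta-sigma style accumulator and a fixed-point control word; the modulator order, the word format, and the function name `delta_sigma_dither` are assumptions for illustration and are not specified in the text.

```python
def delta_sigma_dither(control_word, frac_bits, n_cycles):
    """Dither a fixed-point control word into a sequence of integer codes.

    control_word is a hypothetical non-negative integer whose lowest
    frac_bits bits hold the fractional part. The returned integer codes
    toggle between two adjacent values so that their average approaches
    control_word / 2**frac_bits.
    """
    scale = 1 << frac_bits
    integer_part, fraction = divmod(control_word, scale)
    accumulator = 0
    codes = []
    for _ in range(n_cycles):
        # Accumulate the fraction; each overflow bumps the output by one LSB.
        accumulator += fraction
        carry, accumulator = divmod(accumulator, scale)
        codes.append(integer_part + carry)
    return codes
```

For instance, `delta_sigma_dither(84, 4, 16)` represents 5.25 with 4 fractional bits: the output alternates between 5 and 6 and sums to 84 over 16 cycles, so the average code is 5.25 even though the DTC itself only accepts integer codes.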

Fig. 8: A schematic diagram of the Vernier TDC.


3. Chip Implementation and Measurement Results

The prototype of the proposed CDR is fabricated in a 65 nm bulk CMOS process; its chip micrograph and layout image are shown in Fig. 12. The proposed CDR occupies an area of 98 $\times$ 157 $\mu$m$^2$. The CDR operation is verified from 0.90 to 2.15 Gb/s continuous rate, while it consumes 5.1–8.4 mA from a 1.0 V supply. The measured leakage current during the stand-by state is 42 µA. The 4-cycle quick start-up behavior is confirmed for 2.15 Gb/s inputs. As shown in Fig. 2, the incoming data are latched on both the rising and falling edges of the half-rate recovered clock. Thus, the serial input data appear at DATAOUT1 and DATAOUT2 alternately. We can also observe that the transition edge in the incoming data is injected to realign the phase of the recovered clock.

Fig. 14 shows the measured eye diagrams of the input data and the half-rate recovered clock signals without and with the dithering in the digital controller. The clock jitter is $\sim$19.3 ps$_{\text{rms}}$ and $\sim$17.4 ps$_{\text{rms}}$ without and with the dithering, respectively. Fig. 15 compares the CID tolerance measurement results without and with the dithering. Before and after the intentional consecutive 0 and 1 input at the middle of the waveforms, a PRBS data sequence is injected into the circuit. The CID tolerance improves from 44 bits to 102 bits by enabling the dithering in the digital controller. These results demonstrate that the dithering effectively improves the frequency tuning resolution, which leads to better frequency tracking capability. Fig. 16 shows the jitter tolerance measurement result. The proposed CDR has relatively poor jitter tolerance, especially for high-frequency jitter. This is mainly because the frequency characteristics of the digital LPF are not well optimized, and the authors expect that it can be improved by carefully tuning the parameters in the digital controller. It would also be possible to gain more insight into improving the CDR performance by investigating the loop dynamics in detail. These are left for future work.
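
To give a rough feeling for what these CID numbers imply, the following Python sketch estimates the relative clock-period error that a given CID length can tolerate. It rests on a deliberately simplified assumption that is not taken from the measurements: no phase realignment occurs during the run, the phase error grows linearly with the number of bits, and sampling fails once the accumulated error reaches 0.5 UI (jitter and initial phase offset are ignored). The function name and the `margin_ui` parameter are hypothetical.

```python
def max_relative_period_error(cid_bits, margin_ui=0.5):
    """Largest relative period error that keeps the accumulated phase error
    below margin_ui (in UI) over cid_bits bits without any transition."""
    if cid_bits <= 0:
        raise ValueError("cid_bits must be positive")
    return margin_ui / cid_bits


for bits in (44, 102):
    error_percent = 100 * max_relative_period_error(bits)
    print(f"{bits} bits -> {error_percent:.2f} % period error")
```

Under this simplified model, 44 bits corresponds to a period error of about 1.14 % and 102 bits to about 0.49 %, which is consistent with the view that finer frequency tuning resolution extends the CID tolerance.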

Finally, Table 1 compares the proposed work with similar prior works. The proposed CDR requires neither an external reference clock nor frequency locking prior to the data input, which leads to immediate start-up with no dynamic power consumption during the stand-by state. In particular, compared with [33], the proposed CDR reduces the dynamic power consumption during the active state by about 3$\times$ while maintaining the same benefits of the 4-cycle quick start-up as well as the wide-range continuous-rate frequency tracking features. Although the maximum CID tolerance is reduced to 102 bits, this is still sufficient in practice, because a coding scheme such as 64b/66b can be chosen to guarantee enough transitions at the cost of some overhead in the data rate.
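
The 64b/66b remark can be made concrete with a short Python check. It assumes the usual 64b/66b framing, in which every 66-bit block starts with a 2-bit sync header of either 01 or 10, and it constructs a contrived worst-case payload; the helper names are hypothetical and the stream is not real scrambled data.

```python
from itertools import groupby


def longest_run(bits):
    """Length of the longest run of identical symbols in a bit string."""
    return max(len(list(group)) for _, group in groupby(bits))


def worst_case_64b66b(n_blocks):
    """Build blocks whose 64 payload bits all repeat the last header bit,
    alternating the sync header between 01 and 10 to stretch the runs."""
    blocks = []
    for index in range(n_blocks):
        header = "01" if index % 2 == 0 else "10"
        blocks.append(header + header[-1] * 64)
    return "".join(blocks)


stream = worst_case_64b66b(4)
overhead_percent = 100 * 2 / 64  # two header bits per 64 payload bits
```

In this constructed worst case the longest run is 66 bits, which is below the measured 102-bit CID tolerance, and the overhead of the two header bits is 3.125 % of the payload rate.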

4. Conclusion

This paper proposed a reference-clock-less quick-start-up CDR that resumes from a stand-by state with only a 4-bit preamble by utilizing the ST-DCDL and the fine DTC, whose tuning resolution is matched by design with that of the internal Vernier TDC. The feedback loop for frequency tracking, activated after the initial coarse and fine delay tuning, achieves more than 100 bits of CID tolerance. A prototype implemented in a 65 nm bulk CMOS process operates at a 0.9–2.15 Gbps continuous rate. It consumes 5.1–8.4 mA in its active state and 42 µA leakage current in its stand-by state from a 1.0 V supply. Since the proposed CDR does not need an external reference clock and thus consumes no dynamic power in its stand-by state, it can improve the total power efficiency of serial communication systems that work intermittently, such as mobile and IoT sensor node applications.

Acknowledgement

This work is supported in part by a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO) and in part by JSPS KAKENHI Grant Number 21H03406. The LSI chip fabricated in this work has been supported through the activities of VDEC, the University of Tokyo in collaboration with Cadence Design Systems Inc., Synopsys, Inc., and Mentor Graphics, Inc.

References

Table 1: Performance comparison with prior works.

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>[17]</td>
<td>GVCO w/ FLL</td>
<td>35</td>
<td>3.3/1.2</td>
<td>2.5</td>
<td>13.68</td>
<td>0.04</td>
<td>253bits</td>
<td>Prior to data</td>
<td>&lt;1bit</td>
<td>Required</td>
<td></td>
</tr>
<tr>
<td>[18]</td>
<td>GDCCO w/ FLL &amp; PLL</td>
<td>90</td>
<td>N/A</td>
<td>2.2</td>
<td>24.6</td>
<td>0.44</td>
<td>128bits</td>
<td>Prior to data</td>
<td>&lt;1bit</td>
<td>No</td>
<td></td>
</tr>
<tr>
<td>[27]</td>
<td>ST-DCDL</td>
<td>65</td>
<td>1.4</td>
<td>1.40–2.06</td>
<td>6.1</td>
<td>0.0064</td>
<td>13bits</td>
<td>4bit</td>
<td>&lt;1bit</td>
<td>No</td>
<td></td>
</tr>
<tr>
<td>[33]</td>
<td>ST-DCDL w/ FLL</td>
<td>65</td>
<td>1.2</td>
<td>1.2–2.3</td>
<td>9.5</td>
<td>0.019</td>
<td>No</td>
<td>4bit</td>
<td>&lt;1bit</td>
<td>No</td>
<td></td>
</tr>
<tr>
<td>This work</td>
<td>ST-DCDL w/ Fine DTC &amp; FLL</td>
<td>65</td>
<td>0.90–2.15</td>
<td>5.1–8.4</td>
<td>6.4</td>
<td>0.015</td>
<td>No</td>
<td>4bit</td>
<td>&lt;1bit</td>
<td>No</td>
<td></td>
</tr>
</tbody>
</table>


Meikan Chin received the B.S. and M.S. degrees in electronic engineering from the University of Tokyo, Tokyo, Japan, in 2016 and 2019, respectively. His research interests include design and architecture of wireline communication circuits.

Toru Nakura was born in Fukuoka, Japan in 1972. He received the B.S. and M.S. degrees in electronic engineering from The University of Tokyo, Tokyo, Japan, in 1995 and 1997, respectively. He then worked as a circuit designer of high-speed communication using SOI devices for two years, and as an EDA tool developer for three years. He rejoined the University of Tokyo as a Ph.D. student in 2002 and received the degree in 2005. After a two-year period in industry, he returned to academia as an associate professor at the VLSI Design and Education Center (VDEC) and the Department of Electrical Engineering and Information Systems, The University of Tokyo. He is currently a full professor in the Department of Electronics Engineering and Computer Science, Fukuoka University. His current interests include signal integrity, reliability, power supply, digitally-assisted analog circuits, and fully automated analog circuit synthesis.

Kunihiro Asada was born in Fukui, Japan, in 1952. He received the B.S., M.S., and Ph.D. degrees in electronic engineering from the University of Tokyo, Tokyo, Japan, in 1975, 1977, and 1980, respectively. He joined the Faculty of Engineering, University of Tokyo, in 1980, and became a Lecturer, an Associate Professor, and a Professor in 1981, 1985, and 1995, respectively. From 1985 to 1986, he was with the University of Edinburgh, Edinburgh, U.K., as a Visiting Scholar supported by the British Council. From 1990 to 1992, he served as the first Editor of the English version of IEICE Transactions on Electronics. In 1996, he established the VLSI Design and Education Center (VDEC) with his colleagues at the University of Tokyo, which is the center to promote education and research of VLSI design in all the universities and colleges in Japan. He served as the Director of VDEC until 2018, and is currently a Professor Emeritus at the University of Tokyo. He has authored over 400 technical papers in journals and conference proceedings. His current research interests include design and evaluation of integrated systems and component devices. Dr. Asada is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the Institute of Electrical Engineers of Japan (IEEJ). He has received Best Paper Awards from IEEJ, IEICE, and ICMTS1998/IEEE. He also served as the Chair of the IEEE/SSCS Japan Chapter from 2001 to 2002 and the IEEE Japan Chapter Operation Committee from 2007 to 2008.