A Compact Fully-Differential Distributed Amplifier With Coupled Inductors in 0.18-μm CMOS Technology

Keisuke KAWAHARA†‡, Student Member, Yohtaro UMEDA†§, Member, Kyoya TAKANO††‡, Member, and Shinsuke HARA†††, Member

SUMMARY This paper presents a compact fully-differential distributed amplifier using a coupled inductor. Differential distributed amplifiers are widely required in optical communication systems. Most of the distributed amplifiers reported in the past are single-ended or pseudo-differential topologies. In addition, the differential distributed amplifiers require many inductors, which increases the silicon cost. In this study, we use differentially coupled inductors to reduce the chip area to less than half and eliminate the difficulties in layout design. The challenge in using coupled inductors is the capacitive parasitic coupling that degrades the flatness of frequency response. To address this challenge, the odd-mode image parameters of a differential artificial transmission line are derived using a simple loss-less model. Based on the analytical results, we optimize the dimensions of the inductor with the gradient descent algorithm to achieve accurate impedance matching and phase matching. The amplifier was fabricated in 0.18-μm CMOS technology. The core area of the amplifier is 0.27 mm², which is 57% smaller than the previous work. Besides, we demonstrated a small group delay variation of ±2.7 ps thanks to the optimization. The amplifier successfully performed 30-Gbps NRZ and PAM4 transmissions with superior jitter performance. The proposed technique will promote the high-density integration of differential traveling wave devices.

key words: Distributed amplifier, Differential amplifier, Coupled inductor, CMOS integrated circuit, Optimization, Optical fiber communication.

1. Introduction

The demand for information and communication technologies is increasing year by year due to the spread of cloud services. Recent communications are based on digital coherent optical transmission systems. These systems require analog circuits with high operating speeds of more than 100 Gbaud. Since advanced CMOS processes such as 22 nm have maximum oscillation frequencies $f_{\text{max}}$ around 200 GHz, a bandwidth wider than $0.25 \times f_{\text{max}}$ is required for 100 Gbaud transmissions. Distributed amplifiers have broad bandwidth, high output power, and excellent input-output matching. These characteristics meet the requirements. Some distributed modulator drivers reached bandwidths of nearly 100 GHz [1]–[3]. Recently, distributed drivers were also developed for quantum key transfer systems [4]. Another application of distributed amplifiers is frequency interleaving. In that case, distributed amplifiers are used for signal combining and frequency conversion [5]–[6]. Distributed amplifiers are, thus, an essential component in the development of future communication technologies. On the other hand, most of the circuits used in optical transceivers have differential topology. Differential circuits have advantages in common-mode noise rejection and DC coupled cascading. Nevertheless, most of the distributed amplifiers reported in the past are single-ended [7]–[10] or pseudo-differential topologies [11]–[14]. This is because distributed amplifiers require many inductors or transmission lines to form artificial transmission lines at the input and output. In differential topology, a total of four artificial transmission lines are required, which not only consumes a large chip area but also makes layout design difficult. Although the pseudo-differential topology can simplify the layout, it does not provide the aforementioned advantages. 
Besides, the previous differential distributed amplifiers consume core areas of more than 0.5 mm² [3], [15], which raises an issue of manufacturing cost. Inductive coupling is an effective way to reduce the number and areas of inductors. For example, small-area laser drivers using coupled inductors have been reported [16]–[17]. For distributed circuits, some transversal filters and mixers adopt coupled inductors [18]–[20]. In this study, we developed a small-area differential distributed amplifier using coupled inductors. We set target specifications to bandwidth > $0.25 \times f_{\text{max}}$ and core area < 0.5 mm² for applying to state-of-the-art optical fiber communication systems. Fig. 1 shows a schematic of the proposed amplifier. Coupled inductors reduce the area to about half that of conventional amplifiers and enable a simple layout design. A challenge to using coupled inductors is that capacitive parasitic coupling between the two inductors affects the characteristics of the lines. Undesirable capacitive coupling lowers the impedance of the artificial transmission lines and causes reflections at the termination. This not only deteriorates input-output matching but also leads to increased group delay and gain variations at low frequencies [21]. The parasitic coupling also changes the phase constant of the lines, which causes a roll-off in the
gain response due to a phase mismatch between the gate and drain lines. This study derives odd-mode image parameters of the differential artificial transmission lines. The lines are optimized by a gradient descent algorithm based on the analytical solution. In recent circuit design, simulation-based optimization is widely used. Although simulation-based optimization works well when accurate device models are available, it requires significant effort for measurement and modeling. We used a simple loss-less model of the inductors extracted by electromagnetic field analysis. This reduces design time and effort, and it ensures the explainability of the circuit operation. This paper is an extended version of [22]. We added an explanation of circuit analysis with image parameters and a detailed procedure for modeling and optimization. The effectiveness of the proposed technique was demonstrated by experiments and comparison with state-of-the-art research.

2. Analysis and Design

2.1 Analysis

Fig. 2(a) shows a small-signal equivalent circuit of the amplifier cell, where $C_{gs}$, $C_{gd}$, and $C_{ds}$ are the gate-source, gate-drain, and drain-source capacitances of the MOSFETs. $C_X$ is a cross-coupled capacitor that makes the amplifier unilateral for a differential signal [23]. Assuming a differential input signal, we can consider the odd-mode half circuit shown in Fig. 2(b). The voltage at node Z equals zero, so the node can be regarded as a virtual ground. The half circuit is a parallel connection of a hybrid-$\pi$ model of the MOSFET and the cross-coupled capacitor. The admittance matrix of the half circuit is given by

\[
Y_{\text{half}} = \begin{bmatrix}
 j\omega(C_{gs} + C_{gd} + C_X) & j\omega(-C_{gd} + C_X) \\
 g_m + j\omega(-C_{gd} + C_X) & j\omega(C_{ds} + C_{gd} + C_X)
\end{bmatrix}.
\]

When $C_X = C_{gd}$, then

\[
Y_{\text{half}} = \begin{bmatrix}
 j\omega(C_{gs} + 2C_{gd}) & 0 \\
 g_m & j\omega(C_{ds} + 2C_{gd})
\end{bmatrix}.
\]

The amplifier becomes unilateral due to the cross-coupled capacitors. The input Miller capacitance without $C_X$ is $(1 + A_v)C_{gd}$ with a voltage gain of $A_v$ [24], and thus $C_X$ decreases the input capacitance when $A_v > 1$. On the other hand, the output Miller capacitance is given by $(1 + 1/A_v)C_{gd}$, and thus $C_X$ slightly increases the output capacitance. However, this is rarely a problem because the bandwidth of a distributed amplifier is generally limited by the gate line, not the drain line. The important advantage of using $C_X$ is that the drain line and gate line can be designed separately, which allows a designer to optimize the inductors as discussed later.
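The neutralization condition can be checked numerically. The following is a minimal sketch that evaluates the half-circuit admittance matrix with and without $C_X = C_{gd}$; the device values and the helper name `y_half` are hypothetical and chosen only for illustration.

```python
import numpy as np

def y_half(omega, gm, c_gs, c_gd, c_ds, c_x):
    """Odd-mode half-circuit admittance matrix of the amplifier cell."""
    y11 = 1j * omega * (c_gs + c_gd + c_x)
    y12 = 1j * omega * (-c_gd + c_x)
    y21 = gm + 1j * omega * (-c_gd + c_x)
    y22 = 1j * omega * (c_ds + c_gd + c_x)
    return np.array([[y11, y12], [y21, y22]])

# Hypothetical device values (not taken from the paper).
omega = 2 * np.pi * 10e9                      # 10 GHz
gm, c_gs, c_gd, c_ds = 20e-3, 40e-15, 10e-15, 15e-15

y_plain = y_half(omega, gm, c_gs, c_gd, c_ds, c_x=0.0)     # no neutralization
y_neutral = y_half(omega, gm, c_gs, c_gd, c_ds, c_x=c_gd)  # C_X = C_gd

print("Y12 without C_X:", y_plain[0, 1])
print("Y12 with C_X = C_gd:", y_neutral[0, 1])
print("Y21 with C_X = C_gd:", y_neutral[1, 0])
```

With $C_X = C_{gd}$ the reverse term $Y_{12}$ vanishes and $Y_{21}$ reduces to $g_m$, so the gate and drain lines see purely capacitive loads that do not interact.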

Fig. 3 shows a layout and loss-less model of a coupled inductor. $L$, $M$, $C_{OX}$, and $C_M$ are self-inductance, mutual inductance, the capacitance between the winding wire and substrate, and mutual capacitance, respectively. This loss-less model is consistent with ignoring all the parasitic resistances in the model used in [25]. The same was also used in [26]. The sign of the mutual inductance is typically chosen to be negative so that the magnetic flux is enhanced by the differential currents. Fig. 4(a) shows an equivalent circuit of the gate line. We assumed that the input signal is in odd-mode and the amplifier cell can be regarded as a shunt capacitor $C_S = C_{gs} + 2C_{gd}$. This circuit is similar to the segmented model of a differential transmission line, and its odd-

![Fig. 1 Schematic of the proposed fully-differential distributed amplifier.](image1)

![Fig. 2 Amplifier cell. (a) Small signal equivalent circuit. (b) Odd-mode half circuit.](image2)
The phase delay per section is given by

\[ \tau_G = \frac{2}{\omega_G} = \sqrt{(L - M)(C_G + 2C_{OX} + 4C_M)}. \] (10)

The same holds for the drain line by replacing \( C_G \) with \( C_D \), which gives \( Z_D \) and \( \tau_D \). To minimize linear distortion, impedance and phase matching must be ensured. In other words, \( Z_G = Z_D = R_T \) and \( \tau_G = \tau_D \) must hold.
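The matching conditions can be explored with a few lines of code. The sketch below implements the per-section delay and cutoff frequency from (10). The impedance expression is an assumption: it uses the standard low-frequency constant-k form \( \sqrt{L_{\text{eff}}/C_{\text{eff}}} \) with the same effective inductance and capacitance that appear in (10). The function name and all element values are hypothetical.

```python
import math

def odd_mode_line(L, M, c_shunt, c_ox, c_m):
    """Odd-mode parameters of one section of the artificial line.

    c_shunt is the amplifier loading (C_G for the gate line, C_D for the drain line).
    """
    l_eff = L - M                               # effective odd-mode inductance
    c_eff = c_shunt + 2.0 * c_ox + 4.0 * c_m    # effective odd-mode capacitance
    tau = math.sqrt(l_eff * c_eff)              # delay per section, as in (10)
    omega_c = 2.0 / tau                         # cutoff angular frequency, as in (10)
    z0 = math.sqrt(l_eff / c_eff)               # ASSUMED low-frequency constant-k impedance
    return z0, omega_c, tau

# Hypothetical element values; M is negative, following the sign convention in the text.
z_g, w_g, tau_g = odd_mode_line(L=400e-12, M=-100e-12,
                                c_shunt=100e-15, c_ox=20e-15, c_m=5e-15)

print(f"Z_G   = {z_g:.1f} ohm")
print(f"f_G   = {w_g / (2 * math.pi) / 1e9:.1f} GHz")
print(f"tau_G = {tau_g * 1e12:.2f} ps")
```

Calling the same function with the drain loading gives \( Z_D \) and \( \tau_D \), so both matching conditions can be compared directly.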

### 2.2 Modeling of the Coupled Inductor

The values of \( L, M, C_{OX}, \) and \( C_M \) vary depending on the dimensions of the coupled inductor, such as the diameter \( d \), line width \( w \), and line space \( s \). These relationships need to be clarified for design optimization. The four-port \( Y \)-parameters of the inductor are obtained by

\[ Y_{11} = j \left( \frac{-L}{\omega(L^2 - M^2)} + \omega(C_{OX} + C_M) \right), \] (11)
\[ Y_{21} = j \left( \frac{M}{\omega(L^2 - M^2)} - \omega C_M \right), \] (12)
\[ Y_{31} = j \left( \frac{L}{\omega(L^2 - M^2)} \right), \] (13)
\[ Y_{41} = j \left( \frac{-M}{\omega(L^2 - M^2)} \right), \] (14)

where \( Y_{11} = Y_{22} = Y_{33} = Y_{44} \), \( Y_{21} = Y_{12} = Y_{43} = Y_{34} \), \( Y_{31} = Y_{13} = Y_{42} = Y_{24} \), and \( Y_{41} = Y_{14} = Y_{32} = Y_{23} \) hold due to symmetry. Solving the simultaneous equations from (11) to (14) gives

\[ L = \frac{\text{Im}(Y_{31})}{\omega \text{Im}(Y_{31})^2 - \omega \text{Im}(Y_{41})^2}, \] (15)
\[ M = \frac{-\text{Im}(Y_{41})}{\omega \text{Im}(Y_{31})^2 - \omega \text{Im}(Y_{41})^2}, \] (16)
\[ C_{OX} = \frac{\text{Im}(Y_{11}) + \text{Im}(Y_{31}) + \text{Im}(Y_{41}) + \text{Im}(Y_{21})}{\omega}, \] (17)
\[ C_M = \frac{-\text{Im}(Y_{41}) - \text{Im}(Y_{21})}{\omega}. \] (18)
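As a consistency check of (11) to (18), the sketch below generates the four \( Y \)-parameters from known element values and then recovers those values with the extraction formulas. The element values and function names are hypothetical. In practice, the \( Y \)-parameters would come from electromagnetic analysis or measurement rather than from the forward model.

```python
import math

def y_params(omega, L, M, c_ox, c_m):
    """Loss-less four-port Y-parameters of the coupled inductor, (11) to (14)."""
    det = omega * (L**2 - M**2)
    y11 = 1j * (-L / det + omega * (c_ox + c_m))
    y21 = 1j * (M / det - omega * c_m)
    y31 = 1j * (L / det)
    y41 = 1j * (-M / det)
    return y11, y21, y31, y41

def extract(omega, y11, y21, y31, y41):
    """Recover L, M, C_OX, and C_M from the imaginary parts, (15) to (18)."""
    b11, b21, b31, b41 = (y.imag for y in (y11, y21, y31, y41))
    denom = omega * (b31**2 - b41**2)
    L = b31 / denom
    M = -b41 / denom
    c_ox = (b11 + b31 + b41 + b21) / omega
    c_m = -(b41 + b21) / omega
    return L, M, c_ox, c_m

# Hypothetical element values for the round-trip check.
omega = 2 * math.pi * 20e9
L0, M0, COX0, CM0 = 400e-12, -100e-12, 20e-15, 5e-15

recovered = extract(omega, *y_params(omega, L0, M0, COX0, CM0))
for name, got, want in zip(("L", "M", "C_OX", "C_M"), recovered, (L0, M0, COX0, CM0)):
    print(f"{name}: extracted {got:.4e}, original {want:.4e}")
```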

Thus, the values of \( L, M, C_{OX}, \) and \( C_M \) can be extracted from four-port \( Y \)-parameters that are obtained by electromagnetic analysis or measurement. In the following, the modeling procedure is explained with the self-inductance as a representative, and the same holds for \( M(d, s, w) \), \( C_{OX}(d, s, w) \), and \( C_M(d, s, w) \). The model formula for the self-inductance is defined as the following quadratic polynomial.

\[
L(d, s, w) \equiv a_0 d^2 + a_1 s^2 + a_2 w^2 + a_3 ds + a_4 dw + a_5 sw + a_6 d + a_7 s + a_8 w + a_9,
\] (19)

where \( a_i \) (\( i \in \mathbb{N}, i < 10 \)) are coefficients to be determined. We analyzed various inductors listed in Table 1 with Keysight’s Momentum electromagnetic simulator, and a vector of the self-inductance was obtained as follows.

\[
L = \begin{bmatrix} L_1(d_1, s_1, w_1) \\ L_2(d_2, s_2, w_2) \\ \vdots \\ L_N(d_N, s_N, w_N) \end{bmatrix},
\] (20)

where \( N \) is the number of inductors. The model parameters minimizing the squared error between the model and extracted values can be determined by [28]

\[
[\tilde{a}_0 \quad \tilde{a}_1 \quad \ldots \quad \tilde{a}_9]^T = (D^T D)^{-1} D^T L,
\] (21)

where \( D \) is a matrix of the dimensions defined as

\[
D = \begin{bmatrix} d_1^2 & s_1^2 & w_1^2 & d_1 s_1 & \ldots & w_1 & 1 \\ d_2^2 & s_2^2 & w_2^2 & d_2 s_2 & \ldots & w_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ d_N^2 & s_N^2 & w_N^2 & d_N s_N & \ldots & w_N & 1 \end{bmatrix}.
\] (22)

The fitted model formula \( \tilde{L}(d, s, w) \) is obtained by substituting the model parameters \( \tilde{a}_i \) into (19). Fig. 5 shows a modeling result at \( d = 120 \, \mu \text{m}, s = 6 \, \mu \text{m}, \) and \( w = 5 \, \mu \text{m} \). This set of dimensions was not used for modeling. The broken lines and solid lines represent the \( Y \)-parameters obtained from the electromagnetic analysis (EM) and the model, respectively. The model agreed well with the EM from 10 GHz to 30 GHz.
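The fitting step in (19) to (22) can be sketched as follows. The sample grid and the "true" coefficients are synthetic placeholders, not the values in Table 1. The code uses `numpy.linalg.lstsq`, which solves the same least-squares problem as (21) without explicitly forming \( (D^T D)^{-1} \).

```python
import numpy as np

def design_matrix(d, s, w):
    """Rows [d^2, s^2, w^2, ds, dw, sw, d, s, w, 1], matching (19) and (22)."""
    d, s, w = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (d, s, w))
    return np.column_stack([d**2, s**2, w**2, d * s, d * w, s * w,
                            d, s, w, np.ones_like(d)])

def fit_quadratic(d, s, w, values):
    """Least-squares estimate of the ten coefficients a_0 ... a_9."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(d, s, w), values, rcond=None)
    return coeffs

def evaluate(coeffs, d, s, w):
    """Evaluate the fitted polynomial at the given dimensions."""
    return design_matrix(d, s, w) @ coeffs

# Synthetic training grid in micrometres (hypothetical, not Table 1).
dd, ss, ww = np.meshgrid([90, 110, 130, 150], [2, 6, 10], [4, 6, 8], indexing="ij")
d, s, w = dd.ravel(), ss.ravel(), ww.ravel()

# Hypothetical "true" coefficients used only to generate test data (pH).
a_true = np.array([0.01, 0.5, 0.3, -0.02, -0.05, 0.1, 2.0, -3.0, -4.0, 50.0])
values = design_matrix(d, s, w) @ a_true

coeffs = fit_quadratic(d, s, w, values)

# Evaluate at a point that is not on the training grid.
pred = evaluate(coeffs, 120.0, 6.0, 5.0)[0]
truth = evaluate(a_true, 120.0, 6.0, 5.0)[0]
print(f"model: {pred:.3f} pH, reference: {truth:.3f} pH")
```

Because the synthetic data are exactly quadratic, the fit reproduces the held-out point. With EM-extracted data, the residual at held-out dimensions indicates how well a quadratic captures the inductor behavior.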

### 2.3 Optimization

Impedance and phase matching must be ensured to design a differential distributed amplifier with superior performance. The goal of the optimization is to find dimensions that satisfy the impedance and phase matching conditions. Since these conditions are primarily determined by the imaginary part of the impedance, the use of a loss-less model is appropriate. The mismatch between the nominal impedance and terminating resistance causes reflection waves. These reflections degrade not only input and output matching but also the flatness of frequency response of gain and group delay [21]. That is to say, the following equation needs to be satisfied.

\[
\min |Z_G - R_T| \cap \min |Z_D - R_T|,
\] (23)

On the other hand, a mismatch between \( \tau_G \) and \( \tau_D \) causes a roll-off like that of a transversal filter. By (10), delay matching is equivalent to cutoff-frequency matching. Thus, the following condition needs to be satisfied.

\[
\min |\tau_G - \tau_D| \iff \min |\omega_G - \omega_D|.
\] (24)
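The equivalence in (24) can be made explicit with a short derivation. It assumes only that the relation \( \tau = 2/\omega \) in (10) applies to both the gate line and the drain line:

\[
\tau_G - \tau_D = \frac{2}{\omega_G} - \frac{2}{\omega_D} = \frac{2(\omega_D - \omega_G)}{\omega_G \omega_D}
\quad \Longrightarrow \quad
|\tau_G - \tau_D| = \frac{2\,|\omega_G - \omega_D|}{\omega_G \omega_D}.
\]

Since \( \omega_G \) and \( \omega_D \) are positive, the delay mismatch vanishes exactly when the two cutoff frequencies coincide.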

The other requirement is maximizing the cutoff frequency. The artificial transmission lines have a significantly large peak in group delay response near the cutoff frequency, which degrades the signal quality. The cutoff frequency is limited by the gate line because \( C_G \) is typically larger than \( C_D \). This requirement is given by

\[
\max \omega_G.
\] (25)

The image impedance and cutoff frequency are determined by the dimensions of the inductors. The design of the inductors is a multi-objective and multi-variable optimization problem, and it is difficult to find the optimum dimensions by repeating the electromagnetic analysis. In this research, we determined the dimensions by numerical optimization based on the analytical solutions. We defined an evaluation function for the gate line as follows.

\[
E_G(d, s, w) = \zeta_0 (Z_G - R_T)^2 + \zeta_1 (-\omega_G),
\] (26)

where \( \zeta_0 \) and \( \zeta_1 \) are weighting factors that are positive constants. \( Z_G \) and \( \omega_G \) are given by (5) and (6), and \( L, M, C_{OX}, \) and \( C_M \) are functions of \( d, s, \) and \( w \) as described in Section 2.2. \( E_G \) is, therefore, a function of \( d, s, \) and \( w \). This captures the trade-off between \( L, M, C_{OX}, \) and \( C_M \). The optimum dimensions of the gate inductors can be found by minimizing \( E_G \), which satisfies the impedance matching condition and maximizes the cutoff frequency. Similarly, the evaluation function for the drain line was defined as follows.

\[
E_D(d, s, w) = \eta_0 (Z_D - R_T)^2 + \eta_1 (\omega_G - \omega_D)^2,
\] (27)

where \( \eta_0 \) and \( \eta_1 \) are weighting factors. The first term and second term represent the requirements for impedance matching and phase matching. The dimensions of the drain line inductors are also determined by minimizing \( E_D \).

The optimum point was searched using the gradient descent algorithm [29]. The dimensions were constrained to \( 90 \, \mu \text{m} \leq d \leq 150 \, \mu \text{m} \), \( 2 \, \mu \text{m} \leq s \leq 10 \, \mu \text{m} \), and \( 4 \, \mu \text{m} \leq w \leq 8 \, \mu \text{m} \) so that the quality factor of the inductors would not become too small. The number of steps was 10000, and the dimensions were randomly changed every 100 steps to prevent convergence to local optima. Fig. 6 shows the optimization result for the gate line, where the solid lines and gradation represent \( Z_G \) and \( f_G = \omega_G / 2\pi \), respectively. The optimization converged to \( d = 116 \, \mu \text{m}, s = 7 \, \mu \text{m}, \) and \( w = 4 \, \mu \text{m} \), where \( Z_G \) and \( f_G \) were 50.0 \( \Omega \) and 24.5 GHz, respectively. \( f_G \) was maximized on the 50-\( \Omega \) contour line, which satisfied impedance matching and maximized the cutoff frequency. Fig. 7 shows the optimization result for the drain line. The dimensions converged to \( d = 120 \, \mu \text{m}, s = 4 \, \mu \text{m}, \) and \( w = 6 \, \mu \text{m} \). \( Z_D \) and \( f_D = \omega_D / 2\pi \) were 50.1 \( \Omega \) and 24.9 GHz at the optimum point. The error between \( f_G \) and \( f_D \) was only 2%, which satisfied the phase matching condition. This error was mainly caused by rounding the dimensions to integers. The optimization took only 4.14 seconds on the author’s laptop.
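The search procedure can be sketched as a projected gradient descent with periodic random restarts. This is a simplified illustration under stated assumptions, not the authors' implementation. `toy_models` is a hypothetical stand-in for the fitted \( L, M, C_{OX}, C_M \) polynomials, and the gate loading, weights, learning rate, and reduced step count are placeholders. The impedance uses the assumed constant-k form \( \sqrt{L_{\text{eff}}/C_{\text{eff}}} \), and the cutoff term is weighted in GHz rather than rad/s to keep both terms of (26) on a similar scale.

```python
import math
import random

R_T = 50.0      # termination resistance (ohm)
C_G = 100e-15   # hypothetical gate loading per section (F)
BOUNDS = [(90.0, 150.0), (2.0, 10.0), (4.0, 8.0)]   # d, s, w in um, from the text

def toy_models(d, s, w):
    """Hypothetical stand-ins for the fitted L, M, C_OX, C_M models (SI units)."""
    L = (3.0 * d + 2.0 * s - 5.0 * w) * 1e-12
    M = -(0.8 * d - 6.0 * s) * 1e-12
    c_ox = 0.02 * d * w * 1e-15
    c_m = 0.05 * d / s * 1e-15
    return L, M, c_ox, c_m

def line_params(x):
    """Return (impedance in ohm, cutoff frequency in GHz) for dimensions x."""
    L, M, c_ox, c_m = toy_models(*x)
    l_eff = L - M
    c_eff = C_G + 2.0 * c_ox + 4.0 * c_m
    z = math.sqrt(l_eff / c_eff)                                # assumed constant-k form
    f_ghz = 1.0 / (math.pi * math.sqrt(l_eff * c_eff)) / 1e9    # omega_G / (2 pi)
    return z, f_ghz

def cost(x, zeta0=1.0, zeta1=1.0):
    """Evaluation function in the spirit of (26)."""
    z, f_ghz = line_params(x)
    return zeta0 * (z - R_T) ** 2 - zeta1 * f_ghz

def gradient(x, h=1e-3):
    """Central-difference gradient of the cost."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((cost(xp) - cost(xm)) / (2.0 * h))
    return grad

def clip(x):
    """Project the dimensions back into the allowed ranges."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, BOUNDS)]

def optimize(steps=2000, restart_every=100, lr=0.05, seed=0):
    rng = random.Random(seed)
    x = [0.5 * (lo + hi) for lo, hi in BOUNDS]
    best_x, best_cost = list(x), cost(x)
    for k in range(1, steps + 1):
        if k % restart_every == 0:
            x = [rng.uniform(lo, hi) for lo, hi in BOUNDS]   # random restart
        else:
            x = clip([v - lr * g for v, g in zip(x, gradient(x))])
        c = cost(x)
        if c < best_cost:
            best_x, best_cost = list(x), c
    return best_x, best_cost

best_x, best_cost = optimize()
z_opt, f_opt = line_params(best_x)
print("d, s, w =", [round(v, 2) for v in best_x])
print(f"Z_G = {z_opt:.2f} ohm, f_G = {f_opt:.2f} GHz")
```

Keeping the best point seen across restarts means a poor restart cannot worsen the result. The drain-line function \( E_D \) can be minimized in the same way by swapping the second cost term for \( \eta_1 (\omega_G - \omega_D)^2 \).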

2.4 Prototyping

The amplifier was fabricated in Rohm’s one-poly five-metal 0.18-\( \mu \text{m} \) standard CMOS process. The measured \( f_{\text{max}} \) of the n-MOSFET was 65 GHz [30]. The top and bottom metal layers were used for the inductor winding and ground wiring, respectively. The bottom metal under the inductors was removed, and the silicon substrate was exposed. A higher resistance silicide block area was specified on the exposed silicon substrate to prevent eddy current losses. The number of turns of the edge inductors was set to one, and their diameters were varied so that their inductances were close to \( L_g/2 \) and \( L_d/2 \) while maintaining the width and spacing. The self-inductances of the gate and drain edge inductors were 270 pH and 250 pH at 20 GHz, respectively. \( C_X \) was implemented with metal-insulator-metal capacitors, and \( R_T \) was implemented with poly resistors. The poly resistors were placed on a thick insulator for shallow trench isolation, which provided a small parasitic capacitance of 4.8 fF. Fig. 8 shows a micrograph of the fabricated chip. The coupled inductors reduced the complexity of wiring. The core area was 0.35 mm \( \times \) 0.76 mm = 0.27 mm\(^2\), and the whole chip area including all the test pads was 1.1 mm \( \times \) 0.66 mm = 0.73 mm\(^2\). The bias voltages were \( V_{\text{DD}} = 2.5 \, \text{V} \) and \( V_{\text{CG}} = 1.4 \, \text{V} \). The value of \( V_T \) was adjusted so that the tail current of each stage was 7.5 mA in both simulation and measurement. The amplifier drew a current of 30 mA and consumed a power of 75 mW.

3. Measurement and Discussion

The four-port S-parameters were measured using Keysight’s PNA-X N5247A vector network analyzer. The measurement frequency range was 0.1 GHz to 30 GHz with a frequency
The bandwidth was 18.5 GHz. The signal was a differential pseudo-random signal with a length of $2^9-1$. The differential outputs were connected to Ch.1 and Ch.3 of the oscilloscope, and the Ch.3 signal was observed. The linear distortion due to the RF cables and probes was removed by channel de-embedding before the measurement. Fig. 10(a) shows an eye diagram for a 30-Gb/s NRZ signal. The inset shows the through measurement with a calibration standard. The output swing was 100 mVpp and 160 mVpp for the through and the amplifier, respectively. The estimated timing margin at a bit error rate of $10^{-12}$ for the through and the amplifier was 0.88 UI and 0.75 UI, and the root-mean-square jitter was 0.34 ps and 0.42 ps, respectively. The optimization reduced the group delay variation, resulting in small jitter degradation. Fig. 10(b) shows an eye diagram for a 30-Gb/s PAM4 signal. The output swing was 130 mVpp and 220 mVpp. The eye width was 0.5 UI and 0.4 UI for the through and the amplifier, respectively, achieving 30-Gb/s PAM4 transmission.

Table 2 lists a performance summary and comparison with the distributed amplifiers previously reported. Rito et al. [3] reported a distributed modulator driver using SiGe:C BiCMOS technology with $f_{\text{max}} = 500$ GHz. Taking advantage of the excellent device performance, 90-Gb/s PAM4 transmission was demonstrated. However, the bandwidth normalized by $f_{\text{max}}$ is 0.18, which is not a particularly wide bandwidth from the point of view of the circuit. Wu et al. [7] introduced a frequency interleaving technique and reported excellent bandwidth. However, this technique causes rapid group delay variations at the sub-band junctions. Lee et al. [8] proposed a distributed amplifier using deeply doped channel technology, which has very low power consumption.

---

Fig. 9 Measured S-parameters. The solid lines and dashed lines represent the measurement and simulation. Unless otherwise indicated, the simulation results are slow corner. (a) Differential gain and reflection. (b) Common-mode gain and isolation. (c) Group delay response.

Fig. 10 Measured eye diagram. (a) 30-Gb/s NRZ. (b) 30-Gb/s PAM4. Inset: eye diagram with a through structure.

---

The amplifiers reported in [11]–[14] used a pseudo-differential topology, which does not provide the advantages of differential circuits. The distributed amplifier reported in this study has a small core area of 0.27 mm\(^2\) and a fully-differential topology. Compared to the most similar circuit [15], the coupled inductors allow a 57% reduction in silicon area. Although [4]–[8] used the single-ended topology, they have a larger core area than this work. The developed amplifier has a small power consumption of 75 mW at the cost of gain. The gain can be increased by cascading amplifiers [7]. A future challenge is to develop a high-gain distributed amplifier that takes advantage of the fully-differential topology, which allows true DC coupled cascading. The bandwidth normalized by \(f_{\text{max}}\) was 0.28, achieving the target specification. By applying the proposed circuit to advanced processes with \(f_{\text{max}} > 200\) GHz, a bandwidth of 56 GHz or higher is expected, which would enable 100 Gbaud transmissions. In addition, loss compensation can be used to further extend the bandwidth toward the theoretical bandwidth of 25 GHz. Besides, PAM4 transmission was demonstrated only in this study and [3]. These studies reported small group delay variations of less than ±5 ps. Therefore, it is important to strictly satisfy the impedance and phase matching conditions for multi-level transmission.
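The normalized and projected bandwidth figures can be checked with simple arithmetic. This sketch assumes that the normalization is the plain ratio of the measured 18.5-GHz bandwidth to the measured \(f_{\text{max}}\) of 65 GHz, and that the same ratio carries over to a process with \(f_{\text{max}} = 200\) GHz:

\[
\frac{18.5\ \text{GHz}}{65\ \text{GHz}} \approx 0.28, \qquad 0.28 \times 200\ \text{GHz} = 56\ \text{GHz}.
\]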

4. Conclusion

We presented a compact fully-differential distributed amplifier that reduces the silicon area by more than 50%. To achieve a flat frequency response, we analyzed the differential artificial transmission lines using a simple loss-less model. The lines were optimized by a gradient descent algorithm based on the derived odd-mode image parameters. The phase and impedance matching conditions were strictly satisfied by the optimization, and we demonstrated a small group delay variation of ±2.7 ps and 30-Gbps NRZ and PAM4 transmission. These results will accelerate the high-density integration of differential distributed circuits.

Acknowledgments

This work was supported through the activities of VDEC, The University of Tokyo, in collaboration with Rohm Corporation, Toppan Printing Corporation, Cadence Design Systems, Mentor Graphics, and Keysight Technologies.

References


Keisuke Kawahara received the B.E. and M.E. degrees in engineering from Tokyo University of Science, Chiba, Japan, in 2020 and 2022, respectively. He is currently pursuing a Ph.D. degree with the Graduate School of Engineering Science, Yokohama National University, Kanagawa, Japan. His research interests include electronic and photonic integrated circuits for high-speed communications.

Yohtaro Umeda received the B.S. and M.S. degrees in Physics and his Ph.D. degree in Electrical Engineering from the University of Tokyo, Tokyo, Japan, in 1982, 1984, and 2000, respectively. He joined Nippon Telegraph and Telephone (NTT) Corporation in 1984, where he was engaged in research and development on monolithic microwave and high-speed digital ICs with GaAs MESFETs and InP-based HEMTs. Since 2006, he has been a Professor in the Department of Electrical Engineering, Tokyo University of Science, Chiba, Japan. His current research interests include high-speed analog circuits and signal processing techniques for radio and optical communication systems.

Kyoya Takano received the B.E., M.E., and Ph.D. degrees in Electrical Engineering from the University of Tokyo in 2006, 2008 and 2012, respectively. From 2012 to 2018, he was a project assistant professor with the Graduate School of Advanced Sciences of Matter, Hiroshima University. In 2018, he joined Tokyo University of Science as an assistant professor. He has been a junior associate professor since 2021 and an associate professor in the Department of Electrical Engineering since 2022. His current research interests include design of millimeter-wave and terahertz integrated circuits.

Shinsuke Hara received the B.E., M.E., and Ph.D. degrees in physics from Tokyo University of Science in 2000, 2002, and 2005, respectively. In 2013, he joined the National Institute of Information and Communication Technology (NICT), Koganei, Japan, as a Researcher. His research interests are millimeter-wave CMOS circuit design and nanoscale semiconductor devices.