A Multichannel 3.5mW/Gbps/Channel Gated Oscillator Based CDR in a 0.18µm Digital CMOS Technology

Armin Tajalli(1), Paul Muller(2), Mojtaba Atarodi(1), and Yusuf Leblebici(2)

(1) Electrical Engineering Department
Sharif University of Technology, SUT
Tehran, Iran
(2) Ecole Polytechnique Fédérale de Lausanne, EPFL
Microelectronic Systems Laboratory (LSM)
CH-1015 Lausanne, Switzerland

Abstract:
This article presents a very low-power clock and data recovery (CDR) circuit with 8 parallel channels achieving an aggregate data rate of 20 Gbps. A structural top-down design methodology has been applied to minimize the power dissipation while satisfying the required specifications for short-haul receivers. Implemented in a 0.18µm digital CMOS technology, the circuit dissipates a total of 70.2 mW, or 3.51 mW/Gbps/Ch, and each channel occupies 0.045 mm² of silicon area.

1. Introduction
While the clock frequency and throughput of processors increase with each new technology generation, the lack of high-bandwidth I/Os is becoming a prominent limiting factor in the communication performance of computer systems. Very high-speed serial interfaces are a good way to increase the data communication bandwidth, which makes the design of very low-cost serial link data transceivers highly desirable. Implementing such transceivers in a digital CMOS technology has the advantage that these high-speed interfaces can be integrated with digital processors on the same substrate, without any extra cost for additional analog process options.
This article introduces an 8-channel clock and data recovery (CDR) system for short-haul applications. The main goal of this work is to implement a small-area circuit with a power dissipation of less than 5mW/Ch/Gbps. These criteria were derived to make it feasible to implement tens or even hundreds of identical channels on a single chip. For this purpose, a gated-oscillator (GO)-based topology has been applied to benefit from both its simple structure [1]-[3] and its inherently high jitter tolerance (JTOL). Extensive behavioral modeling and simulations have been carried out to explore the performance of the proposed topology. Based on the specifications for the proposed GO architecture extracted from these behavioral simulations, a circuit optimized for area and power consumption has been designed.

We first present the architecture, system-level design, and simulation results of the proposed 8-channel CDR. The circuit design, circuit-level simulation results, and layout of the test chip follow.

2. System Design
2.1. Multichannel GO Based Topology
Figure 1 shows a multichannel gated current-controlled oscillator (GCCO) CDR. In this architecture, a shared PLL generates a local high-frequency clock \( f_{out} \) from a reference clock \( f_{in} \) such that \( f_{out} \) is exactly equal to the baud rate of the received data. The PLL uses the same oscillator as each GCCO. To achieve better matching between the channels and the PLL, current-controlled oscillators (CCO) are used instead of voltage-controlled oscillators in each channel. A copy of the control current \( I_C \) produced by the PLL is delivered to the matched oscillator of each channel as \( I_{CTL}[1:N] \).
Provided the CCOs are well matched, the clock frequencies of all channels \( Ck_{out}[1:N] \) are identical and equal to \( f_{out} \). Figure 2 shows the topology of the proposed GO-based CDR. In this architecture, at each data edge, an edge detector circuit based on a delay line and an XOR gate generates a synchronization signal \( EDET \) for the GCCO. At an incoming data edge on \( D_{in} \), \( EDET \) goes low for a duration defined by the delay line and freezes the output clock \( Ck_{out} \) high via the first stage of the oscillator.

Figure 1: Multichannel GCCO-based CDR uses a shared PLL and bias circuit for frequency tuning
At the rising edge of EDET, the oscillator is released and returns to its free oscillation mode at a frequency determined by the control current and in phase with the last received data edge. Sampling the delayed data (DDin) instead of the input data (Din) eliminates the delay introduced by the delay line. Meanwhile, parasitic delays coming from the XOR gate, or the delay mismatch between the two inputs of the NAND gate in the oscillator, should be compensated by proper dummy gates (not shown in this figure).
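The restart-on-edge behavior described above can be illustrated with a small behavioral sketch. This is not the authors' behavioral model: it assumes an ideal edge detector with zero delay, a jitter-free oscillator, sampling half an oscillator period after each restart, and it treats the start of the stream as an edge. The function names and the normalized time unit (one bit period equals 1.0) are illustrative choices.

```python
def nrz_edges(bits, bit_period):
    """Times at which the NRZ waveform for `bits` changes level."""
    return [i * bit_period for i in range(1, len(bits)) if bits[i] != bits[i - 1]]


def gated_oscillator_recover(bits, bit_period, osc_period):
    """Idealized gated-oscillator CDR (assumed model, see lead-in).

    The oscillator phase is reset at every data edge. Between edges it
    free-runs with `osc_period`, and each cycle samples the data half an
    oscillator period after the cycle starts.
    """
    # The stream start and end are treated as edges for simplicity.
    edges = [0.0] + nrz_edges(bits, bit_period) + [len(bits) * bit_period]
    recovered = []
    for start, stop in zip(edges, edges[1:]):
        k = 0
        while start + (k + 0.5) * osc_period < stop:
            t = start + (k + 0.5) * osc_period
            recovered.append(bits[int(t // bit_period)])
            k += 1
    return recovered


if __name__ == "__main__":
    pattern = [1, 0, 0, 0, 0, 0, 1, 1, 0, 1]  # contains a run of 5 identical bits
    print(gated_oscillator_recover(pattern, 1.0, 1.05) == pattern)  # True
    print(gated_oscillator_recover(pattern, 1.0, 1.25) == pattern)  # False
```

With a 5% period error the run of five identical bits is still sampled five times. With a 25% error one sample is lost before the next edge resynchronizes the oscillator, so a bit is dropped.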

2.2. Jitter Tolerance

Jitter tolerance (JTOL) is a measure of the ability of a CDR to tolerate input jitter. JTOL is usually tested by adding sinusoidal jitter, swept over a given frequency range, to the data stream, which already includes the deterministic and random jitter components added in the channel [4]. The maximum jitter amplitude, as a function of jitter frequency, at which the CDR still operates at a given BER is called the jitter tolerance. In this situation the input frequency is

\[ \omega(t) = \omega_0 + \Delta \omega \cdot \cos \omega_f t \]  

in which \( \omega(t) \) is the instantaneous frequency, \( \omega_0 \) is the nominal (jitter-free) frequency, \( \omega_f \) is the sinusoidal jitter frequency, and, based on [5],

\[ \Delta \omega = \pi \cdot U_{p-p} \cdot \omega_f \]  

Here, \( U_{p-p} \) is the peak-to-peak jitter amplitude normalized to the nominal data period. Figure 3 shows the simulated JTOL of a GO-based CDR in the presence of random (Gaussian) and deterministic (uniform) jitter [4].

As can be seen, this architecture shows a very good JTOL due to its inherent high bandwidth.
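As a small numerical illustration of the sinusoidal jitter relation \( \Delta \omega = \pi \cdot U_{p-p} \cdot \omega_f \) used in this section, the sketch below converts a jitter amplitude and jitter frequency into a peak frequency deviation. The function names and the example values (0.5 UI at 10 MHz) are illustrative, and taking \( \omega_0 \) as \( 2\pi \) times the 2.5 Gbps bit rate is an assumption.

```python
import math


def peak_frequency_deviation(u_pp_ui, jitter_freq_hz):
    """Peak deviation in rad/s: delta_omega = pi * U_pp * omega_f."""
    omega_f = 2.0 * math.pi * jitter_freq_hz
    return math.pi * u_pp_ui * omega_f


def relative_frequency_deviation(u_pp_ui, jitter_freq_hz, nominal_freq_hz):
    """Peak deviation normalized to the nominal frequency omega_0.

    Assumption: omega_0 = 2 * pi * nominal_freq_hz.
    """
    omega_0 = 2.0 * math.pi * nominal_freq_hz
    return peak_frequency_deviation(u_pp_ui, jitter_freq_hz) / omega_0


if __name__ == "__main__":
    # Illustrative values: 0.5 UI peak-to-peak jitter at 10 MHz, 2.5 GHz nominal.
    rel = relative_frequency_deviation(0.5, 10e6, 2.5e9)
    print(f"relative peak deviation: {rel:.3%}")
```

The normalized deviation grows linearly with both jitter amplitude and jitter frequency, which is why high-frequency jitter is the demanding case for a fixed amplitude.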

2.3. Frequency Tolerance

Unlike in conventional PLL-based CDRs, a frequency difference can exist between the gated oscillator in the receiver of a channel and the incoming data stream. In practical applications, the data rate is specified to within ±100ppm. The frequency tolerance (FTOL) is defined as the maximum frequency difference at which the BER remains lower than a specified value (usually \( 10^{-12} \)). Ideally, when there is no jitter on data or clock, the frequency error must satisfy \( |f_0 - f_{sk}| < f_0 / (2n) \), where \( f_0 \) is the data frequency, \( f_{sk} \) is the oscillator frequency, and \( n \) is the maximum number of consecutive identical digits (CID). Any jitter on the input data or recovered clock degrades the FTOL. Figure 4(a) shows the achievable FTOL in the presence of random jitter on the received data and random jitter accumulated on the recovered clock. As can be seen, an increase in clock or data jitter leads to a degradation of FTOL.
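The ideal, jitter-free bound above is easy to evaluate numerically. The sketch below is only an illustration of that bound: the function names are hypothetical, and equating the data frequency \( f_0 \) with the 2.5 Gbps per-channel bit rate is an assumption.

```python
def ideal_ftol(max_cid):
    """Ideal relative frequency tolerance 1 / (2 n) for at most n CIDs."""
    if max_cid < 1:
        raise ValueError("max_cid must be at least 1")
    return 1.0 / (2 * max_cid)


def max_frequency_offset_hz(data_freq_hz, max_cid):
    """Largest |f_0 - f_sk| allowed by the ideal bound, in Hz."""
    return data_freq_hz * ideal_ftol(max_cid)


if __name__ == "__main__":
    for n in (3, 5, 8):
        print(f"n = {n}: ideal FTOL = {ideal_ftol(n):.1%}")
    # Assumed f_0 = 2.5 GHz for a 2.5 Gbps channel, with n = 5.
    print(f"offset limit: {max_frequency_offset_hz(2.5e9, 5) / 1e6:.0f} MHz")
```

For a run length limit of 5 the ideal bound is 10%, far above the ±100ppm data rate accuracy, which leaves margin for jitter, device mismatch, and supply variation.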

The main source of jitter on the recovered clock is the jitter accumulated while the gated oscillator is free running, which increases with the free-running time interval as [6]

$$\sigma_{\Delta T} = \kappa \sqrt{\Delta T} \qquad \text{(Eq. 3)}$$

In Equation 3, $\sigma_{\Delta T}$ indicates the rms (root mean square) jitter value on the clock accumulated over the time interval $\Delta T$, and $\kappa$ is a proportionality factor which depends on the topology, power consumption, and process parameters. Here, $\Delta T$ depends on the number of CIDs. The 8b/10b encoding scheme used in short-distance communications limits the CID to no more than 5 digits. Therefore, according to Figure 4(a) and using Equation 3, an FTOL specification of about $9\%$ requires $\kappa \leq 9.4 \times 10^{-8}$. This criterion can be used to determine the bias condition and thus the sizing of the transistors in each delay cell. We consider relatively large frequency offsets, which include device mismatches and supply variations in different regions of the chip.
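To give a feel for what the $\kappa$ limit means in time units, the sketch below evaluates the accumulated rms jitter under two stated assumptions: that the accumulated jitter follows the square-root form $\sigma_{\Delta T} = \kappa \sqrt{\Delta T}$ given in [6], and that the free-running interval equals the CID run length divided by the bit rate. The function name is hypothetical.

```python
import math


def accumulated_rms_jitter(kappa, n_cid, bit_rate_hz):
    """rms jitter (seconds) accumulated over n_cid bit periods.

    Assumptions: sigma = kappa * sqrt(delta_T), with delta_T = n_cid / bit_rate_hz.
    """
    delta_t = n_cid / bit_rate_hz
    return kappa * math.sqrt(delta_t)


if __name__ == "__main__":
    sigma = accumulated_rms_jitter(kappa=9.4e-8, n_cid=5, bit_rate_hz=2.5e9)
    print(f"sigma = {sigma * 1e12:.2f} ps rms")
    print(f"sigma = {sigma * 2.5e9:.4f} UI rms")
```

Under these assumptions the quoted limit corresponds to a few picoseconds of rms jitter at the end of the longest run, roughly one percent of the 400 ps bit period.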

3. Circuit Design

3.1. Phase Noise Requirement

Frequency stability and timing jitter are the two most important specifications of the oscillator in a GCCO topology. The timing jitter of ring oscillators, or its frequency-domain counterpart, phase noise, has been extensively studied in [6] and [7]. Equation 4 can be used to estimate the trade-off between jitter and power consumption in a gated ring oscillator, where the minimum achievable $\kappa$ can be calculated as [7]:

$$\kappa_{\text{min}} = \frac{8}{3\eta} \sqrt{N \cdot \frac{kT}{P}} \left( \frac{V_{\text{dd}}}{V_{\text{char}}} + \frac{V_{\text{dsat}}}{R_L I_{\text{SS}}} \right) \qquad \text{(Eq. 4)}$$

in which $\eta$ relates the rise time to the delay of each delay cell, $P$ is the oscillator power dissipation, $N$ is the number of delay stages in the ring oscillator, $R_L$ is the load resistance, $I_{\text{SS}}$ is the tail current of the delay cell, $V_{\text{dd}}$ is the supply voltage, and $V_{\text{char}}=V_{\text{dsat}}$ (the overdrive voltage) for long-channel devices or $V_{\text{char}}=E_{C}L/\gamma$ for short-channel devices. Plotted in Figure 4(b), this equation helps determine the minimum power dissipation needed for a required FTOL value. In this design, the bias current of the transistors, and thus the device sizing, has been chosen based on this graph. The figure also compares the estimated $\kappa$ values derived in [6] and [7].
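The power dependence in Equation 4 can be explored with a short sketch that evaluates the expression exactly as printed above. All numeric values below (stage count, voltages, swing, power) are placeholders chosen for illustration and are not the design values of this work. The function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def kappa_min(eta, n_stages, power_w, v_dd, v_char, v_dsat, rl_iss_v, temp_k=300.0):
    """Evaluate Equation 4 as printed; rl_iss_v is the product R_L * I_SS in volts."""
    thermal = math.sqrt(n_stages * BOLTZMANN * temp_k / power_w)
    return (8.0 / (3.0 * eta)) * thermal * (v_dd / v_char + v_dsat / rl_iss_v)


def power_for_target(kappa_ref, power_ref_w, kappa_target):
    """Power needed to reach kappa_target, using kappa proportional to 1/sqrt(P)."""
    return power_ref_w * (kappa_ref / kappa_target) ** 2


if __name__ == "__main__":
    # Placeholder operating point, not taken from the paper.
    params = dict(eta=1.0, n_stages=4, v_dd=1.8, v_char=0.3, v_dsat=0.3, rl_iss_v=0.6)
    k_ref = kappa_min(power_w=5e-3, **params)
    print(f"kappa_min at 5 mW: {k_ref:.2e}")
    print(f"power for half that kappa: {power_for_target(k_ref, 5e-3, k_ref / 2) * 1e3:.0f} mW")
```

The useful observation is the scaling: with the other parameters fixed, $\kappa_{\text{min}}$ falls only as $1/\sqrt{P}$, so halving the jitter factor costs four times the oscillator power. This is why the loosest $\kappa$ that still meets the FTOL specification sets the minimum bias current.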

3.2. GCCO-Based CDR

Based on the topology shown in Figure 1, an 8-channel CDR has been implemented in a 0.18µm digital CMOS technology. The proposed shared PLL uses a high-order loop filter to suppress the ripple on the control signal and thus generates very little jitter.

To achieve good matching, all the delay cells in the delay line and the ring oscillator are built with identical current-mode logic (CML) two-input multiplexer (MUX) gates optimized for this application. The minimum acceptable bias current has been chosen based on Equation 3 and Figure 4(b).

Figure 5 shows the transistor-level simulation result with each channel driven by a random 2.5Gbps input data stream.

3.3. Phase-Locked Loop (PLL)

Figure 6(a) shows the block diagram of the proposed PLL. A third order loop filter has been applied to attenuate the ripples on the control signal. A transconductor ($g_m$) cell has been used to convert the control voltage to current. Copies of this current will be applied to all CDRs to tune their oscillators to the desired frequency.

In the circuit shown in Figure 6(a), the parasitic pole introduced by the $g_m$ cell and the parasitic capacitors at the transconductor output can push the loop towards instability. To avoid this problem, one can use this parasitic pole, i.e., $g_m / C_{\text{parasitic}}$, instead of $1/(R_1C_3)$ for filtering.

Figure 6(b) shows the transfer characteristic of the proposed transconductor. This non-linear characteristic helps to achieve both a high current swing for a wide CCO tuning range and a relatively constant CCO gain ($K_{\text{CCO}}$) over process corners. In slow corners, where $K_{\text{CCO}}$ is low and a higher control current is required to reach the desired oscillation frequency, the transconductance is high. By the same reasoning, the transconductance is low when the control current is low.

3.4. Test Circuit Layout

Table 1 summarizes the specifications of the proposed 20Gbps 8-channel CDR. Occupying 0.045mm$^2$ of silicon area per channel, the design has a total power consumption of 70.2mW, or 3.51mW/Gbps/channel (including bias and PLL circuits), which is well suited for modern multi-channel serial link applications. The power consumption of each part of the proposed multi-channel CDR is also listed in Table 1.

<table>
<thead>
<tr>
<th>Technology</th>
<th>0.18µm Digital CMOS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Supply (V)</td>
<td>1.6 - 2.0</td>
</tr>
<tr>
<td>Per channel bit rate (Gbps/Ch)</td>
<td>2.5</td>
</tr>
<tr>
<td>Total bit rate (Gbps)</td>
<td>20</td>
</tr>
<tr>
<td>Power consumption (mW @ 1.8V)</td>
<td>PLL 7.2, GCCO 10.8, Total 70.2</td>
</tr>
<tr>
<td>PLL settling time (µs)</td>
<td>1.3</td>
</tr>
<tr>
<td>Area of GCCO (mm$^2$)</td>
<td>0.045 (250µm x 180µm)</td>
</tr>
<tr>
<td>Area of PLL (mm$^2$)</td>
<td>0.1 (300µm x 320µm)</td>
</tr>
</tbody>
</table>

Table 1: Specifications of the Proposed 8-Channel CDR
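The normalized figures quoted with Table 1 follow directly from the totals. The short sketch below reproduces that arithmetic; the helper names are hypothetical and the inputs are the values stated in the text and table.

```python
def normalized_power_mw_per_gbps(total_power_mw, channels, rate_gbps_per_channel):
    """Total power divided by the aggregate data rate (mW/Gbps/channel)."""
    return total_power_mw / (channels * rate_gbps_per_channel)


def area_mm2(width_um, height_um):
    """Rectangle area in mm^2 from dimensions given in micrometers."""
    return (width_um * 1e-3) * (height_um * 1e-3)


if __name__ == "__main__":
    p_norm = normalized_power_mw_per_gbps(70.2, channels=8, rate_gbps_per_channel=2.5)
    print(f"normalized power: {p_norm:.2f} mW/Gbps/channel")
    print(f"GCCO area: {area_mm2(250, 180):.3f} mm^2")
```

Both results agree with the quoted 3.51 mW/Gbps/channel and 0.045 mm$^2$ per channel.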

Table 2 compares this design with previous work. As can be seen, PI (phase-interpolator) and PLL-based CDRs have larger area and higher normalized power consumption ($P_{diss,n}$) than the GO-based CDRs. The proposed approach shows about a 30% reduction in $P_{diss,n}$ with respect to the low-power CDR reported in [3]. The CDR reported in [8] shows a high $P_{diss,n}$, mainly due to the structure used to operate at 1/5 of the data rate.

An 8-channel CDR system has been implemented using a conventional digital 0.18µm CMOS process with no analog options to experimentally evaluate the capabilities of the proposed architecture. Figure 7 shows the mask layout of a single channel GCCO-based CDR including output buffers. The proposed chip is currently in manufacturing and measurement results should be available by the time of publication.

Figure 7: Mask layout of the proposed GCCO CDR including output I/O buffers

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Year</th>
<th>Technology</th>
<th>Supply (V)</th>
<th>Data Rate (Gbps/Ch)</th>
<th>Power (mW/Gbps/Ch)</th>
<th>Area (mm$^2$/Ch)</th>
<th>CDR Type</th>
<th>No. of Channels</th>
</tr>
</thead>
<tbody>
<tr>
<td>[9]</td>
<td>2005</td>
<td>0.11 µm</td>
<td>1.5</td>
<td>10</td>
<td>22</td>
<td>0.35</td>
<td>Phase Interpolator</td>
<td>1</td>
</tr>
<tr>
<td>[10]</td>
<td>2001</td>
<td>0.18 µm</td>
<td>2.5</td>
<td>10</td>
<td>7.2</td>
<td>0.99</td>
<td>PLL</td>
<td>1</td>
</tr>
<tr>
<td>[8]</td>
<td>2002</td>
<td>0.18 µm</td>
<td>1.8</td>
<td>5</td>
<td>18</td>
<td>-</td>
<td>GCCO</td>
<td>32</td>
</tr>
<tr>
<td>[3]</td>
<td>2003</td>
<td>0.15 µm</td>
<td>1.5</td>
<td>10</td>
<td>-</td>
<td>0.02</td>
<td>GCCO</td>
<td>4</td>
</tr>
<tr>
<td>This Work</td>
<td>2005</td>
<td>0.18 µm</td>
<td>1.8</td>
<td>2.5</td>
<td>3.5</td>
<td>0.05</td>
<td>GCCO</td>
<td>8</td>
</tr>
</tbody>
</table>

Table 2: Comparison with Previous Work (all examples built in CMOS technology)

4. Conclusion

In this article, an 8-channel clock and data recovery system operating with an aggregate data rate of 20 Gbps has been presented. A structural top-down design methodology has been introduced to implement a multichannel CDR using a digital 0.18µm CMOS technology.

This work has been partially supported by the Swiss National Science Foundation Grant 200021-100625 and the Microelectronics R&D Center (MERDCI), Tehran.

References: