# FPGA Vernier Digital-to-Time Converter With 1.58 ps Resolution and 59.3 Minutes Operation Range

Poki Chen, Member, IEEE, Po-Yu Chen, Juan-Shan Lai, and Yi-Jin Chen

Abstract—The first FPGA-based multiple-channel digital-to-time converter (DTC), or digital pulse generator, is proposed to further extend FPGA applications into the analog domain. Based on the vernier principle, the effective resolution is made equal to the period difference of two phase-locked loop (PLL) outputs. A DTC resolution of 1.58 ps, finer than any previously reported, is achieved with an Altera Stratix III FPGA chip. The DNL and INL are verified to be $-0.086 \sim +0.12$ LSB and $-0.93 \sim +0.75$ LSB, respectively, for input values from 1 to 1026. The widest operation range to date, 59.3 minutes, is accomplished with 51 functioning input bits. Apart from 2 shared PLLs, only 422 combinational ALUTs and 84 dedicated logic registers are used per channel in a 224-channel implementation. The simulated power consumption is only 3.04 mW per channel. Thanks to its simple but powerful structure, the design cost is substantially reduced compared with those of its predecessors.

*Index Terms*—ATE, BIST, digital pulse generator, digital-to-time converter, FPGA and vernier principle.

## I. INTRODUCTION

DIGITAL-TO-TIME converter (DTC) is used to generate a time signal with a width proportional to a programmed input value. It is one of the most important cores of automatic test equipment (ATE) and measurement instruments, such as VLSI functional testers, PLL testers, IC pulse parametric testers, system triggers, laser diode testers, timing generators, time-to-digital converter (TDC) testers, delay compensators and pulsed RF measurement equipment [1]–[13], [22]. It is also extensively used in digital IC BIST (built-in self-test) applications for cost reduction. Depending on the operating principle adopted, digital-to-time conversion can be fulfilled by absolute [1], [2], [6], [8], [10]–[12] or relative [3]–[5], [7], [9] time generation. Absolute time generation can produce a wide delay range with low offset; however, its drawbacks are comparatively poor resolution and greater sensitivity to PVT (process, voltage and temperature) variations. For relative time generation, the effective resolution equals the delay difference between different transmission paths or delay elements and can be made extremely fine. Nevertheless, its performance is easily hampered by path or element mismatches: since the delays of different paths cannot be made exactly equal after fabrication, a large offset always exists even with an input value of 0.

For performance enhancement, conventional DTCs were usually realized in GaAs or bipolar processes based on the absolute time generation principle. A 7-bit DTC was proposed to set the output pulse width by comparing the output voltages of a digital-to-analog converter (DAC) and a ramp generator whose charging rate was determined by an external current source and an external capacitor [1]. The achieved operation range and resolution were 15.875 ns and 125 ps with $\pm 1$ LSB differential nonlinearity (DNL). Later, an 8-bit version was presented with 10 ps resolution, $\pm 1$ LSB integral nonlinearity (INL) and 50 MHz operation frequency [2]. Its full-scale operation range was merely 2.5 ns, which could be extended to $10\ \mu s$ at a resolution of 39 ns. The accuracy and maximum operation frequency of these DTCs were dominated by the performance of the DAC and the comparator, which become harder to design for finer resolution or a wider operation range with more input bits.

Alternatively, a programmable vernier delay line with calibration RAM was utilized to obtain 40 ps resolution [3]. Afterward, the vernier delay line was replaced with delay matrices composed of multiple delay cells with rather small delay differences, and multiplexers were adopted to vary the effective transmission path through the matrices according to the programmed delay. The operation range (2.55 ns [4] or 3 ns [5]) was restricted by the maximum achievable delay of the delay matrices. Although the effective resolutions were as fine as 10 ps and 8 ps, these open-loop structures always retained some uncompensated PVT sensitivity, which in turn caused measurement errors.

To cut down fabrication cost and power consumption and to boost circuit integration, CMOS processes were gradually adopted for DTC designs. A fully integrated CMOS force timing generator was proposed for pin electronics (PE) [6]. Its programmable delay lines consisted of three different delay elements: a shift register for coarse delay adjustment, an inverter chain for finer resolution, and a set of ratioed inverter delays for the finest resolution. The effective resolution was 600 ps. With an open-loop structure, the circuit was also vulnerable to PVT variations. One possible way to suppress PVT sensitivity is to adopt the closed-loop negative feedback mechanism of locked loops. An array of M delay-locked loops (DLLs) with N delay elements in each loop was proposed to generate a small phase difference by interlacing their output phases with the help of an additional DLL with fewer (M) delay elements in its delay chain. The DTC resolution was verified to be 1/MN of the reference period and could be made extremely fine with a large MN [7]. However, the DTC required a total of M + 1 DLLs, or more specifically $M \times (N + 1)$ delay elements, which not only consumed considerable area but were also very hard to match. The effective resolution was realized as 154 ps with an rms error of 44 ps.

Manuscript received January 16, 2009; revised May 23, 2009. This paper was recommended by Associate Editor V. De. This work was supported by National Science Council under Grant NSC 96-2221-E-011-151.

P. Chen is with the Department of Electronic Engineering and Graduate Institute of Electro-Optical Engineering, National Taiwan University of Science and Technology, Taipei 10617, Taiwan (e-mail: poki@mail.ntust.edu.tw).

P.-Y. Chen, J.-S. Lai, and Y.-J. Chen are with the Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei 10617, Taiwan.

Digital Object Identifier 10.1109/TCSI.2009.2028748

By using a single cyclic delay line and 8× phase interpolators, a CMOS DTC was proposed with 37.5 ps resolution and a 5 ms programmable delay range. However, its INL error was as large as $0 \sim 7$ LSB, which could be reduced to $\pm 0.4$ LSB only with chip-by-chip calibration [8]. Alternatively, a CMOS DTC was developed to reduce the impact of device mismatch by storing calibration data in high-speed SRAMs. The operation frequency range and the resolution were $100 \sim 400$ MHz and 19.5 ps, respectively, but the output range was merely 2.5 ns and the INL still reached 35 ps [9]. Because of the large quantity of calibration SRAMs, the chip size increased tremendously, and more address/data bits of the calibration SRAMs must be consumed to achieve finer resolution: a 100-fold increase in calibration SRAM size might be required to reach a resolution of several picoseconds [10]. Another state-of-the-art DTC eliminated the calibration SRAMs by utilizing a DLL to overcome the problems caused by PVT variations. Its resolution reached 1.83 ps and its INL error was less than 8 ps ($\pm 4$ ps) [10], [11]. However, monotonicity could only be ensured by a 1-stage current-controlled high-linearity fine-delay circuit whose delay was adjusted by two DACs. In addition, active noise cancellers were required to eliminate $I_{DD}/I_{SS}$ fluctuations, and a PLL/DLL multiple-feedback system was utilized to eliminate timing drift and jitter. These measures made the circuit rather complicated and very hard to design. Recently, a self-calibrating DTC was proposed claiming sub-picosecond resolution by cascading a coarse active delay-locked line and a passive programmable fine delay for phase interpolation [12]. An integrated dual mixer time domain (DMTD) circuit was adopted for self-calibration to overcome device mismatch and process variations. However, no experimental results were presented to reveal the actual performance of that DTC.

Most conventional DTCs depend on full-custom design, which consumes much time and manpower. To alleviate the need for full-custom design and hasten prototyping, FPGAs, which are conventionally regarded as digital development platforms, have occasionally been applied successfully to analog applications such as smart temperature sensors and time-to-digital converters [14], [15]. In this paper, a vernier DTC realizable with modern FPGA chips is proposed to expand FPGA applications further into the analog field. It is shown to achieve a resolution as fine as 1.58 ps with an INL error of less than $\pm 1$ LSB. The remainder of the paper is organized as follows. Section II describes the operation principle of the proposed circuit. Section III details the circuit structure. Section IV discusses important FPGA implementation issues and presents the measurement results. A summary of the paper is given in Section V.

## II. OPERATION PRINCIPLE

Fig. 1. Timing diagrams of (a) the vernier TDC and (b) the proposed vernier DTC.

The vernier principle is widely applied to time-to-digital converters, with a typical timing diagram shown in Fig. 1(a) [16], [17]. When the $S_{\text{Start}}$ signal arrives, the $S_s$ signal is triggered to oscillate. Similarly, the occurrence of the $S_{\text{Stop}}$ signal activates the $S_f$ signal to oscillate with a period $T_f$ slightly shorter than $T_s$. Since $S_f$ oscillates a little faster than $S_s$, it catches up with $S_s$ after some oscillations. At the phase coincidence of $S_s$ and $S_f$, the input time width can be calculated as

$$T_{\rm in} = n\Delta T \tag{1}$$

where n is the number of oscillations before the phase coincidence and $\Delta T$ is the difference between $T_s$ and $T_f$. Since the resolution $\Delta T$ equals the period difference, it can be made rather fine even at low operation frequencies. However, the measurement range is limited to one $T_s$. If the TDC operation is reversed, as depicted in Fig. 1(b), the vernier principle can be applied likewise to create a new high-accuracy DTC. Both $S_f$ and $S_s$ oscillate continuously with a small period difference $\Delta T$. After the phase coincidence of $S_f$ and $S_s$, each oscillation induces one more $\Delta T$ of delay between $S_f$ and $S_s$. If the $S_{\text{Start}}$ and $S_{\text{Stop}}$ signals are set after $S_f$ and $S_s$, respectively, have oscillated n cycles past the phase coincidence, the width of the output interval can be derived as

$$T_{\rm out} = n\Delta T \tag{2}$$

which is exactly proportional to the input value n and successfully fulfills the function of digital-to-time conversion.
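The reversed vernier operation of Fig. 1(b) can be checked numerically with a minimal sketch. It assumes ideal, jitter-free oscillators that are exactly phase-aligned at $t = 0$ and takes $S_{\text{Start}}$ and $S_{\text{Stop}}$ at the n-th rising edges of $S_f$ and $S_s$; the periods used (1000 ps and 990 ps) are illustrative values, not taken from any of the implementations. Exact rational arithmetic (`fractions.Fraction`) avoids floating-point rounding in the comparison with (2).

```python
from fractions import Fraction


def vernier_dtc_width(n: int, t_s: Fraction, t_f: Fraction) -> Fraction:
    """Output width of the basic vernier DTC of Fig. 1(b).

    Idealized model: S_s and S_f are phase-aligned at t = 0 (phase
    coincidence); S_Start is set by the n-th rising edge of the fast signal
    S_f and S_Stop by the n-th rising edge of the slow signal S_s.
    """
    start_edge = n * t_f  # n-th rising edge of S_f after coincidence
    stop_edge = n * t_s   # n-th rising edge of S_s after coincidence
    return stop_edge - start_edge


# Illustrative periods in picoseconds (assumed values).
T_S = Fraction(1000)
T_F = Fraction(990)
DELTA_T = T_S - T_F

for n in (1, 5, 250):
    width = vernier_dtc_width(n, T_S, T_F)
    assert width == n * DELTA_T  # eq. (2): T_out = n * dT
    latency = n * T_F            # S_Start appears n * T_f after coincidence
    print(f"n={n:3d}  T_out={float(width):7.1f} ps  latency={float(latency):9.1f} ps")
```

With n = 250 the output width (2500 ps) already exceeds one $T_s$, while the latency grows as $nT_f$.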

Fig. 2. Timing diagrams of (a) the single-stage vernier TDC with extended operation range and (b) the modified vernier DTC.

Fig. 3. Simplified block diagram of the vernier DTC.

Note that the output interval width is no longer limited to one $T_s$, since no restriction is imposed on n, which can be set to any value when necessary. However, the output latency after phase coincidence reaches $nT_f$, which becomes unacceptably long for large n. Fortunately, a single-stage vernier TDC with self-interpolation was invented to substantially expand the operation range of the vernier principle, with the timing diagram illustrated in Fig. 2(a) [18]. After the arrival of the $S_{\text{Start}}$ signal, a coarse counter counts the oscillations of the activated $S_s$ signal until the $S_{\text{Stop}}$ signal arrives. Thereafter, a fine counter is enabled to count the racing oscillations between $S_f$ and $S_s$ before the next phase coincidence. The input time width is derived as

$$T_{\rm in} = T_{\rm coarse} + T_{\rm fine} = \alpha T_s + \beta (T_s - T_f) = \alpha T_s + \beta \Delta T, \tag{3}$$

where $\alpha$ and $\beta$ represent the coarse and fine count values, respectively. The operation principle can also be applied in reverse to the DTC to reduce the output latency, as depicted in Fig. 2(b), where $\text{CNT}_f$ and $\text{CNT}_s$ are down counters clocked by $S_f$ and $S_s$, respectively, to count the cycles for setting the $S_{\text{Start}}$ and $S_{\text{Stop}}$ signals. After phase coincidence, both counters begin counting, and the first $\beta$ cycles of $S_f$ and $S_s$ are used to generate a $\beta \Delta T$ delay difference between $S_{\text{Start}}$ and $S_{\text{Stop}}$. Afterward, $\text{CNT}_s$ counts an extra $\alpha$ cycles to generate an additional $\alpha T_s$ delay difference. The output time width becomes

$$T_{\rm out} = (\alpha + \beta) T_s - \beta T_f = \alpha T_s + \beta (T_s - T_f) = \alpha T_s + \beta \Delta T. \tag{4}$$
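Equation (4) can be reproduced with a cycle-by-cycle model of the two down counters of Fig. 2(b). This is an idealized sketch: both counters are assumed to be loaded at the phase coincidence ($t = 0$), to decrement once per rising edge, and to assert their outputs exactly when they reach zero. The extra register stages of a real output pulse generator, which add a fixed number of clock cycles to each path, are ignored, and the periods are again illustrative.

```python
from fractions import Fraction


def counter_expiry_time(load: int, period: Fraction) -> Fraction:
    """Time at which a down counter preloaded with `load` at t = 0 reaches 0.

    The counter is clocked by a signal whose k-th rising edge after the phase
    coincidence occurs at k * period, and it decrements once per edge.
    """
    count, t = load, Fraction(0)
    while count > 0:
        t += period  # wait for the next rising edge
        count -= 1   # decrement on that edge
    return t


def coarse_fine_width(alpha: int, beta: int, t_s: Fraction, t_f: Fraction) -> Fraction:
    """Idealized output width of the DTC of Fig. 2(b)."""
    t_start = counter_expiry_time(beta, t_f)          # CNT_f preloaded with beta
    t_stop = counter_expiry_time(alpha + beta, t_s)   # CNT_s preloaded with alpha + beta
    return t_stop - t_start


T_S, T_F = Fraction(1000), Fraction(990)  # illustrative periods in ps
for alpha, beta in [(0, 3), (2, 0), (4, 7)]:
    width = coarse_fine_width(alpha, beta, T_S, T_F)
    assert width == alpha * T_S + beta * (T_S - T_F)  # eq. (4)
    print(f"alpha={alpha}, beta={beta}: T_out = {float(width):.0f} ps")
```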



Fig. 4. Schematic of the output pulse generator.



Fig. 5. Realized circuit of the proposed DTC with 50% duty cycle.

For a given output width  $n\Delta T$  in Fig. 1(b) or equivalent  $\alpha T_s + \beta\Delta T$  in Fig. 2(b), we have

$$\alpha = \left\lfloor \frac{n}{T_s/\Delta T} \right\rfloor, \quad \beta = n \bmod \frac{T_s}{\Delta T} \tag{5}$$

where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$, and $y \bmod z$ computes the remainder of $y \div z$. Usually, $\Delta T$ is designed to be much smaller than $T_s$. The output latency is then reduced to merely $\beta T_f$, which is far less than $nT_f$ for wide outputs.
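The decomposition (5) and the resulting latency saving can be sketched in a few lines. The sketch assumes that $T_s/\Delta T$ is an integer and expresses all times in units of $\Delta T$, so that $T_f = T_s - \Delta T$; the ratio of 64 and the input value n = 1000 are arbitrary examples, and the function names are hypothetical.

```python
from typing import Tuple


def decompose(n: int, ratio: int) -> Tuple[int, int]:
    """Split the DTC input n into (alpha, beta) following eq. (5).

    `ratio` is the coarse-to-fine ratio T_s / dT, assumed to be an integer.
    """
    if n < 0 or ratio < 1:
        raise ValueError("need n >= 0 and ratio >= 1")
    return divmod(n, ratio)  # (floor(n / ratio), n mod ratio)


def start_latency_in_dt(n: int, ratio: int) -> Tuple[int, int]:
    """Latency of S_Start after phase coincidence, in units of dT.

    With T_s = ratio * dT we have T_f = T_s - dT = (ratio - 1) * dT.
    Returns (n * T_f for Fig. 1(b), beta * T_f for Fig. 2(b)).
    """
    t_f = ratio - 1
    _, beta = decompose(n, ratio)
    return n * t_f, beta * t_f


alpha, beta = decompose(1000, 64)
assert alpha * 64 + beta == 1000          # alpha * T_s + beta * dT == n * dT
print(alpha, beta)                        # 15 40
print(start_latency_in_dt(1000, 64))      # (63000, 2520)
```

For this example the start-edge latency drops from $63000\,\Delta T$ to $2520\,\Delta T$, a factor of 25.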

## III. CIRCUIT DESCRIPTION

Fig. 3 shows the simplified block diagram of the proposed DTC, which directly realizes the digital-to-time conversion function described in Fig. 2(b). Two oscillators with very close periods $T_s$ and $T_f$ are utilized to make the resolution $\Delta T$ as fine as possible. In theory, a simple D-type flip-flop (DFF) with $S_s$ and $S_f$ as its clock and data inputs is good enough to serve as the phase detector for phase coincidence detection. Its output signal $S_{PD}$ triggers the corresponding output pulse generators of the $S_{\text{Start}}$ and $S_{\text{Stop}}$ signals. Fig. 4 depicts the proposed circuit for the output pulse generator. Before the rising edge of $S_{PD}$, the preset count value $\beta$ or $\alpha + \beta$ is loaded into down counter $CNT_f$ or $CNT_s$. After $CNT_f$ or $CNT_s$ counts to 0, the asynchronous clear of $DFF'_o$ is released. When the next rising edge of $S_f$ or $S_s$ arrives, the outputs of $DFF'_o$ and $DFF_o$ are triggered in turn to set $S_{\text{Start}}$ or $S_{\text{Stop}}$ to 1, as required in Fig. 2(b). However, when the rising edges of $S_f$ and $S_s$ get too close to each other, metastability causes the phase detector to malfunction, and the phase coincidence cannot be detected reliably. An additional flip-flop can be inserted after the phase detector to wait one more period of $S_s$ before sampling the phase detector output, forming a so-called synchronization register chain [19], but the probability of metastability can only be reduced, not completely eliminated [20]. If metastability occurs, the phase coincidence detection is postponed by one $T_s$, inducing an error of one $\Delta T$ in the output width. Moreover, the inserted metastability suppression flip-flop causes another $T_s$ delay in activating the $S_{\text{Start}}$ and $S_{\text{Stop}}$ output pulse generators. Consequently, it provokes one more $\Delta T$ error, which can be compensated for by subtracting one from the input value at the expense of a more complicated circuit.

Fig. 6. Schematics of the period signal generator and the modified output pulse generator.

Fig. 7. Timing diagram of the modified output pulse generator.

To reduce these errors, the circuit of the proposed DTC is modified as shown in Fig. 5. Since the rising edges of $S_s$ and $S_f$ can be synchronized to that of $S_{ref}$ by bang-bang phase-locked loops, no phase detector is required for phase coincidence detection: each rising edge of $S_{ref}$ indicates one phase coincidence of $S_s$ and $S_f$. A period signal generator is added to set the output repetition rate, and two output pulse generators are employed to generate $S_{\text{Start}}$ and $S_{\text{Stop}}$ with a delay difference set by the DTC input. As depicted in Fig. 6, the period signal generator can be simply realized by a reloadable down counter, with a reloading cycle equal to one half of the desired output period, followed by a divide-by-two counter that generates the period signal $S_{per}$ with a 50% duty cycle. The output pulse generator is also modified from Fig. 4 to ensure 50% duty cycles for $S_{\text{Start}}$ and $S_{\text{Stop}}$. The corresponding timing diagram is illustrated in Fig. 7. For setting $S_{\text{Start}}/S_{\text{Stop}}$, $\text{CNT}_f/\text{CNT}_s$ is preloaded by $S_{LOAD} = 0$ before the rising edge of $S_{per}$ and then activated to count by the rising edge of $S_{LOAD}$. When $\text{CNT}_f/\text{CNT}_s$ counts to 0, the state $S_{per} = 1$ is latched to set the $S_{\text{Start}}/S_{\text{Stop}}$ output. A similar operation is adopted for resetting $S_{\text{Start}}/S_{\text{Stop}}$. This makes $S_{\text{Start}}/S_{\text{Stop}}$ an exact replica of $S_{per}$ delayed by $\beta T_f/(\alpha + \beta)T_s$. With a symmetric $S_{per}$ signal, $S_{\text{Start}}$ and $S_{\text{Stop}}$ are also guaranteed to have a 50% duty cycle.
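The period signal generator of Fig. 6 can be modelled at the cycle level as below. This is a behavioral sketch, not the authors' HDL: the clock that drives the counter, the reset states and the function name are assumptions. A down counter expires every `half_period_cycles` clock cycles, and each expiry reloads it and toggles a divide-by-two stage, so $S_{per}$ is high and low for equal numbers of cycles.

```python
from typing import List


def period_signal(half_period_cycles: int, n_cycles: int) -> List[int]:
    """Cycle-by-cycle model of S_per (one list entry per clock cycle).

    A reloadable down counter expires every `half_period_cycles` cycles; each
    expiry reloads the counter and toggles a divide-by-two stage, so S_per is
    high for `half_period_cycles` cycles and low for the same number.
    """
    if half_period_cycles < 1:
        raise ValueError("half_period_cycles must be >= 1")
    count = half_period_cycles - 1   # assumed reset state: freshly loaded
    s_per = 0                        # assumed reset state of the toggle stage
    samples = []
    for _ in range(n_cycles):
        samples.append(s_per)
        if count == 0:
            count = half_period_cycles - 1   # reload
            s_per ^= 1                       # divide-by-two: toggle on expiry
        else:
            count -= 1
    return samples


wave = period_signal(3, 12)
print(wave)   # [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
assert sum(wave) == len(wave) // 2   # 50% duty cycle over whole periods
```

Because the duty cycle is set by two equal counter intervals rather than by analog delays, the 50% figure holds for any reload value, which is what the modified output pulse generators then replicate.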

Since all logic gates in the output pulse generator are synchronized to $S_s$ or $S_f$, the timing error caused by latency mismatch among the $S_{per}$, $S_{LOAD}$, $S_s$ and $S_f$ signals can be minimized as long as the mismatch is kept under one $T_s$ or $T_f$. Similarly, any arrival time mismatch between $S_s$ and $S_f$ at the input ports of the output pulse generators effectively imposes an offset on the DTC output. A suitable number of dummy delay cells can be inserted in the output path of $S_{\text{Start}}$ or $S_{\text{Stop}}$ to reduce the offset to an acceptable level. Alternatively, the Altera Quartus II software offers push-button netlist optimizations and physical synthesis options that can improve design performance at the expense of considerably longer compilation time and larger area [21]. For example, the set input delay constraint (`set_input_delay`) can be used to specify the data arrival times of $S_s$ and $S_f$ with respect to the reference clock $S_{\rm ref}$ to minimize the DTC output offset.

Since both the coarse delay $\alpha T_s$ and the fine delay $\beta \Delta T$ are generated by the same phase-locked loops, the coarse-to-fine resolution ratio $T_s/\Delta T$ is accurately maintained by the negative feedback mechanism of the phase-locked loops. This makes the performance of the proposed DTC insensitive to PVT variations. Assuming that the divisors of the $PLL_f$ and $PLL_s$ frequency dividers (or prescalers) in Fig. 5 are A and B, respectively, the periods $T_f$ and $T_s$ can be derived as

$$T_f = \frac{T_{\text{ref}}}{A}, \quad T_s = \frac{T_{\text{ref}}}{B}. \tag{6}$$

With A larger than B, the effective resolution becomes

$$\Delta T = T_s - T_f = \frac{A - B}{AB}\, T_{\text{ref}}. \tag{7}$$

The coarse to fine resolution ratio equals

$$\frac{T_s}{\Delta T} = \frac{A}{A - B}. \tag{8}$$
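Equations (6)–(8) can be verified for any divisor pair with exact rational arithmetic. In this sketch the reference period is arbitrary (normalized to 1) and the divisor pairs are examples; the pair (10, 7) is included only to show that a general choice gives a non-integer ratio $A/(A-B)$, which complicates the decomposition of n into $\alpha$ and $\beta$.

```python
from fractions import Fraction


def vernier_params(t_ref: Fraction, a: int, b: int):
    """Return (T_f, T_s, dT, T_s/dT) for PLL divisors A > B, eqs. (6)-(8)."""
    if a <= b:
        raise ValueError("A must be larger than B")
    t_f = t_ref / a          # eq. (6)
    t_s = t_ref / b
    delta_t = t_s - t_f      # eq. (7)
    ratio = t_s / delta_t    # eq. (8)
    return t_f, t_s, delta_t, ratio


T_REF = Fraction(1)  # arbitrary reference period (normalized)
for a, b in [(64, 63), (512, 511), (10, 7)]:
    _, _, dt, r = vernier_params(T_REF, a, b)
    assert dt == Fraction(a - b, a * b) * T_REF  # matches (7)
    assert r == Fraction(a, a - b)               # matches (8)
    print(f"A={a}, B={b}: dT/T_ref = {dt}, T_s/dT = {r}")
```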

To produce a given output width  $n\Delta T$ , the corresponding values of  $\alpha$  and  $\beta$  can be derived from (5) to be

$$\alpha = \left\lfloor \frac{n}{A/(A-B)} \right\rfloor \quad\text{and}\quad \beta = n \bmod \frac{A}{A-B}. \tag{9}$$

To ease the above decomposition calculation of  $\alpha$  and  $\beta$ , the divisors A and B are recommended to be designed as  $2^{K}$  and  $2^{K}-1$  respectively.  $T_{f}$  and  $T_{s}$  become  $T_{ref}/2^{K}$  and  $T_{ref}/(2^{K}-1)$ . By (7), the effective resolution can be derived as

$$\Delta T = \frac{T_{\text{ref}}}{2^K (2^K - 1)} \tag{10}$$

which can be made extremely fine with large K. The coarse to fine resolution ratio  $T_s/\Delta T$  becomes  $2^K$  exactly. By (9),  $\alpha$  and  $\beta$  are recalculated as

$$\alpha = \left\lfloor \frac{n}{2^K} \right\rfloor \quad\text{and}\quad \beta = n \bmod 2^K. \tag{11}$$

For an N-bit DTC, $\alpha$ and $\beta$ are simply the (N−K)-bit MSB value $[D_{N-1}:D_K]$ and the K-bit LSB value $[D_{K-1}:D_0]$, respectively. The complicated division hardware required by (9) is no longer needed. The simplified input processing circuit for $CNT_f$ and $CNT_s$ is drawn in Fig. 8, where $CNT_f$ needs only K input bits to load the value $[D_{K-1}:D_0]$ of $\beta$. However, the maximum value of $\alpha + \beta$ is

$$(\alpha + \beta)_{\max} = (2^{N-K} - 1) + (2^K - 1) = 2^{N-K} + 2^K - 2. \tag{12}$$

Hence $\text{CNT}_s$ requires at least $\max(N-K, K) + 1$ bits. In practical realizations, $\text{CNT}_f$ and $\text{CNT}_s$ are designed with the same number of input bits to equalize their transmission delays and thus reduce the DTC offset, at the expense of more logic gates.
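With $A = 2^K$ and $B = 2^K - 1$, the input processing of Fig. 8 reduces to bit slicing, and the minimum $\text{CNT}_s$ width follows from (12). The sketch below shows both; the function names are hypothetical, and the word lengths in the examples are chosen only for illustration (N = 49 with K = 9 corresponds to a 512/511 divisor pair and 49 input bits).

```python
from typing import Tuple


def split_input(d: int, k: int) -> Tuple[int, int]:
    """Split an N-bit input word into (alpha, beta) per eq. (11).

    alpha = D[N-1:K] (the MSBs), beta = D[K-1:0] (the K LSBs).
    Valid when A = 2**K and B = 2**K - 1, i.e. T_s / dT = 2**K.
    """
    alpha = d >> k                # floor(d / 2**K)
    beta = d & ((1 << k) - 1)     # d mod 2**K
    return alpha, beta


def cnt_s_min_bits(n_bits: int, k: int) -> int:
    """Bits needed to hold (alpha + beta)_max from eq. (12)."""
    max_load = (2 ** (n_bits - k) - 1) + (2 ** k - 1)
    return max_load.bit_length()


assert split_input(1000, 6) == (15, 40)    # 15 * 64 + 40 == 1000
# For 1 <= K <= N - 1 this equals max(N - K, K) + 1:
for n_bits, k in [(8, 4), (49, 9), (51, 20)]:
    assert cnt_s_min_bits(n_bits, k) == max(n_bits - k, k) + 1
print(cnt_s_min_bits(49, 9))  # 41
```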

## IV. FPGA IMPLEMENTATION AND EXPERIMENTAL RESULTS

Except for the phase-locked loops, all sub-circuits in Fig. 5 use only standard digital logic gates and can be readily implemented with FPGA chips. Moreover, current FPGA chips usually embed several high-performance phase-locked loops that can serve as $PLL_f$ and $PLL_s$. With a full FPGA realization, the design effort and cost of the proposed DTC can be significantly reduced. For function verification and performance evaluation, Xilinx Virtex-5, Altera Stratix II GX and Altera Stratix III FPGAs are adopted for circuit implementation. Since the DTC accuracy is dominated by the phase-locked loop performance, the simplified bang-bang PLL block diagram of these FPGAs is re-plotted in Fig. 9, and the important parameters are listed in Table I for design reference [23]–[25]. To achieve the finest resolution, the divisors (M) of the two PLL output frequency dividers should be as close to each other as possible.

Fig. 8. Input processing circuit for $CNT_f$ and $CNT_s$.

Fig. 9. Simplified PLL block diagram for the adopted Altera and Xilinx FPGAs.

 TABLE I

 PLL Specifications for the Adopted FPGAs

|                                | Altera Stratix III | Altera Stratix II GX | Xilinx Virtex-5 |
|--------------------------------|--------------------|----------------------|-----------------|
| Fabrication process            | 65 nm              | 90 nm                | 65 nm           |
| Input frequency ($F_{in}$)     | 5~717 MHz          | 2~500 MHz            | 19~1000 MHz     |
| Output frequency ($F_{out}$)   | 600~1600 MHz*      | 2~550 MHz            | 3.125~450 MHz   |
| Input reference divisor (D)    | 1~512              | 1~512                | 1~52            |
| Output frequency divisor (M)   | 1~512              | 1~512                | 1~64            |
| Post-scale counter divisor (C) | 1~512              | 1~512                | 1~128           |

\* 600~1600 MHz for internal use but ≤717 MHz for external signaling

Fig. 10. Measured output width versus input value for Virtex-5 FPGA DTC.

For the Xilinx Virtex-5 FPGA, the PLL output frequency $F_{out}$ is limited to 450 MHz. The finest resolution is obtained by setting the input frequency $F_{in}$ to 28.125 MHz, the output frequency divisor M to 64/63 (for $PLL_f$/$PLL_s$), the input reference divisor D to 2 and the post-scale counter divisor C to 2. The output frequency can be derived as

$$F_{\text{out}} = (F_{\text{in}} \div D \times M \div C). \tag{13}$$

We have

$$T_f = \frac{1}{28.125 \div 2 \times 64 \div 2 \text{ MHz}} = 2.22222 \text{ ns}$$
$$T_s = \frac{1}{28.125 \div 2 \times 63 \div 2 \text{ MHz}} = 2.25749 \text{ ns.}$$

The effective resolution becomes

$$\Delta T = T_s - T_f = 35.27 \text{ ps}. \tag{14}$$
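The Virtex-5 numbers can be reproduced from (13) with a few lines of floating-point arithmetic; the function name is illustrative. Because both PLLs share D and C, the coarse-to-fine ratio is exactly 64 here, so the bit slicing of (11) applies with K = 6.

```python
def pll_fout_mhz(f_in_mhz: float, d: int, m: int, c: int) -> float:
    """PLL output frequency per eq. (13): F_out = F_in / D * M / C."""
    return f_in_mhz / d * m / c


f_fast = pll_fout_mhz(28.125, d=2, m=64, c=2)   # PLL_f: 450 MHz
f_slow = pll_fout_mhz(28.125, d=2, m=63, c=2)   # PLL_s: about 442.97 MHz

t_f_ns = 1e3 / f_fast   # period in ns (1e3 / MHz)
t_s_ns = 1e3 / f_slow
delta_t_ps = (t_s_ns - t_f_ns) * 1e3

print(f"T_f = {t_f_ns:.5f} ns, T_s = {t_s_ns:.5f} ns, dT = {delta_t_ps:.2f} ps")
```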

To evaluate the actual performance, the output interval width of the Xilinx FPGA DTC was measured from $1\Delta T$ to $3T_s$ for every input value to validate both the coarse and fine resolutions. The reference clock was generated by an Agilent 81130A 400/660 MHz Pulse/Pattern Generator. The delay difference between the $S_{\text{Start}}$ and $S_{\text{Stop}}$ signals was accurately measured by a Tektronix DPO 70404 digital oscilloscope with a 25 GS/s real-time sample rate. Unlike conventional designs, the proposed DTC has no device mismatch problem and thus possesses excellent linearity, as shown in Fig. 10. The DNL and INL errors are verified to be merely $-0.15 \sim +0.16$ LSB and $\pm 0.4$ LSB, as depicted in Figs. 11 and 12, respectively, which ensures that every input bit is valid. The realized DTC utilizes only 473 slice LUTs, 117 slice registers and 2 phase-locked loops.

Fig. 11. Measured DNL error for Virtex-5 FPGA DTC.

Fig. 12. Measured INL error for Virtex-5 FPGA DTC.

Under the constraint $F_{out} \leq 550$ MHz, the DTC implemented with the Altera Stratix II GX FPGA was verified to have an effective resolution of 3.56 ps with $F_{\rm in} = 2.148$ MHz, M = 512/511 (for $PLL_f$/$PLL_s$), D = 2 and C = 1 [26]. The DTC output was measured from $1\Delta T$ to $2T_s$ for every input value, as redrawn in Fig. 13, to demonstrate the performance of the proposed circuit. The DNL and INL errors are only $\pm 0.07$ LSB and $-0.23 \sim +0.2$ LSB, as replotted in Figs. 14 and 15, respectively. Again, the INL error is small enough to ensure that every input bit is valid. To demonstrate the stability of the DTC against temperature variation, measurements were taken every 20°C over the FPGA chip's operating temperature range of $0 \sim 85^{\circ}$C in a Programmable Temperature & Humidity Chamber MHU-408LRBDA. Fig. 16 depicts the measured effective resolution of the Stratix II GX FPGA DTC under temperature variation, which spreads only over $3.538 \sim 3.574$ ps across the 85°C range. The deviation is merely $-0.6 \sim +0.4\%$, which may be attributed solely to the finite accuracy of the measurement equipment, since all PVT variations are expected to be automatically calibrated out by the phase-locked loops. The DTC functions well with 49 input bits, and the adjustable operation range is verified to be as large as 33.4 minutes.
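The Stratix II GX setting can be checked the same way. The sketch uses the divisor values quoted above (D = 2, C = 1, M = 512 for $PLL_f$ and 511 for $PLL_s$) together with the measured extremes 3.538 ps and 3.574 ps from the temperature test; it reproduces the 3.56 ps resolution and deviations of roughly $-0.6\%$ and $+0.4\%$.

```python
def pll_fout_mhz(f_in_mhz: float, d: int, m: int, c: int) -> float:
    """PLL output frequency per eq. (13): F_out = F_in / D * M / C."""
    return f_in_mhz / d * m / c


F_IN_MHZ = 2.148
t_f_ps = 1e6 / pll_fout_mhz(F_IN_MHZ, d=2, m=512, c=1)   # PLL_f period in ps
t_s_ps = 1e6 / pll_fout_mhz(F_IN_MHZ, d=2, m=511, c=1)   # PLL_s period in ps
delta_t_ps = t_s_ps - t_f_ps                             # nominal resolution
print(f"nominal dT = {delta_t_ps:.3f} ps")

# Measured extremes over 0-85 C, as reported in the text.
for measured_ps in (3.538, 3.574):
    deviation = 100.0 * (measured_ps - delta_t_ps) / delta_t_ps
    print(f"{measured_ps:.3f} ps -> {deviation:+.2f} %")
```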

To pursue an even finer resolution, the Altera Stratix III FPGA was also adopted for DTC realization. In contrast to the Stratix II GX version, its PLL output frequency $F_{out}$ can be set as high as 1600 MHz for driving internal circuits, so the resolution can be made even finer than that of the Stratix II GX DTC. However, the values of the input reference divisor D, output frequency divisor M and post-scale counter divisor C are optimized automatically by the Altera Quartus II software before compilation, so some DTC design flexibility is lost. The finest resolution is accomplished by setting $F_{in} = 20$ MHz, M = 154/205 and D = 3/4 (for $PLL_f$/$PLL_s$), and C = 1. We have

$$T_f = \frac{1}{20 \div 3 \times 154 \text{ MHz}} = 974.02597 \text{ ps}$$
$$T_s = \frac{1}{20 \div 4 \times 205 \text{ MHz}} = 975.60976 \text{ ps}.$$

The effective resolution reaches

$$\Delta T = T_s - T_f = 1.58 \text{ ps}. \tag{15}$$
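For the Stratix III setting the two PLLs use different D values, so the arithmetic is easiest with exact fractions. The sketch reproduces (15) and also evaluates the coarse-to-fine ratio $T_s/\Delta T$, which for these divisors comes out as exactly 616 rather than a power of two; as far as this arithmetic goes, the bit-slicing shortcut of (11) does not apply directly to this particular setting, and the general decomposition of (5) is the relevant one.

```python
from fractions import Fraction

# Exact PLL output frequencies in MHz from eq. (13) with C = 1.
f_fast = Fraction(20, 3) * 154   # PLL_f: F_in = 20 MHz, D = 3, M = 154
f_slow = Fraction(20, 4) * 205   # PLL_s: F_in = 20 MHz, D = 4, M = 205

t_f_us = 1 / f_fast              # periods in microseconds (1 / MHz)
t_s_us = 1 / f_slow
delta_t_us = t_s_us - t_f_us
ratio = t_s_us / delta_t_us      # coarse-to-fine ratio T_s / dT

US_TO_PS = 10 ** 6
print(f"T_f = {float(t_f_us * US_TO_PS):.5f} ps")   # 974.02597 ps
print(f"T_s = {float(t_s_us * US_TO_PS):.5f} ps")   # 975.60976 ps
print(f"dT  = {float(delta_t_us * US_TO_PS):.2f} ps, T_s/dT = {ratio}")
```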



Fig. 13. Measured output width versus input value for Stratix II GX FPGA DTC.



Fig. 14. Measured DNL error for Stratix II GX FPGA DTC.



Fig. 15. Measured INL error for Stratix II GX FPGA DTC.

Even at such a fine resolution, the DTC output was still measured from $1\Delta T$ to $2T_s$ for every input value, as illustrated in Fig. 17. The DNL and INL errors are merely $-0.086 \sim +0.12$ LSB and $-0.93 \sim +0.75$ LSB, as shown in Figs. 18 and 19, respectively. The DTC functions well with 51 input bits, and the adjustable operation range is verified to be as large as 59.3 minutes.
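The operation ranges quoted for the two Altera implementations follow directly from the number of functioning input bits and the resolution, taking the full-scale width as $(2^N - 1)\Delta T$. The sketch uses the rounded resolutions 1.58 ps and 3.56 ps quoted in the text; the function name is illustrative.

```python
def operation_range_minutes(n_bits: int, resolution_s: float) -> float:
    """Full-scale output width (2**N - 1) * dT, expressed in minutes."""
    return (2 ** n_bits - 1) * resolution_s / 60.0


print(f"Stratix III, 51 bits at 1.58 ps:   {operation_range_minutes(51, 1.58e-12):.1f} min")
print(f"Stratix II GX, 49 bits at 3.56 ps: {operation_range_minutes(49, 3.56e-12):.1f} min")
```

This prints 59.3 min and 33.4 min, matching the reported ranges.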

Fig. 16. Temperature sensitivity of the effective resolution for Stratix II GX FPGA DTC.

Fig. 17. Measured output width versus input value for Stratix III FPGA DTC.

Fig. 18. Measured DNL error for Stratix III FPGA DTC.

For multi-channel realization, the logic utilizations of the Virtex-5, Stratix II GX and Stratix III FPGAs are summarized in Table II. Apart from the two shared phase-locked loops, only 422 adaptive LUTs (ALUTs) and 84 logic registers on average are consumed per channel in the Stratix III FPGA. The maximum numbers of realizable DTC channels for the Virtex-5, Stratix II GX and Stratix III FPGAs are 321, 271 and 224, respectively. The power consumption per channel of the Stratix III FPGA DTC is estimated by the PowerPlay Early Power Estimator to be 3.04 mW, which is significantly lower than those of its predecessors.
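The per-channel figures can be cross-checked against Table II. For both Altera devices, every listed row equals the one-channel build plus a constant increment per additional channel, which the sketch below verifies using values copied from the table; the Virtex-5 LUT column does not follow a single constant increment and is therefore left out.

```python
# Selected rows of Table II: channels -> (ALUTs, dedicated logic registers)
STRATIX_III = {1: (446, 93), 2: (868, 177), 3: (1290, 261), 10: (4244, 849),
               100: (42224, 8409), 200: (84424, 16809), 224: (94552, 18825)}
STRATIX_II_GX = {1: (191, 91), 2: (375, 175), 3: (559, 259), 10: (1847, 847),
                 100: (18407, 8407), 200: (36807, 16807), 224: (41223, 18823),
                 271: (49871, 22771)}


def per_channel_increment(rows):
    """Return (dLUT, dREG) between the 1- and 2-channel builds and check that
    every listed row equals the 1-channel cost plus (channels - 1) increments."""
    (l1, r1), (l2, r2) = rows[1], rows[2]
    d_lut, d_reg = l2 - l1, r2 - r1
    for ch, (luts, regs) in rows.items():
        assert luts == l1 + (ch - 1) * d_lut, (ch, luts)
        assert regs == r1 + (ch - 1) * d_reg, (ch, regs)
    return d_lut, d_reg


print("Stratix III:", per_channel_increment(STRATIX_III))      # (422, 84)
print("Stratix II GX:", per_channel_increment(STRATIX_II_GX))  # (184, 84)
```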

Since the Altera FPGA owns both wider operation frequency range and larger divisor adjustment flexibility, the achievable DTC resolution is much finer than that of Xilinx FPGA. The



Fig. 19. Measured INL error for Stratix III FPGA DTC.

 TABLE II

 LOGIC UTILIZATION FOR MULTI-CHANNEL IMPLEMENTATION

| Number . | Stratix III |                                 | Stratix II GX |                                 | Virtex-5 |           |
|----------|-------------|---------------------------------|---------------|---------------------------------|----------|-----------|
| of CHs   | ALUTs       | Dedicated<br>Logic<br>Registers | ALUTs         | Dedicated<br>Logic<br>Registers | LUTs     | Registers |
| 1        | 446         | 93                              | 191           | 91                              | 473      | 117       |
| 2        | 868         | 177                             | 375           | 175                             | 537      | 135       |
| 3        | 1,290       | 261                             | 559           | 259                             | 601      | 153       |
|          |             |                                 |               |                                 |          |           |
| 10       | 4,244       | 849                             | 1,847         | 847                             | 1,049    | 279       |
|          |             |                                 |               |                                 |          |           |
| 100      | 42,224      | 8,409                           | 18,407        | 8,407                           | 6809     | 1899      |
| 200      | 84,424      | 16,809                          | 36,807        | 16,807                          | 14009    | 3699      |
| 224      | 94,552      | 18,825                          | 41,223        | 18,823                          | 15641    | 4131      |
|          | N/A         | N/A                             |               | •••                             |          |           |
| 271      | N/A         | N/A                             | 49,871        | 22,771                          | 18837    | 4977      |
| 321      | N/A         | N/A                             | N/A           | N/A                             | 22169    | 5859      |

 TABLE III

 Specifications for Altera and Xilinx FPGA DTCs

|                                   | Xilinx<br>Virtex-5 | Altera<br>Stratix II GX | Altera<br>Stratix III |
|-----------------------------------|--------------------|-------------------------|-----------------------|
| ALUT/LUT per channel*             | 70                 | 184                     | 422                   |
| Registers per channel*            | 19                 | 84                      | 84                    |
| Shared PLLs                       | 2                  | 2                       | 2                     |
| Input reference frequency         | 28.125 MHz         | 2.148 MHz               | 20 MHz                |
| Input reference divisor (D)       | 2                  | 1 / 2                   | 3 / 4                 |
| Output frequency divisor (M)      | 64 / 63            | 256 / 511               | 154 / 205             |
| Post-scale counter divisor (C)    | 2                  | 1                       | 1                     |
| PLL<sub>f</sub> output frequency  | 450 MHz            | 550 MHz                 | 1026.67 MHz           |
| PLL<sub>s</sub> output frequency  | 443 MHz            | 548.9 MHz               | 1025 MHz              |
| Resolution                        | 35.27 ps           | 3.56 ps                 | 1.58 ps               |

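\*Averaged over a 224-channel realization on every device for a fair comparison. Where a row lists two values (x / y), x applies to PLL<sub>f</sub> and y to PLL<sub>s</sub>.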

fulfilled specifications of all Altera and Xilinx FPGA DTCs are summarized in Table III for easy comparison.
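The PLL rows of Table III can be cross-checked with the standard PLL relation f<sub>out</sub> = f<sub>in</sub> · M / (D · C). The vernier resolution then follows as the period difference 1/f<sub>s</sub> − 1/f<sub>f</sub>. The sketch below is a hedged check, not the authors' tool. It assumes that in each "x / y" entry the first value belongs to PLL<sub>f</sub> and the second to PLL<sub>s</sub>, because that assignment reproduces the tabulated output frequencies. The function names are illustrative.

```python
from fractions import Fraction as F


def pll_out_mhz(f_in_mhz, m, d, c):
    """PLL output frequency f_in * M / (D * C), in MHz."""
    return f_in_mhz * m / (d * c)


def vernier_resolution_ps(f_fast_mhz, f_slow_mhz):
    """Period difference T_s - T_f in picoseconds (1/MHz = 1 us = 1e6 ps)."""
    return (1 / f_slow_mhz - 1 / f_fast_mhz) * 10**6


# (f_in [MHz], (M, D, C) of PLL_f, (M, D, C) of PLL_s), values from Table III.
DEVICES = {
    "Xilinx Virtex-5": (F("28.125"), (64, 2, 2), (63, 2, 2)),
    "Altera Stratix II GX": (F("2.148"), (256, 1, 1), (511, 2, 1)),
    "Altera Stratix III": (F(20), (154, 3, 1), (205, 4, 1)),
}

for name, (f_in, fast, slow) in DEVICES.items():
    f_f = pll_out_mhz(f_in, *fast)
    f_s = pll_out_mhz(f_in, *slow)
    lsb = vernier_resolution_ps(f_f, f_s)
    print(f"{name}: PLL_f = {float(f_f):.2f} MHz, PLL_s = {float(f_s):.2f} MHz, "
          f"resolution = {float(lsb):.2f} ps")
```

For Stratix III, this prints PLL_f = 1026.67 MHz, PLL_s = 1025.00 MHz and resolution = 1.58 ps. The 2.148 MHz Stratix II GX input is a rounded value. The computed frequencies for that device (about 549.89 MHz and 548.81 MHz) therefore differ slightly from the listed 550 MHz and 548.9 MHz, although the resolution still rounds to 3.56 ps. Exact fractions are used so that the tiny period difference is not swamped by floating-point error.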

## V. CONCLUSIONS

The proposed DTC uses a single vernier delay stage to realize the digital-to-time conversion function. Because the DTC operates entirely in closed loop, stabilized by two phase-locked loops, both the coarse and fine resolutions are kept insensitive to PVT variations. Realized with an Altera Stratix III FPGA, the proposed DTC achieves the finest measured resolution of 1.58 ps and the widest operation range of 59.3 minutes. It also achieves the lowest simulated power consumption per channel, 3.04 mW, and the smallest measured error, $-1.47 \sim +1.19$ ps, among the DTCs reported to date. Over the full operating temperature range of the FPGA chip, the effective resolution deviates by only $-0.049 \sim +0.034$ ps. This performance even exceeds that of some commercial digital pulse generators with list prices of tens of thousands of US dollars [22]. Moreover, because the design is realized in an FPGA, porting the circuit to BIST or embedded applications is shown to be straightforward. These features make the proposed DTC well suited to low-cost, high-accuracy instrumentation and testing applications. Table IV summarizes the comparison among different DTCs for quick reference.
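As a rough arithmetic check on the quoted operation range, the sketch below assumes that the full range is simply 2<sup>51</sup> steps of the rounded 1.58 ps resolution, using the 51 functioning input bits stated in the abstract. This assumption reproduces the 59.3 minutes figure. The unrounded Stratix III period difference of about 1.5838 ps would give roughly 59.4 minutes instead. The names below are illustrative.

```python
from fractions import Fraction

INPUT_BITS = 51
LSB_ROUNDED_PS = Fraction("1.58")
# Unrounded T_s - T_f for the Stratix III PLLs (1/1025 MHz - 1/1026.67 MHz), converted from us to ps.
LSB_EXACT_PS = (Fraction(4, 4100) - Fraction(3, 3080)) * 10**6


def range_minutes(bits, lsb_ps):
    """Assumed full-scale range 2**bits * LSB, converted from picoseconds to minutes."""
    seconds = 2**bits * lsb_ps / 10**12
    return seconds / 60


print(f"{float(range_minutes(INPUT_BITS, LSB_ROUNDED_PS)):.1f} min")  # 59.3 min
print(f"{float(range_minutes(INPUT_BITS, LSB_EXACT_PS)):.1f} min")    # 59.4 min
```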

TABLE IV

Comparison With Previous Works

|                   | This work               | [1]            | [5]                    | [8]              | [9]          | [10]          | [22]                   |
|-------------------|-------------------------|----------------|------------------------|------------------|--------------|---------------|------------------------|
| Process           | 65 nm CMOS              | GaAs MESFETs   | Si bipolar (SST-1A)    | 0.35 µm CMOS     | 0.18 µm CMOS | 0.18 µm CMOS  | N/A                    |
| Area / channel    | N/A*                    | 12.045 mm²     | N/A                    | 1.685 mm²        | 5.158 mm²    | 3.22 mm²      | N/A                    |
| Linearity error   | -0.93 ~ +0.75 LSB (INL) | ±125 ps        | ±2 ps                  | -1 ~ 7 LSB (INL) | < 35 ps      | < ±4 ps (INL) | ±250 ps at least (INL) |
| Power / channel   | 3.04 mW                 | 0.9 W          | 2.7 W                  | 79.2 mW          | 200 mW       | 1.09 W        | 150 W max.             |
| Operation range   | 59.3 minutes            | 0.25–15.875 ns | 3 ns                   | 5 ms             | 2.5 ns       | 937.5 ps      | 999.5 s                |
| Input bits        | 51                      | 7              | N/A                    | 19               | N/A          | 9             | N/A                    |
| Coarse resolution | 975.60976 ps            | N/A            | N/A                    | 300 ps           | 100 ps       | 937.5 ps      | N/A                    |
| Fine resolution   | 1.58 ps                 | 125 ps         | 8 ps                   | 37.5 ps          | 19.53125 ps  | 1.83 ps       | 5 ps at best           |
| Channel(s)        | 224                     | 1              | 1                      | 3                | 50           | 40            | 2                      |

\*On average, 422 combinational ALUTs and 84 dedicated logic registers are consumed per channel.

## ACKNOWLEDGMENT

The authors would like to thank the National Chip Implementation Center (CIC) for providing FPGA design and simulation tools. They also thank GALAXY Taiwan and ULINX Taiwan for useful design discussions and valuable help with the Altera and Xilinx FPGA implementations.

## REFERENCES

- [1] S. Katsu, T. Ueda, M. Kazumura, and G. Kano, "A GaAs programmable timer with 125 ps delay-time resolution," in *IEEE ISSCC Dig.*, Feb. 1988, pp. 16–17.
- [2] AD9501 Digitally Programmable Delay Generator. Analog Devices, Inc. [Online]. Available: www.analog.com
- [3] C.-W. Branson, "Integrated pin electronics for a VLSI test system," *IEEE Trans. Ind. Electron.*, vol. 36, pp. 23–27, May 1989.
- [4] T.-I. Otsuji and N. Narumi, "A 10-ps resolution, process-insensitive timing generator IC," *IEEE J. Solid-State Circuits*, vol. 24, no. 10, pp. 1412–1417, Oct. 1989.
- [5] T.-I. Otsuji and N. Narumi, "A 3-ns range, 8-ps resolution, timing generator LSI utilizing Si bipolar gate array," *IEEE J. Solid-State Circuits*, vol. 26, no. 5, pp. 806–811, May 1991.
- [6] J. A. Gasbarro and M. A. Horowitz, "Integrated pin electronics for VLSI functional testers," *IEEE J. Solid-State Circuits*, vol. 24, no. 4, pp. 331–337, Apr. 1989.
- [7] J. Christiansen, "An integrated high resolution CMOS timing generator based on an array of delay locked loops," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 952–957, Jul. 1996.
- [8] T.-Y. Wang, S.-M. Lin, and H.-W. Tsao, "Multiple channel programmable timing generators with single cyclic delay line," *IEEE Trans. Instrum. Meas.*, vol. 53, pp. 1295–1303, Aug. 2004.
- [9] B. Arkin, "Realizing a production ATE custom processor and timing IC containing 400 independent low-power and high-linearity timing verniers," in *IEEE ISSCC Dig.*, Feb. 2004, pp. 348–349.
- [10] M. Suda, K. Yamamoto, T. Okayasu, S. Kantake, S. Sudou, and D. Watanabe, "CMOS high-speed, high-precision timing generator for 4.266-Gbps memory test system," in *Proc. IEEE ITC*, Nov. 2005, p. 866.
- [11] T. Okayasu, M. Suda, K. Yamamoto, S. Kantake, S. Sudou, and D. Watanabe, "1.83 ps-resolution CMOS dynamic arbitrary timing generator for > 4 GHz ATE applications," in *IEEE ISSCC Dig.*, Feb. 2006, pp. 2122–2131.
- [12] G. Nagaraj, S. Miller, B. Stengel, G. Cafaro, T. Gradishar, S. Olson, and R. Hekmann, "A self-calibrating sub-picosecond resolution digital-to-time converter," in *IEEE MTT-S Int. Microwave Symp. Dig.*, Jun. 2007, pp. 2201–2204.
- [13] Y. Shim, Y. Jo, S. Kim, S. Kim, and K. Cho, "A register controlled delay locked loop using a TDC and a new fine delay line scheme," in *Proc. IEEE ISCAS*, Sep. 2006, pp. 3922–3925.
- [14] R. Szplet, J. Kalisz, and R. Szymanowski, "Interpolating time counter with 100 ps resolution on a single FPGA device," *IEEE Trans. Instrum. Meas.*, vol. 49, pp. 879–882, Aug. 2000.
- [15] P. Chen, M.-C. Shie, Z.-Y. Zheng, Z.-F. Zheng, and C.-Y. Chu, "A fully digital time domain smart temperature sensor realized with 140 FPGA logic elements," *IEEE Trans. Circuits Syst. I*, vol. 54, no. 12, pp. 2661–2668, Dec. 2007.
- [16] T.-I. Otsuji, "A picosecond-accuracy, 700-MHz range, Si bipolar time interval counter LSI," *IEEE J. Solid-State Circuits*, vol. 28, no. 9, pp. 941–947, Sep. 1993.
- [17] T.-E. Rahkonen and J.-T. Kostamovaara, "The use of stabilized CMOS delay line for the digitization of short time intervals," *IEEE J. Solid-State Circuits*, vol. 28, no. 8, pp. 887–894, Aug. 1993.
- [18] P. Chen, C.-C. Chen, J.-C. Zheng, and Y.-S. Shen, "A PVT insensitive vernier-based time-to-digital converter with extended input range and high accuracy," *IEEE Trans. Nucl. Sci.*, vol. 54, no. 4, pp. 294–302, Apr. 2007.
- [19] "Quartus II Version 8.0 Handbook, Volume 3 Verification: Section II. Timing Analysis," Altera Corp. [Online]. Available: www.altera.com
- [20] N. Weste and K. Eshraghian, *Principles of CMOS VLSI Design*, 2nd ed. Boston, MA: Addison-Wesley Longman, 1994.
- [21] "Quartus II Version 8.0 Handbook, Volume 2: Design Implementation and Optimization: Section II. Area, Timing, and Power Optimization," Altera Corp. [Online]. Available: www.altera.com
- [22] 81110A 165/330 MHz Pulse/Data Generator. Agilent Corp. [Online]. Available: www.agilent.com
- [23] Virtex-5 FPGA datasheet: DC and Switching Characteristics. Xilinx Corp. [Online]. Available: www.xilinx.com
- [24] PLLs in Stratix II and Stratix II GX Devices. Altera Corp. [Online]. Available: www.altera.com
- [25] Clock Networks and PLLs in Stratix III Devices. Altera Corp. [Online]. Available: www.altera.com
- [26] P. Chen, J.-S. Lai, and P.-Y. Chen, "FPGA vernier digital-to-time converter with 3.56 ps resolution and −0.23 ~ +0.2 LSB inaccuracy," in *Proc. IEEE CICC*, Sep. 2008, pp. 209–212.



**Poki Chen** was born in Chia-Yi, Taiwan, in 1963. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1985, 1987 and 2001, respectively.

From 1998 to 2001 and from 2001 to 2006, he was a Lecturer and an Assistant Professor, respectively, in the Department of Electronic Engineering, National Taiwan University of Science and Technology. He is currently an Associate Professor in the same department. His research interests are in analog integrated circuits and systems, with a particular focus on time-domain processing circuits such as time-to-digital converters, time-domain smart temperature sensors, digital pulse generators, digital pulse width modulators and duty cycle correctors.



**Po-Yu Chen** was born in Taoyuan, Taiwan, in 1984. He received the M.S. degree from the Department of Electronics Engineering, National Taiwan University of Science and Technology (NTUST), Taipei, Taiwan, in 2009.

His research interests include mixed-mode integrated circuit design and FPGA applications.





His research interests include mixed-mode integrated circuit design and system design.



**Yi-Jin Chen** was born in Yilan, Taiwan, in 1984. She received the M.S. degree from the Department of Electronics Engineering, National Taiwan University of Science and Technology (NTUST), Taipei, Taiwan, in 2009.

Her research interests include mixed-mode integrated circuit design and system design.