

METROLOGY AND MEASUREMENT SYSTEMS

Index 330930, ISSN 0860-8229 www.metrology.wat.edu.pl



# KINTEX ULTRASCALE'S MULTI-SEGMENT DIGITAL TAPPED DELAY LINES WITH CONTROLLED CHARACTERISTICS FOR PRECISE TIME-TO-DIGITAL CONVERSION

Robert Frankowski<sup>1)</sup>, Maciej Gurski<sup>1)</sup>, Ryszard Szplet<sup>2)</sup>

1) Institute of Engineering and Technology, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, 87-100 Toruń, Poland (🖂 robef@umk.pl)

2) Faculty of Electronics, Military University of Technology, ul. Kaliskiego 2, 00-908, Warsaw, Poland

## Abstract

This paper describes an efficient method of designing and implementing complex tapped delay lines (CTDLs) with picosecond and sub-picosecond resolution in FPGA devices. A higher resolution and better linearity can be achieved by an appropriate selection of the single time-coding tapped delay lines (TDLs) that form a CTDL. The proposed TDL selection algorithm significantly reduces the amount of device logic resources required to implement a CTDL with assumed parameters and provides an appropriate selection scenario. Ultimately, the presented solution allows CTDLs with different user-defined configurations to be created from a fixed set of available logic resources. Therefore, it is particularly recommended for prototyping in smaller FPGA devices. In this work, we investigate how the order of line selection influences the resolution gain of multiple time-coding lines. Furthermore, we determine the relation between the equivalent resolution value and the number of TDLs involved. The obtained results allow the upper limit of the resolution achievable in a given technology to be estimated. In addition, the ranges of resolutions achievable with a fixed number of lines are examined. The research was performed on a Kintex UltraScale FPGA chip manufactured by Xilinx in a 20-nm CMOS process.

Keywords: delay line, precise time metrology, time to digital converter, field programmable gate arrays.

## 1. Introduction

Advances in technology and research methods have led to a growing interest in increasingly precise measurement of time intervals between physical events. Highly precise time measurements are crucial in the study of particles and the structure of matter, the determination of isotopic composition, distance measurements, measurements of flow intensity and signal phase fluctuations, and data transmission. Currently, time interval measurement systems are based on integrated *time-to-digital converters* (TDCs), which are often implemented in *field programmable gate array* (FPGA) devices owing to the flexibility of such a solution and its relatively low development cost.

Copyright © 2024. The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0 https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.

Article history: received January 16, 2024; revised February 29, 2024; accepted March 7, 2024; available online March 15, 2024.

The measurement resolution of TDCs depends mainly on the selected measurement method and the microelectronic technology used. The most popular digital conversion methods are based on coding information about the measured time interval in *tapped delay lines* (TDLs). For such TDC architectures, the resolution can be further increased by simultaneous measurement in multiple TDLs [1, 2] or by encoding multiple edges in a single line [3, 4]. In some cases, both methods are used simultaneously [5]. In FPGAs, TDLs are typically built from elements such as buffers and logic gates, or from resources provided by *configurable logic blocks* (CLBs): embedded *look-up tables* (LUTs), fast *carry-chain components* (CARRY) and D-type flip-flops or latches. The clock distribution network and the interconnections between CLBs [6] are also important in this case. The above components and carry-chain paths are characterized by very short propagation times (of the order of tens or even single picoseconds), which strongly depend on the technology used. A newer technology provides shorter propagation times and, consequently, a better resolution. However, this is not a preferable design solution because it requires changing the *integrated circuit* (IC).

A higher resolution in the same IC can be achieved by an appropriate selection of delay segments of multiple TDLs [7]. The selected set is specified during the design process, and its size and composition depend on the logic resources available for TDL implementation. Therefore, the proposed solution is particularly important for designs to be implemented in small FPGA structures. A suitable example of such a situation is prototyping in *multiprocessor systems-on-chip* (*MPSoC*), where the logic resources of the programmable part of the device are limited by the area of the processing system. The selection algorithm proposed in this article allows a significant reduction in the required logic resources and offers a way to increase the resolution of multiple time-coding lines.

This paper is organized as follows. First, we analyze digital methods of precise time interval measurement with different CTDL architectures obtained as compositions of multiple tapped delay lines, and present the principle of increasing the line resolution for the selected solution (Section 2). In Section 3, we explain the essential issues concerning TDL implementation in Kintex UltraScale devices. Section 4 describes how bubble and metastability errors [8, 9] are minimized by means of a sorting procedure. In the following sections, we propose selection criteria and a line selection algorithm developed to reduce the equivalent resolution value, thereby increasing the resolution and linearity of TDC conversion. Finally, we present the experimental results for possible variants of CTDL creation and summarize our study.

## 2. CTDL implementations

In many measurements, it is important to accurately determine the timing of random events and their relationships. This necessitates the design of a wide spectrum of TDC architectures with metrological parameters adequate to the observed phenomenon [10–12]. Usually, the precision of TDC conversion in such systems is determined by a phase measurement module made of one or more TDLs. The TDLs delay a signal in discrete steps determined by the propagation times of the logic resources used in the design.

In order to increase the resolution of TDLs built in programmable integrated circuits, multiple independent delay lines are used [13]. An example of such lines is presented in [14], where a carry-chain-based, four-stage merged TDL architecture was implemented in a Kintex-7 device. A common reference clock signal at a frequency of 554 MHz was used to obtain a 760-tap TDL with 2.37 ps resolution. When the results read from multiple TDLs are simply averaged, the standard deviation of the measurement decreases in inverse proportion to the square root of the number of TDLs. In [15], 128 pairs of TDLs (separate for the START and STOP pulses) are used to average the result of a single measurement. The system, implemented in an *application specific integrated circuit* (ASIC), yielded a root-mean-square error ( $\sigma_{rms}$ ) of 3 ps with 64-tap delay lines and an average resolution of 71 ps. A TDL architecture based on LUT arrays was implemented in Xilinx's Virtex XCV300 FPGA device, where a TDL with 500 ps resolution was obtained by utilizing two LUT-based delay lines with a resolution of 1 ns [16]. In this case, the difference between the propagation times of the signals leaving the LUT array located in one CLB and the signals at the LUT inputs of two other CLBs was exploited. The positions of the CLBs were determined experimentally so that the offset between these signals equals half of the single-stage delay of one TDL. To linearize the characteristic of a TDL, one or several tri-state buffers were connected to the outputs of appropriate CLBs [17]. This increased the time constant of the loaded output and lengthened the rise time of its output signal, which in turn made the characteristic of the entire TDL more linear. A similar procedure was applied to implement a quadruple line with 250 ps resolution. The additional use of CARRY chain components such as MUXCY and XORCY enables the creation of delay lines with a resolution better than 50 ps.
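The $1/\sqrt{N}$ scaling mentioned above holds only if the errors contributed by the individual TDLs are independent. A minimal Monte Carlo sketch under that assumption (independent, zero-mean Gaussian errors; the 10 ps single-line standard deviation and the helper name `averaged_std` are hypothetical):

```python
import numpy as np


def averaged_std(sigma_single, n_lines, n_trials=100_000, seed=0):
    """Empirical standard deviation of the mean of n_lines TDL readings.

    Assumption: each TDL adds an independent, zero-mean Gaussian error with
    standard deviation sigma_single (ps).
    """
    rng = np.random.default_rng(seed)
    # One row per measurement, one column per TDL.
    readings = rng.normal(0.0, sigma_single, size=(n_trials, n_lines))
    # Average over the lines, then take the spread over all measurements.
    return float(readings.mean(axis=1).std())


sigma = 10.0  # hypothetical single-TDL standard deviation, ps
for n in (1, 4, 16):
    # Columns: number of lines, simulated std, theoretical sigma / sqrt(n).
    print(n, round(averaged_std(sigma, n), 2), sigma / np.sqrt(n))
```

With 16 lines the simulated value comes out close to $10/\sqrt{16} = 2.5$ ps. Error components shared by all lines (for example, a common clock jitter contribution) do not average out in this way.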

CARRY chain components, widely used in FPGAs to build high-speed arithmetic circuits, are characterized by the best timing parameters. A single delay ranges from a few to tens of picoseconds, depending on the technology used. The timing parameters of these components also show a weak temperature dependence compared with other logic blocks. Therefore, they are increasingly used to build TDLs. For example, a delay line made of CARRY chain components was presented in [18]. That delay line was characterized by high nonlinearity, because segments with longer delays were interleaved with segments with shorter delays, and the delays were spread from 30 ps to 110 ps. Such properties of lines built from CARRY chain elements are a serious drawback when trying to build a highly linear TDL. However, this property can be exploited when implementing an *equivalent coding line* (ECL) [2].

The principle of ECL creation is shown in Fig. 1. The method requires at least two TDLs with different delay characteristics. This is a typical situation in FPGAs, owing to the different delays of signal distribution paths, process variations, *etc.* Taking into account the widths of the quantization steps, as well as their relative offsets, it can be determined which bins from different TDLs overlap completely or partially. In this way, new bins that are much narrower than the bins of the original TDLs can be determined; their number equals n + m - 1, where n and m are the numbers of bins of the first and second TDL, respectively. By applying the ECL method to 16 TDLs implemented in a Spartan-6 device, a resolution of 1.2 ps was achieved [2].
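The bin-overlap step can be sketched as taking the union of the bin boundaries of both lines. A minimal sketch, assuming each TDL is described by its internal bin boundaries (in ps) already shifted by the measured offset between the lines, and that both lines cover the same range with common outer boundaries; the helper name `merge_tdls` and the numbers are hypothetical:

```python
import numpy as np


def merge_tdls(edges_a, edges_b, t_start, t_stop):
    """Combine two TDLs into an equivalent coding line (ECL).

    edges_a, edges_b: internal bin boundaries of each TDL (ps), already
    shifted by the measured time offset between the lines (assumption).
    t_start, t_stop: common outer boundaries of the range covered by both.
    Returns the bin widths of the resulting ECL.
    """
    edges = np.concatenate(([t_start, t_stop], edges_a, edges_b))
    edges = np.unique(edges)  # sorts and drops coinciding boundaries
    return np.diff(edges)


# TDL A: n = 3 bins of 2 ps; TDL B: m = 2 bins of 3 ps; both span 0..6 ps.
a = [2.0, 4.0]
b = [3.0]
ecl = merge_tdls(a, b, 0.0, 6.0)
# ECL bin widths 2, 1, 1, 2 ps, i.e. n + m - 1 = 4 bins.
print(ecl, len(ecl))
```

The count n + m - 1 assumes that no internal boundaries of the two lines coincide; coinciding boundaries are merged by `np.unique` and yield fewer bins.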



Fig. 1. Principle of the equivalent coding line (ECL) creation.

## 3. UltraScale's TDL architecture

A single CLB of the Xilinx Kintex UltraScale FPGA [19] consists of eight six-input LUTs (each of which can also be configured to implement two five-input functions), sixteen flip-flops (arranged in two groups) and the associated carry-chain multiplexers and XOR gates. The CLB functionality of interest here is arranged around a single carry-chain column. For experimental purposes, a TDL with 480 taps was implemented using structural VHDL coding. The CARRY8 component (Fig. 2) from the Xilinx library for the Kintex UltraScale chip was used in this implementation. Two flip-flops can be connected to each of these taps, which results in a maximum of 960 D-type flip-flops attached to the entire TDL. This line occupies 480/8 = 60 CLBs, which corresponds to the number of CLBs available in one carry-chain column within a single clock region. This approach guarantees that all 960 flip-flops can be clocked directly and simultaneously, without additional delays introduced by extra logic resources (*e.g.* clock buffers) and interconnections. Otherwise, when the line spans two or more clock regions, significant differences in propagation times are observed.
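The resource figures quoted above are mutually consistent: with one CARRY8 (eight taps) per CLB and two flip-flops per tap, the line uses exactly the sixteen flip-flops of each of its CLBs:

$$N_{\rm CLB} = \frac{480}{8} = 60, \qquad N_{\rm FF} = 2 \cdot 480 = 960 = 16 \cdot 60.$$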



Fig. 2. Simplified diagram of CARRY8 with flip-flops in a single CLB.

The placement of a TDL built on a carry-chain path is enforced by an appropriate description in the *Xilinx Design Constraints* (XDC) file. In this file, in addition to assigning pins and specifying clock frequencies, the designer can fix the position of a TDL module by designating its implementation area. A characteristic feature of TDLs built from arithmetic carry-chain elements is strong nonlinearity resulting from differences in the propagation times of the data and clock signals. Therefore, unusable line segments must first be eliminated, and the remaining segments must then be appropriately sorted. For the Kintex UltraScale XCKU040-2FFVA1156E device, the calculated average tap delay of this TDL architecture is about 5.3 ps.

## 4. Sorting process flow

In single-stage converters [20], the TDL records periodic waveforms of the reference clock, i.e. sequences of zeros and ones, instead of a result in thermometer code as in interpolators. The clock phase is decoded by priority decoders. Such a decoder identifies the first occurrence of the "10" or "01" sequence and, based on this information, determines the phase of the reference clock.
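The behaviour of such a priority decoder can be modelled in a few lines of software. This is a sketch, not the authors' hardware decoder; it assumes the latched tap values are available as a list of 0/1 integers (tap 0 first), and the name `priority_decode` is hypothetical:

```python
def priority_decode(bits):
    """Software model of a priority decoder for a latched clock waveform.

    bits: sequence of 0/1 values read from the TDL flip-flops, tap 0 first.
    Returns (index, pattern) for the first "10" or "01" transition, where
    index is the first tap after the transition, or None if there is none.
    """
    for i in range(1, len(bits)):
        if bits[i] != bits[i - 1]:
            return i, f"{bits[i - 1]}{bits[i]}"
    return None


print(priority_decode([1, 1, 1, 0, 0, 0]))  # clean pattern
print(priority_decode([1, 1, 0, 1, 0, 0]))  # pattern containing a bubble
```

For the clean pattern the function returns `(3, "10")`. For the pattern with a bubble it stops at the first transition and returns `(2, "10")`, so the true clock edge further along the line is never reported.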

Metrol. Meas. Syst., Vol. 31 (2024), No. 2, pp. 417–429 DOI: 10.24425/mms.2024.149697

In the reference clock waveforms recorded in carry-chain-based TDLs, we relatively often observed bubble errors [21]. A bubble error is an undesirable bit sequence resulting from signal processing in flash converters. Such errors are caused by manufacturing process variations, noise, interference and imperfect distribution of the clock signals to the TDL flip-flops. Because of these factors, the conversion may produce "101" or "1001" sequences (as well as "010", "0110", *etc.*) instead of uniform runs of ones and zeros, especially at the boundary of a logic state change between neighboring bits from "1" to "0" (and analogously from "0" to "1"). This situation is also well known in flash ADCs. When the performance of a high-resolution delay line is verified, bubble errors become evident as quantization steps with no counts registered when decoding with a priority decoder (Fig. 3). This results in fewer effective bins, *i.e.* bins with any counts registered during calibration. In most cases, TDL calibration and tap sorting involve feeding the delay line under test with either signals from a precision time-step generator or a HIT signal appropriately delayed with respect to the reference clock [22, 23].
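Empty bins and bin widths are commonly identified with a statistical code-density test: when hits are uncorrelated with the reference clock, the number of counts in each bin is proportional to its width. The sketch below shows this generic technique, not necessarily the authors' calibration procedure; it assumes the hits have already been decoded into bin indices and are uniformly distributed over one reference period $T_0$. The function name and data are hypothetical.

```python
import numpy as np


def code_density(codes, n_bins, t0):
    """Estimate bin widths with a statistical code-density test.

    codes: decoded bin indices (0 .. n_bins-1) of hits uncorrelated with the
    reference clock, assumed uniformly distributed over one period t0.
    Returns (bin widths in the units of t0, number of effective bins).
    """
    counts = np.bincount(np.asarray(codes), minlength=n_bins)
    # Each bin's share of all counts is taken as its share of the period.
    widths = t0 * counts / counts.sum()
    effective = int(np.count_nonzero(counts))  # bins with any counts
    return widths, effective


widths, effective = code_density([0, 0, 1, 1, 1, 1, 3, 3], n_bins=4, t0=4000.0)
print(widths, effective)
```

In this example bin 2 receives no counts (as happens for bins masked by bubbles), so only three bins are effective, and the estimated widths are 1000, 2000, 0 and 1000 ps.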



Fig. 3. Characteristic of TDL0 with unsorted flip-flops, $\overline{q} = 16.39$ ps, $q_{eqv} = 25.88$ ps.

For the tested set of TDLs (implemented in the X1Y0 and X2Y0 clock regions), the segment sorting procedure consists of two stages. In the first stage, *i.e.* in the data acquisition mode, the input hits are registered without decoders. In this mode, an external generator signal that is not correlated with the system clock is applied to the measurement input. Waveform registration is then started, and the reference clock waveforms latched in the flip-flops are stored in the BRAM memory. When the BRAM is full, its contents are copied to an external DDR memory for post-processing on the PC. During copying, a MicroBlaze processor searches the data for previously stored waveforms. If a waveform has already occurred, its occurrence counter is incremented; otherwise, the new waveform is stored with a new occurrence counter. After $10^6$ to $10^8$ measurements have been recorded, the data are transferred to the PC, where software performs the identification and sorting procedures. As a result, a modified order of delay line segments is obtained. The reorganized delay line is characterized by significantly better mean and equivalent resolutions. For example, for the test TDL the mean resolution improved about threefold (from 16.39 ps to 5.33 ps), while the equivalent resolution improved about twofold (from 25.88 ps to 12.26 ps, Fig. 4). Such a delay line can then be connected to the decoder using a modified connection matrix.
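The occurrence-counting step maps naturally onto a dictionary-based counter. The paper does not detail the identification and sorting performed on the PC, so the reordering rule below is a hypothetical illustration only: it assumes a single edge propagating along the line, so that taps with shorter delays latch "1" more often, and orders the taps by that frequency. Real reference clock waveforms contain several edges per line and require a more elaborate procedure. Function names are hypothetical.

```python
from collections import Counter


def count_waveforms(waveforms):
    """Occurrence counter per distinct latched waveform ('0'/'1' strings)."""
    return Counter(waveforms)


def order_taps(histogram):
    """Hypothetical rule: order taps by how often they latched '1'.

    Assumes a single propagating edge, so a tap with a shorter delay reads
    '1' more often. Returns tap indices from shortest to longest delay.
    """
    n_taps = len(next(iter(histogram)))
    ones = [0] * n_taps
    for waveform, occurrences in histogram.items():
        for tap, bit in enumerate(waveform):
            if bit == "1":
                ones[tap] += occurrences
    # Stable sort: taps with equal counts keep their physical order.
    return sorted(range(n_taps), key=lambda tap: -ones[tap])


# Three taps with hypothetical delays of 3, 1 and 2 ps (tap 0, 1, 2).
hist = count_waveforms(["000", "010", "011", "111", "011"])
print(hist["011"], order_taps(hist))
```

Here the waveform `011` is counted twice, and the rule returns the order `[1, 2, 0]`, i.e. the taps sorted by increasing delay.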





Fig. 4. Characteristic of TDL0 with sorted flip-flops, $\overline{q} = 5.33$ ps, $q_{eqv} = 12.26$ ps.

## 5. Selection criteria

The methodology of CTDL creation depends on the adopted selection criterion. The basic criterion is the creation of a CTDL with a preset linearity or an expected equivalent resolution. High linearity is one of the most important parameters in systems where the shape of the transfer characteristic [24] is not taken into account when calculating the measurement result. In such a case, the maximum conversion error due to *integral nonlinearity* (INL) is assumed, and the measurement result is taken as the product of the mean quantization step and the number of quantization steps involved. In this type of measurement system, the maximum difference between quantization steps should not exceed 10% of their mean value. The second criterion is particularly important for the strongly nonlinear transfer characteristics that are quite often observed for CTDLs implemented in FPGA devices. In this case, the equivalent resolution, which is based on the root-mean-square value of the quantization error, is a more accurate measure. It corresponds to the resolution of an ideal converter (with equal quantization steps) whose quantization error equals the quantization error of the real converter with its actual quantization steps. The equivalent resolution is given by the formula [2]:

$$q_{\rm eqv} = \sqrt{\frac{1}{T_0} \sum_{i} q_i^3},\tag{1}$$

where $q_i$ is the width of the *i*-th quantization step, the sum runs over all quantization steps, and $T_0$ is the reference clock period. Let *W* denote the set of all TDLs that can be used in a CTDL. The resulting CTDL should have the smallest possible equivalent resolution. If there are *n* TDLs in the set and pairs of lines are considered, it is sufficient to combine each TDL with every other TDL, which gives the following number of pairs:

$$L = \frac{n(n-1)}{2}.\tag{2}$$

In Eq. (2), *n* denotes the number of TDLs in the set *W*. For a measurement system with one CTDL in which *k* elements (TDLs) from the *n*-element set *W* can be used, the number of possible combinations $L_k$ is given by the binomial coefficient:

$$L_k = \binom{n}{k} = \frac{n!}{(n-k)!\,k!}.\tag{3}$$

Equation (3) gives the number of different combinations of k elements that can be chosen from an *n*-element set.
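Eqs. (1)–(3) can be checked numerically. Eq. (1) follows from requiring the RMS quantization error of an ideal converter, $q_{eqv}/\sqrt{12}$, to equal that of the real converter, $\sqrt{\sum_i q_i^3/(12\,T_0)}$, when the measured intervals are uniformly distributed over $T_0$. The sketch below (the function name `q_eqv` and the step widths are illustrative) shows that equal steps give $q_{eqv}$ equal to the step itself, whereas unequal steps with the same mean give a larger value:

```python
import math

import numpy as np


def q_eqv(widths, t0):
    """Equivalent resolution, Eq. (1): sqrt(sum(q_i**3) / T0)."""
    q = np.asarray(widths, dtype=float)
    return math.sqrt(np.sum(q ** 3) / t0)


# Equal 2 ps steps over T0 = 6 ps: q_eqv equals the step width.
print(q_eqv([2.0, 2.0, 2.0], 6.0))
# Steps of 1 ps and 3 ps (same 2 ps mean) over T0 = 4 ps: q_eqv = sqrt(7) ps.
print(q_eqv([1.0, 3.0], 4.0))

# Eq. (2): number of pairs for n = 32 TDLs; Eq. (3): 16-element subsets.
n = 32
print(n * (n - 1) // 2, math.comb(n, 2))
print(math.comb(n, 16))
```

The last two lines give 496 pairs and 601 080 390 (about $601.1 \cdot 10^6$) subsets of 16 lines.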

## 6. CTDL selection algorithm

In practice, it is impossible to obtain an ideal TDL with equal quantization bins. Therefore, this paper proposes a multi-line selection algorithm to improve the equivalent resolution. The algorithm seeks to achieve the most uniform quantization steps by selecting TDLs with complementary characteristics. Depending on the available logic resources of the FPGA chip and the criteria imposed by the designer, the target CTDL can be built from all or only some of the available TDLs. At the beginning of the design stage, the metrological parameters of the system and the resources needed to implement it are estimated. On this basis, the number of TDLs used in the target CTDL is determined. In the described work, a CTDL with 16 TDLs was created from a set of 32 TDLs. The TDLs are implemented at various locations in the FPGA, and their characteristics and the time offsets between them are determined. Based on the obtained data, the algorithm searches for the best combinations of TDLs to be included in the CTDL. An equivalent line is created from each selected TDL subset, and its equivalent resolution is calculated. According to Eq. (3), the number of possible combinations to check for a subset of 16 TDLs out of 32 is $601.1 \cdot 10^6$.

The direct method, which checks all combinations and selects the one with the smallest equivalent resolution, can be used, but it is computationally expensive. Therefore, approximate methods that significantly reduce the computation time were also used. The first method involves randomly selecting a subset of 16 TDLs from the set *W*, performing the calculations and storing the combination with the smallest equivalent resolution. The number of iterations in this method is determined in advance. Because of the randomness, both the $q_{eqv}$ value and the selected TDL subset may vary from run to run.
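Both strategies can be sketched in Python, the language the authors used. The representation of a TDL as a list of internal bin boundaries within one reference period, already shifted by its measured offset, is an assumption made for illustration, as are the helper names `ctdl_q_eqv`, `direct_search` and `random_search` and the toy four-line data set.

```python
import itertools
import math

import numpy as np


def ctdl_q_eqv(tdls, subset, t0):
    """Equivalent resolution (Eq. (1)) of the CTDL built from tdls[subset].

    Assumption: each TDL is a list of its internal bin boundaries (ps)
    inside one reference period [0, t0), already shifted by its measured
    time offset; the CTDL bins lie between the merged, sorted boundaries.
    """
    parts = [np.array([0.0, t0])] + [np.asarray(tdls[i], dtype=float) for i in subset]
    widths = np.diff(np.unique(np.concatenate(parts)))
    return math.sqrt(np.sum(widths ** 3) / t0)


def direct_search(tdls, k, t0):
    """Check every k-element combination (Eq. (3)): exact but slow."""
    best_q, best_subset = math.inf, None
    for subset in itertools.combinations(range(len(tdls)), k):
        q = ctdl_q_eqv(tdls, subset, t0)
        if q < best_q:
            best_q, best_subset = q, subset
    return best_q, best_subset


def random_search(tdls, k, t0, iterations, seed=None):
    """Evaluate a fixed number of random k-element subsets; keep the best."""
    rng = np.random.default_rng(seed)
    best_q, best_subset = math.inf, None
    for _ in range(iterations):
        draw = rng.choice(len(tdls), size=k, replace=False)
        subset = tuple(sorted(int(i) for i in draw))
        q = ctdl_q_eqv(tdls, subset, t0)
        if q < best_q:
            best_q, best_subset = q, subset
    return best_q, best_subset


# Hypothetical toy set of four TDLs within a 12 ps reference period.
t0 = 12.0
tdls = [[6.0], [3.0, 9.0], [6.0], [4.0, 8.0]]
print(direct_search(tdls, 3, t0))
print(random_search(tdls, 3, t0, iterations=100, seed=1))
```

On this toy set, two of the four 3-element subsets reach the minimum $q_{eqv} = \sqrt{6} \approx 2.45$ ps, so even a short random search finds it. With 32 lines and k = 16, the loop in `direct_search` would run about $6 \cdot 10^8$ times.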

The second approximate method is based on selecting the best pair. First, from all possible pairs, the two TDLs whose combination yields the smallest equivalent resolution are selected. Then, TDLs are added to this subset one at a time, each time choosing the TDL that gives the smallest equivalent resolution of the enlarged subset, until the required number of lines is reached. This method does not necessarily find the smallest $q_{eqv}$ value, but this is compensated by a significantly reduced computation time.
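The "select the best pair" procedure can be sketched in the same language. This is an illustration of the procedure described in the text, not the authors' program; the helper `ctdl_q_eqv`, the boundary-list representation of a TDL (internal bin boundaries within one reference period, shifted by the measured offset), the function name `best_pair_selection` and the toy data are assumptions.

```python
import itertools
import math

import numpy as np


def ctdl_q_eqv(tdls, subset, t0):
    """Equivalent resolution (Eq. (1)) of the CTDL built from tdls[subset].

    Assumption: each TDL is a list of internal bin boundaries (ps) inside
    one reference period [0, t0), already shifted by its measured offset.
    """
    parts = [np.array([0.0, t0])] + [np.asarray(tdls[i], dtype=float) for i in subset]
    widths = np.diff(np.unique(np.concatenate(parts)))
    return math.sqrt(np.sum(widths ** 3) / t0)


def best_pair_selection(tdls, k, t0):
    """Greedy 'select the best pair' heuristic.

    Step 1: among all pairs (Eq. (2)), take the one with the smallest q_eqv.
    Step 2: repeatedly add the TDL that minimizes q_eqv of the enlarged
    subset until k TDLs are selected. Ties go to the lowest index.
    """
    pairs = itertools.combinations(range(len(tdls)), 2)
    selected = list(min(pairs, key=lambda p: ctdl_q_eqv(tdls, p, t0)))
    while len(selected) < k:
        candidates = [i for i in range(len(tdls)) if i not in selected]
        best = min(candidates, key=lambda i: ctdl_q_eqv(tdls, selected + [i], t0))
        selected.append(best)
    return ctdl_q_eqv(tdls, selected, t0), tuple(selected)


# Hypothetical toy set of four TDLs within a 12 ps reference period.
t0 = 12.0
tdls = [[6.0], [3.0, 9.0], [6.0], [4.0, 8.0]]
print(best_pair_selection(tdls, 3, t0))
```

On this toy set the greedy result happens to coincide with the exhaustive optimum ($\sqrt{6}$ ps for lines 0, 1 and 3); in general it need not. For n = 32 and k = 16, the procedure evaluates 496 pairs plus 329 enlarged subsets, i.e. 825 candidates instead of about $6 \cdot 10^8$.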

The algorithms were implemented in Python 3.x [25]. To speed up the calculations, the NumPy numerical library was used. The program uses the itertools module to generate combinations of TDL indices from the set. Unfortunately, this approach required a large amount of RAM. The computation results of the algorithms are presented in Table 1.

| Algorithm variant                          | q <sub>eqv</sub> [ps] | Computation time [s]   |
|--------------------------------------------|-----------------------|------------------------|
| Direct method                              | 0.870                 | 151239.880 (ca. 42 h)  |
| Random selection (10 <sup>5</sup> repeats) | 0.885                 | 41.370                 |
| Select the best pair                       | 0.899                 | 0.074                  |

Table 1. Comparison of computation results for the algorithm variants, obtained on a PC with an Intel Core i9-10850K processor and 128 GB of RAM, for a system of 16 TDLs selected from a set of 32 TDLs.

The best results are obtained with the direct method, but at the cost of a significant computation time. The pairwise selection method is practically instantaneous compared with the direct method, but it is deterministic: it always returns the same result, which may differ from the optimum found by the direct method. The random method is slower than the pairwise selection method, but it gives a chance of finding the minimum value of $q_{eqv}$, because the combination with the smallest $q_{eqv}$ may be drawn by chance if the number of iterations is large relative to the number of combinations for a given number of TDLs in the CTDL.

R. Frankowski, M. Gurski, R. Szplet: KINTEX ULTRASCALE'S MULTI-SEGMENT DIGITAL TAPPED DELAY LINES....

## 7. Experimental results

The results of computer tests of the considered algorithms indicate that the method of selecting the best pair gives an equivalent resolution only slightly worse than that of the direct method (by 3.2%), with a very short calculation time (74 ms). Therefore, this method was investigated more thoroughly. To examine the effectiveness of the selection algorithm, we chose TDLs with worse characteristics. The wider bins at their beginning probably result from degraded clock distribution parameters, such as clock skew or propagation time at the edge of the clock region. An example combination of two TDLs giving the best result (the smallest equivalent resolution) is shown in Fig. 5. In this case, TDL16 (tapped delay line number 16) with 368 delay segments and equivalent resolution $q_{eqv} = 13.19$ ps and TDL29 with 368 segments and equivalent resolution $q_{eqv} = 13.77$ ps were selected. The time shift between the two lines was taken into account. Combining these lines resulted in a CTDL with 735 segments and an equivalent resolution of $q_{eqv} = 6.33$ ps.



Fig. 5. Quantization bin widths of a) TDL16 and b) TDL29 before the combining operation, and c) the result of combining the two TDLs a) and b) into one CTDL.

Based on the very promising results of combining two TDLs, a set of 32 TDLs was then used to create a CTDL composed of 16 selected TDLs. According to Eq. (3), the number of possible variants for this configuration is $601.1 \cdot 10^6$. As the number of lines in the set *W* increases, the number of their possible combinations grows rapidly. All possible solutions were analyzed with the algorithm proposed in Section 6. Test compositions were made for 4, 8, 16 and 32 lines. The experimental results are shown in Fig. 6. The obtained equivalent resolutions are 3.23 ps, 1.40 ps, 0.87 ps and 0.50 ps, respectively. The last value of $q_{eqv}$ is the minimum value for the set of all 32 TDLs, and no better performance can be achieved with this set.

Fig. 6. Quantization bin widths of CTDLs composed of a) 4, b) 8, c) 16 and d) 32 TDLs.

The values of equivalent resolution were also checked for all possible combinations for a set of 32 TDLs. Fig. 7 presents the obtained equivalent resolution values for different numbers of TDLs included in the CTDL, together with the equivalent resolution values of the individual TDLs.



Fig. 7. Equivalent resolution values obtained for the complex tapped delay line (CTDL) based on all possible TDLs combinations. The range of equivalent resolution values for individual TDLs is highlighted in green. The resolution ranges for 16- and 26-fold CTDLs are marked in red and blue, respectively.

The research results presented in Fig. 7 show that, using the line selection algorithm, it is possible to determine the range of equivalent resolutions for each specified number of lines combined into a CTDL. Thus, when a system with a fixed number of lines is designed, a random choice of the implemented lines will not necessarily be the best solution. Only the use of the described algorithm makes it possible to assess whether the implementation performed has given the best result. By determining the ranges of possible resolutions for each number of lines, one can also estimate the potential parameters of the TDC ultimately implemented in a given structure using a given set of lines. A smaller set *W* of TDLs usually reduces the probability of obtaining the best equivalent resolution. For a set consisting of 16 lines, the equivalent resolution of a 16-fold CTDL was 0.96 ps, whereas increasing the set *W* to 32 lines improved the equivalent resolution to 0.87 ps. Moreover, it is worth noting that any combination of two TDLs always improves $q_{eqv}$ significantly. However, adding randomly chosen TDLs does not automatically improve the resolution; in a particular case, a CTDL with more TDLs can have worse parameters than one with fewer TDLs. The widest spread of $q_{eqv}$ values occurs for the 16-fold CTDL, mainly because, for a set of 32 TDLs, the number of possible combinations $\binom{32}{k}$ is largest for $k = 16$. Additionally, Fig. 7 shows how many matched lines should be used in the measurement system to obtain a given value of equivalent resolution, or how many randomly matched lines should be involved in the system to meet the assumed parameters.

## 8. Conclusions

In high-resolution TDCs implemented in FPGAs, one of the preferred methods of increasing the system resolution is to increase the number of multi-segment delay lines used in parallel. Typically, the transfer characteristics of this type of converter are strongly nonlinear. Therefore, in systems with multiple multi-segment delay lines, only an appropriate selection of the lines can guarantee a further increase in resolution. Combining randomly chosen lines into a single measurement channel proves effective only occasionally. As shown for the TDC system with 16 multi-segment delay lines, it is possible to obtain an equivalent resolution of 0.870 ps, whereas a random selection from a set of 32 TDLs can, in the most unfavorable case, deteriorate the equivalent resolution to 1.943 ps (more than twice the optimal value) for this particular case. In the same system with 15 TDLs, the equivalent resolution ranges from 0.934 ps to 1.718 ps. In a very unfavorable scenario, adding another TDL may deteriorate the equivalent resolution instead of improving it, and thus degrade the metrological parameters of the whole system.

The use of the delay line selection algorithm presented in this article makes it possible to significantly improve the resolution and linearity of a time-interval measurement system. The experimental results obtained on a Kintex UltraScale FPGA device show that, by appropriate selection of lines from a set of 32 multi-segment delay lines, it is possible to improve the equivalent resolution without increasing the required hardware resources. In other words, to achieve a similar equivalent resolution (in comparison with a 16-fold CTDL) without line selection, the measurement channel would need at least 26 multi-segment delay lines (Fig. 7), for which $q_{eqv}$ ranges from 0.58 to 0.83 ps. Such a channel would require more than 60% more resources than the selected 16-fold CTDL.
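Assuming the required resources scale linearly with the number of TDLs, the comparison between the 26-line channel without selection and the selected 16-fold CTDL can be expressed in two ways:

$$\frac{26}{16} = 1.625, \qquad 1 - \frac{16}{26} \approx 0.385,$$

i.e. the unselected channel needs 62.5% more lines, while line selection saves about 38% of the lines that the unselected channel would need.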

## Acknowledgements

This research was funded in part by the National Science Centre, Poland, Grant No. 2021/05/X/ST7/00730.

## References

- [1] Chaberski, D., Frankowski, R., Gurski, M., & Zieliński, M. (2017). Comparison of Interpolators Used for Time-Interval Measurement Systems Based on Multiple-Tapped Delay Line. *Metrology and Measurement Systems*, 24(2), 401–412. https://doi.org/10.1515/mms-2017-0033
- [2] Szplet, R., Jachna, Z., Kwiatkowski, P., & Rozyc, K. (2013). A 2.9 ps equivalent resolution interpolating time counter based on multiple independent coding lines. *Measurement Science and Technology*, 24(3), 1–15. https://doi.org/10.1088/0957-0233/24/3/035904
- [3] Wu, J., & Shi, Z. (2008). The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay. Proceedings of the IEEE Nuclear Science Symposium Conference Record, Dresden, 3440–3446. https://lss.fnal.gov/archive/2008/conf/fermilab-conf-08-498-e.pdf
- [4] Kwiatkowski, P., Sondej, D., & Szplet, R. (2023). Subpicosecond resolution time interval counter with multisampling wave union type B TDCs in 28 nm FPGA device. *Measurement*, 209, 112510. https://doi.org/10.1016/j.measurement.2023.112510
- [5] Xie, W., Chen, H., & Li, D. D.-U. (2022). Efficient Time-to-Digital Converters in 20 nm FPGAs with Wave Union Methods. *IEEE Transactions on Industrial Electronics*, 69(1), 1021–1031. https://doi.org/10.1109/tie.2021.3053905
- [6] Kalisz, J., Szplet, R., Pasierbinski, J., & Poniecki, A. (1997). Field-programmable-gate-array-based time-to-digital converter with 200-ps resolution. *IEEE Transactions on Instrumentation and Measurement*, 46(1), 51–55. https://doi.org/10.1109/19.552156
- [7] Frankowski, R., Gurski, M., & Płóciennik, P. (2016). Optical methods of the delay cells characteristics measurements and their applications. *Optical and Quantum Electronics*, 48(1), 1–19. https://doi.org/10.1007/s11082-016-0465-6


- [8] Rahman, M., Baishnab, K. L., & Talukdar, F. A. (2010, February). A novel ROM architecture for reducing bubble and metastability errors in high speed flash ADCs. In 2010 20th International Conference on Electronics Communications and Computers (CONIELECOMP) (pp. 15–19). IEEE. https://doi.org/10.1109/CONIELECOMP.2010.5440805
- [9] Gurski, M., Frankowski, R., & Zieliński, M. (2021). Algorytmy minimalizacji błędu bąbelkowego w precyzyjnej metrologii odcinka czasu. *Przegląd Elektrotechniczny*, 97(10), 100–102. https://doi.org/10.15199/48.2021.10.20 (in Polish)
- [10] Szplet, R., & Klepacki, K. (2010). An FPGA-integrated time-to-digital converter based on two-stage pulse shrinking. *IEEE Transactions on Instrumentation and Measurement*, 59(6), 1663–1670. https://doi.org/10.1109/TIM.2009.2027777
- [11] Park, B. K., Kim, Y., Kwon, O., Han, S., & Moon, S. (2015). High-performance reconfigurable coincidence counting unit based on a field programmable gate array. *Applied Optics*, 54(15), 4727–4731. https://doi.org/10.1364/AO.54.004727
- [12] Machado, R., Cabral, J., & Alves, F. S. (2019). Recent Developments and Challenges in FPGA-Based Time-to-Digital Converters. *IEEE Transactions on Instrumentation and Measurement*, 68(11), 4205–4221. https://doi.org/10.1109/TIM.2019.2938436
- [13] Zieliński, M. (2009). Review of single-stage time-interval measurement modules implemented in FPGA devices. *Metrology and Measurement Systems*, 16(4), 641–647.
- [14] Wang, Y., Cao, Q., & Liu, C. (2018). A Multi-chain Merged Tapped Delay Line for High Precision Time-to-Digital Converters in FPGAs. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 65(1), 96–100. https://doi.org/10.1109/TCSII.2017.2698479
- [15] Jansson, J.-P., Keränen, P., Jahromi, S., & Kostamovaara, J. (2020). Enhancing Nutt-Based Time-to-Digital Converter Performance with Internal Systematic Averaging. *IEEE Transactions on Instrumentation and Measurement*, 69(6), 3928–3935. https://doi.org/10.1109/TIM.2019.2932156
- [16] Zieliński, M., Chaberski, D., Kowalski, M., Frankowski, R., & Grzelak, S. (2004). High-resolution time-interval measuring system implemented in single FPGA device. *Measurement*, 35(3), 311–317. https://doi.org/10.1016/j.measurement.2003.12.001
- [17] Chaberski, D., Frankowski, R., Zieliński, M., & Zaworski, Ł. (2016). Multiple-tapped-delay-line hardware-linearisation technique based on wire load regulation. *Measurement*, 92, 103–113. https://doi.org/10.1016/j.measurement.2016.06.002
- [18] Frankowski, R., Chaberski, D., & Kowalski, M. (2015). An optical method for the time-to-digital converters characterization. *Proc. IEEE ICTON 2015*, Budapest, Hungary, paper We.P.14, 1–4. https://doi.org/10.1109/ICTON.2015.7193659
- [19] Xilinx. (2017, February 28). *UltraScale Architecture Configurable Logic Block User Guide* (UG574, v1.5). https://docs.xilinx.com/v/u/en-US/ug574-ultrascale-clb
- [20] Szplet, R. (2014). Time-to-Digital Converters. In *Design, Modeling and Testing of Data Converters* (pp. 211–246). Springer. https://www.researchgate.net/publication/292674565_Time-to-Digital_Converters
- [21] Ito, S., Nishimura, S., Kobayashi, H., Uemori, S., Tan, Y., Takai, N., Yamaguchi, T. J., & Niitsu, K. (2010, December). Stochastic TDC architecture with self-calibration. In 2010 IEEE Asia Pacific Conference on Circuits and Systems (pp. 1027–1030). IEEE. https://doi.org/10.1109/APCCAS.2010.5774740

- [22] Chen, Y. H. (2013, May). A high resolution FPGA-based merged delay line TDC with nonlinearity calibration. In 2013 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 2432–2435). IEEE. https://doi.org/10.1109/ISCAS.2013.6572370
- [23] Zheng, Y., Mei, S., Sun, S., & Zhao, Y. (2024). A digital background calibration method for time-interleaved ADCs based on frequency shifting technique. *Metrology and Measurement Systems*, 31(3). https://doi.org/10.24425/mms.2024.150282
- [24] Sondej, D., Szymanowski, R., & Szplet, R. (2021). Methods of precise determining the transfer function of picosecond time-to-digital converters. *Metrology and Measurement Systems*, 28(3), 539–549. https://doi.org/10.24425/mms.2021.137697
- [25] Van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. Scotts Valley, CA: CreateSpace.



Robert Frankowski received the M.Sc. degree in physics, with a specialization in the physical foundations of microelectronics, from Nicolaus Copernicus University (NCU), Toruń, Poland, in 2001. In 2011, he received the Ph.D. degree in technical sciences from the Warsaw University of Technology. From 2006 to 2008, he participated in experiments conducted at the National Laboratory for Atomic, Molecular and Optical Physics. In 2009, he received the team award of the Rector of NCU for scientific and research activities for developing a methodology for the design and verification of accurate time interval measurement systems. He is currently an Assistant Professor at the Institute of Engineering and Technology, NCU, Toruń. His main research interest is the design of precision measurement and control systems implemented in field-programmable gate arrays for time interval metrology, with applications in many branches of science.



Ryszard Szplet received the M.Sc. degree in electronic engineering and the Ph.D. and habilitation degrees in applied sciences from the Military University of Technology (MUT), Warsaw, Poland, in 1989, 1997, and 2013, respectively. From 2000 to 2001, he spent one year as a researcher at the University of Oulu, Oulu, Finland. He is currently a Full Professor at MUT (title obtained in 2019), where he is also the Dean of the Faculty of Electronics. He has been involved in various research projects sponsored by public and private funds. He has authored or coauthored over 140 papers in international journals and conference proceedings. He is a member of the Committee on Metrology and Scientific Instrumentation of the Polish Academy of Sciences (PAS). His current research interests include fast digital electronics and methods and techniques for precise time metrology, especially instrumentation for advanced time interval and frequency measurements.



Maciej Gurski was born in Toruń, Poland, in 1977. He received the M.Sc. degree from Nicolaus Copernicus University in Toruń in 2002. His main research interest is the development of precise methods and instruments for time metrology and control systems in FPGA technology.