Development of a Data Concentrator ASIC for the High Luminosity upgrade of the CMS Outer Tracker detector with tracking trigger

Presented on 10 June 2021

Faculté des sciences et techniques de l'ingénieur, Laboratoire de systèmes microélectroniques

Doctoral program in microsystems and microelectronics

for the degree of Docteur ès Sciences by

#### Simone SCARFÌ

Accepted on the proposal of the jury:

Prof. G. De Micheli, jury president

Prof. Y. Leblebici, Dr K. Kloukinas, thesis directors

Dr P. Valerio, examiner

Dr R. Beccherle, examiner

Prof. A. P. Burg, examiner

### **Abstract**

With the increasing capabilities of microelectronics technology, future particle detectors in high energy physics will be able to deliver not only simple geometrical positions or energy measurements from their silicon sensors, but also high-level primitives. The ability to compute such high-level primitives in near real time is what we characterize as "intelligence". It will allow the construction of detectors with novel functionalities and ease offline analysis in experiments by making more complex features of the measurements immediately available.

This thesis presents a novel approach adopted for the development of silicon sensor detectors capable of locally rejecting signals from low transverse momentum $(p_T)$ particles. The basic concept consists of correlating signals in two closely spaced sensors. The readout and control electronics for this type of sensor require the development of a set of ASICs that incorporate dedicated signal processing techniques, generating trigger primitives in real time and transmitting them to the Level-1 (L1) central trigger system while, at the same time, transmitting the L1-triggered data to the detector readout system. In order to exploit the full bandwidth of the optical fiber transmission links, it is necessary to implement a data concentrator ASIC, called CIC (Concentrator IC).
The thesis focuses on the system-level studies needed to assist and develop the CIC ASIC architecture. In this context, system-level simulations and modeling with modern EDA tools have been employed to optimize the readout chain and to define synchronization techniques for the interconnects with the readout ASICs. The thesis also describes the CIC ASIC architecture and the prototype results used to evaluate the final performance. The developed CIC ASIC should be able to operate within a very tight power budget and in a radiation environment of 100 Mrad. Radiation-tolerant design techniques have been employed to mitigate the effects of Total Ionizing Dose as well as of Single Event Upsets. For the implementation of the readout ASICs, it is proposed to use a 65 nm CMOS process.

During my doctoral studies I developed a new methodology for readout chain modeling. In the target application, particle hits represent the input vectors of the readout chain. I extracted statistics from Monte Carlo data produced by physics simulations in the worst-case scenario and created programmable input stimuli for a complex simulation environment covering the full readout chain, from the front-end ASICs via the CIC ASIC to the off-detector electronics. Thanks to the simulation environment, developed in UVM and SystemVerilog, I could study and compare different architectures in terms of performance, power consumption and radiation tolerance. For the first time, a clock-cycle-accurate simulation of the complete readout electronics chain (from the sensors to the off-detector data receivers) is used in a particle physics experiment to enable the development of a chipset of ASICs and to facilitate their functional and performance verification.
My work focuses primarily on the following topics:

- contributing to the development of the CIC ASIC to reduce the required readout bandwidth and power budget and to improve radiation tolerance;
- architectural choices at RTL level for power optimization;
- development of radiation hardening techniques for the whole ASIC;
- full design, from RTL through synthesis and implementation to verification sign-off, of digital IP blocks such as the configuration block. The main requirements were radiation tolerance and power consumption optimization;
- FIFO sizing in the ASICs composing the readout chain to achieve a buffer inefficiency of $10^{-6}$ at worst-case particle occupancy, based on Poissonian arrival times of the signal requesting the full event;
- high-energy particles can potentially flip any node of a circuit in the 65 nm CMOS technology. I added a Single Event Effect (SEE) component that emulates particle effects on ASIC nodes in order to reproduce corrupted data packets at system level. I helped develop a robust and complex FSM at the level of the MPA and CIC ASICs to better handle system-level synchronization issues arising from SEEs or FIFO overflows.

#### Résumé

With the growing capabilities of microelectronics technology, future particle detectors in high energy physics will be able to acquire high-level features that are not only simple geometrical positions or energy measurements in the silicon sensors used, but also high-level primitives. The ability to compute these high-level primitives in near real time is what we characterize as intelligence. It will make it possible to build detectors with new functionalities and to ease offline analysis in experiments by handling more complex features of the measurements immediately.
This thesis presents a novel approach adopted for the development of silicon sensor detectors capable of locally rejecting signals from particles with low transverse momentum (p<sub>T</sub>). The basic concept consists of correlating signals in two closely spaced sensors. The readout and control electronics for this type of sensor require the development of a set of ASICs that incorporate dedicated signal processing techniques, generating trigger primitives in real time and transmitting them to the Level-1 (L1) central trigger system while, at the same time, transmitting the L1-triggered data to the detector readout system. In order to exploit the full bandwidth of the optical fiber transmission links, it is necessary to implement a data concentrator ASIC, called CIC (Concentrator IC). The thesis focuses on the system-level studies needed to assist and develop the CIC ASIC architecture. In this context, system-level simulations and modeling with modern EDA tools were used to optimize the readout chain and to define synchronization techniques for the interconnects with the readout ASICs. The thesis also describes the CIC ASIC architecture and the prototype results used to evaluate the final performance. The developed CIC ASIC should be able to operate within a very tight power budget and in a radiation environment of 100 Mrad. Radiation-tolerant design techniques were employed to mitigate the effects of Total Ionizing Dose as well as of Single Event Effects (SEE). For the implementation of the readout ASICs, it is proposed to use a 65 nm CMOS process. During my doctoral studies, I developed a new methodology for modeling the readout chain.
In the target application, particle hits represent the input vectors of the readout chain. I was able to extract statistics from Monte Carlo data produced by physics simulations in the worst-case scenario and to create programmable input stimuli for a complex simulation environment covering the complete readout chain, from the front-end ASICs via the CIC ASIC to the off-detector electronics. Thanks to the simulation environment, developed in UVM and SystemVerilog, I was able to study and compare different architectures in terms of performance, power consumption and radiation tolerance. For the first time, a clock-cycle-accurate simulation of the complete readout electronics chain (from the sensors to the off-detector data receivers) is used in a particle physics experiment to enable the development of a chipset of ASICs and to facilitate their functional and performance verification.

My work focuses primarily on the following topics:

- contributing to the development of the CIC ASIC to reduce the required readout bandwidth and power budget and to improve radiation tolerance;
- architectural choices at RTL level for power optimization;
- the development of radiation hardening techniques for the whole ASIC;
- the full design, from RTL through synthesis and implementation to verification sign-off, of digital IP blocks such as the configuration block.
The main requirements were radiation tolerance and power consumption optimization;
- FIFO sizing in the ASICs composing the readout chain to achieve a buffer inefficiency of 10<sup>-6</sup> at critical particle occupancy, based on Poissonian arrival times of the signal requesting the full event;
- the addition of a Single Event Effect (SEE) component, given that high-energy particles can potentially flip any node of the circuit in the 65 nm CMOS technology, to emulate particle effects on ASIC nodes and reproduce corrupted data packets at system level. I helped develop a robust and complex FSM at the level of the MPA and CIC ASICs to better handle system-level synchronization issues resulting from SEEs or FIFO overflows.

# **Contents**

- Abstract / Résumé
- List of Figures
- Glossary and Acronyms
- 1 Introduction
- 2 The Challenges of State of the Art CMS Outer Tracker Upgrade
- 2.1 Silicon for particle detection
- 2.2 The CMS experiment upgrade for High Luminosity
- 2.2.1 The CMS tracker
- 2.2.2 Silicon tracker upgrade
- 2.3 CMS trigger system at HL-LHC
- 2.4 Innovative approach: the $p_T$ module concept
- 2.5 CMS tracker structure
- 2.6 The $p_T$ modules in the CMS Outer Tracker
- 2.6.1 2S-module structure
- 2.6.2 PS-module structure
- 2.6.3 Charge detection in the analog front-end electronics
- 2.6.4 Digital readout system
- 2.6.5 Data aggregation
- 2.7 PS-module and 2S-module studies
- 2.7.1 Data bandwidth estimation for $p_T$ modules
- 2.7.2 Concentrator ASIC in $p_T$ modules
- 2.8 Radiation effects on CMOS technology
- 2.8.1 Cumulative effects: Total Ionizing Dose and Displacement Damage
- 2.8.2 Single Event Effects
- 3 $p_T$-modules studies and system level simulation environment
- 3.1 Advantages of a SystemVerilog and UVM simulation environment
- 3.2 UVM simulation environment for $p_T$-modules
- 3.2.1 UVM active Agent for stimuli generation
- 3.2.2 UVM passive Agent for DUV output
- 3.2.3 Reference model in TLM
- 3.2.4 UVM Scoreboard component
- 3.2.5 Library of test cases
- 3.2.6 Code and functional coverage
- 3.3 CIC ASIC studies for two different $p_T$-modules
- 3.3.1 Stub occupancy and efficiency studies for PS-module
- 3.3.2 L1 data studies for PS-module
- 3.3.2.1 FIFO sizing methodology
- 3.3.2.2 MPA-SSA buffer sizes and data rate studies
- 3.3.2.3 CIC buffer sizes and data rate studies
- 3.3.3 SEU simulation
- 3.4 LpGBT communication for $p_T$-modules
- 3.5 UVM simulation environment features and performances
- 4 Development of the Concentrator Integrated Circuit ASIC
- 4.1 Transmission of high-$p_T$ particles primitives
- 4.2 General architecture
- 4.3 Input data phase alignment
- 4.3.1 Input stub data format from MPA or CBC
- 4.3.2 Deserialization and word alignment
- 4.3.3 Bitonic Stub sorting algorithm
- 4.3.4 Stub packet formatter
- 4.4 Transmission of triggered raw data
- 4.4.1 Input raw data format from CBC or MPA
- 4.4.2 CIC L1 input block
- 4.4.3 FIFO controller
- 4.4.4 Priority encoder for clusters sorting
- 4.4.5 Output packet formatter
- 4.4.6 Output serializer
- 4.5 CIC slow control and fast commands
- 4.6 Design and ASIC implementation
- 4.6.2 Timing corners and TID effects
- 4.6.3 SEU hardening techniques and TID tolerance
- 4.6.3.1 Triplicated memory cells placement
- 4.6.4 Floorplan and power distribution
- 4.6.5 Power consumption
- 4.6.6 IR drop
- 4.6.7 Triplicated clock distribution
- 5 CIC prototype characterization results
- 5.1 Standalone testbench for CIC prototype
- 5.1.1 CIC phases and modes
- 5.2 First silicon prototype: CIC1
- 5.2.1 Tests with 2S hybrid prototype
- 5.2.2 Power consumption
- 5.3 Second silicon prototype: CIC2
- 5.3.1 Power consumption
- 5.3.2 Temperature characterization
- 5.3.3 IR-drop issue
- 5.3.4 TID tests
- 5.3.5 SEU tests
- 5.3.6 SEU results analysis and cross-section measurements
- Bibliography
- About the author

# **List of Figures**

- 1.1 Photograph of the real track left by a positive kaon in the magnetic field of a bubble chamber
- 1.2 The complex of the LHC accelerators
- 2.1 n-in-p silicon detector structure
- 2.2 Cutaway diagram of CMS detector
- 2.3 Coordinate system used by CMS experiment at the LHC
- 2.4 The pseudorapidity function
- 2.5 CMS current silicon tracker
- 2.6 Material budget in the CMS tracker
- 2.7 Integrated particle fluence for the Phase-2 tracker
- 2.8 Total Ionizing Dose for the Phase-2 tracker
- 2.9 Data flow in CMS Trigger DAQ system
- 2.10 The $p_T$ spectrum for CMS tracker at HL-LHC
- 2.11 Illustration of the $p_T$-module concept
- 2.12 Tracker Layout section and $p_T$-modules disposition, $r - z$ view
- 2.13 Exploded view of 2S module structure
- 2.14 Exploded view of PS module structure
- 2.15 Capacitance model of a pixel sensor
- 2.16 Block diagram of a generic analog FE circuit
- 2.17 Stub rate in CMS Outer Tracker
- 2.18 PS-module cross section
- 2.19 MPA-SSA architecture
- 2.20 PS-module structure with ASICs bandwidth
- 2.21 2S-module structure with ASICs bandwidth
- 2.22 CIC input/output lines
- 2.23 Effect of ionising radiation in MOS devices
- 3.1 General block diagram of the UVM framework implementation
- 3.2 UVM Agent for stimuli generation
- 3.3 UVM Agent for stimuli generation
- 3.4 UVM Scoreboard summary for stub data collected at the output of each ASIC of the readout chain
- 3.5 DUV for PS module
- 3.6 DUV for 2S module
- 3.7 PS module efficiency at the MPA output and CIC output for two different output frequencies
- 3.8 CMS Outer Tracker half-module readout chain
- 3.9 CIC latency for L1 data packets measured in BXs at the nominal trigger rate frequency of 750 kHz
- 3.10 MPA latency for L1 data packets measured in BXs at the nominal trigger rate frequency of 750 kHz
- 3.11 Full PS-module chain latency for L1 data packets measured in BXs at the nominal trigger rate frequency of 750 kHz
- 3.12 Buffer inefficiency with respect to I/O rate ratio
- 3.13 FIFO-based architecture of the front-end ASICs
- 3.14 Cluster occupancy at pileup 300 in the innermost barrel layer (cylinder external surface)
- 3.15 FIFO-based architecture of the Data Concentrator ASIC
- 3.16 UVM framework with LpGBT and FPGA added to the DUV
- 4.1 Block diagram of the entire CIC ASIC
- 4.2 Block diagram of the analog phase aligner
- 4.3 CBC stub format
- 4.4 MPA stub format
- 4.5 MPA stub block diagram
- 4.6 Basic sorting cell composed of a comparator and 2 multiplexers
- 4.7 Bitonic sorting example
- 4.8 CIC stub format for 2S-module
- 4.9 CIC stub format for PS-module
- 4.10 CIC output configuration modes
- 4.11 CBC raw format
- 4.12 MPA raw format
- 4.13 Block diagram for FIFO controller
- 4.14 Block diagram for the CIC raw data output formatter
- 4.15 CIC raw data output format for PS-module
- 4.16 CIC raw data output format for 2S-module in zero-suppression mode
- 4.17 CIC raw data output format for 2S-module in debug mode
- 4.18 Delay propagation for some standard cells in different corners with respect to typical corner
- 4.19 Table with corners used for CIC implementation
- 4.20 Delay propagation for SS corners at two different temperatures with and without radiation models
- 4.21 Triple Modular Redundancy implementation in the CIC
- 4.22 Implementation of the C4 bump and wire-bond metallizations
- 4.23 View of the final CIC ASIC top-metal layer
- 4.24 View of the final CIC ASIC layout
- 4.25 View of the final CIC ASIC with metal 6 distributed vertically and metal 7 horizontally for uniform power distribution
- 4.26 Table with power consumption during word alignment and data sending for the two different modules in typical case and worst case
- 4.27 CIC IR drop simulation with normal activity during data sending
- 5.1 CIC ASIC with wire bonds sitting on the carrier board
- 5.2 Block diagram of CIC standalone testbench
- 5.3 CIC Carrier board plugged on a custom interface board
- 5.4 CIC FPGA board
- 5.5 Prototype of half of a 2S-module hybrid including eight CBC3.1 ASICs and one CIC1
- 5.6 CIC1 power consumption during fast control locking mode for different power supply voltages
- 5.7 CIC1 power consumption during phase alignment procedure for different power supply voltages
- 5.8 CIC1 power consumption during word alignment procedure for different power supply voltages
- 5.9 CIC1 power consumption worst case scenario
- 5.10 CIC1 power consumption in data taking mode with 10 L1 triggers in PS-module
- 5.11 CIC1 power consumption in data taking mode with 10 L1 triggers in 2S-module
- 5.12 CIC2 power consumption during data taking at different power supply voltages
- 5.13 Core power variation at different temperatures in the range between $-30^{\circ}$C and $45^{\circ}$C
- 5.14 Output delay variation for each CIC2 ASIC output line at different temperatures in the range between $-30^{\circ}$C and $45^{\circ}$C
- 5.15 CIC IR drop simulation around current peak due to L1 trigger with high input activity
- 5.16 CIC IR drop simulation around current peak due to L1 trigger with high input activity
- 5.17 CIC2 output lines delay variation over TID irradiation at 1.0V power supply
- 5.18 CIC2 output lines delay variation over TID irradiation at 1.2V power supply
- 5.19 CIC2 ready for SEU on a cooling plate
- 5.20 Stub SEU errors for CIC2 ASIC at different values of LET for two different configurations
- 5.21 L1 SEU errors for CIC2 ASIC at different values of LET for two different configurations

# **Glossary and Acronyms**

- **2S module** p<sub>T</sub>-module composed of two silicon-strip detectors
- **ALICE** A Large Ion Collider Experiment
- **ASIC** Application-Specific Integrated Circuit
- **ATLAS** A Toroidal LHC ApparatuS detector. It is a general-purpose detector at the Large Hadron Collider (LHC)
- **barrel** Concentric cylindrical layers centered on the nominal interaction point and located in the CMS tracker at $20\,\mathrm{cm} < r < 120\,\mathrm{cm}$
- **BEBC** Big European Bubble Chamber
- **BX** Bunch crossing. The instant at which the particle bunches are brought into collision. In the LHC, the BX rate is 40 MHz
- **C4** Controlled Collapse Chip Connection, also known as Flip-Chip technology, is a method for interconnecting semiconductor devices with solder bumps deposited in the die area
- **CAD** Computer-aided design. It represents software that aims to aid in the creation, modification, analysis, or optimization of a design
- **CBC** CMS Binary Chip (CBC). It is the strip sensor readout ASIC of the CMS Outer Tracker 2S-module
- **CCOpt** Clock Concurrent Optimization. It is used during CTS to optimize the clock tree distribution taking into account the datapath
- **CERN** In French "Conseil Européen pour la Recherche Nucléaire", or European Organization for Nuclear Research
- **CIC** Concentrator Integrated Circuit. It is the data concentrator ASIC of the CMS Outer Tracker PS and 2S modules for the Phase-2 upgrade
- **CMOS** Complementary Metal Oxide Semiconductor
- **CMS** Compact Muon Solenoid detector. It is a general-purpose detector at the Large Hadron Collider (LHC)
- **cross-section** The section normal to the beam direction outside of which the particle is not deflected. It can be considered as a measure of the interaction probability
- **CSA** Charge Sensitive Amplifier
- **CTS** Clock Tree Synthesis. It is a step of the physical implementation flow
- **DAC** Digital to Analog Converter
- **DAQ** Data AcQuisition system
- **DD** Displacement Damage
- **DLL** Delay-Locked Loop. Similar to a PLL where the internal voltage-controlled oscillator is replaced by a delay line
- **DRC** Design Rule Checking. It is a sign-off step that determines whether the chip layout satisfies a number of rules defined by the semiconductor manufacturer
- **DUV** Design Under Verification
- **ECAL** Electromagnetic Calorimeter of the CMS experiment
- **EDA** Electronic Design Automation, also referred to as electronic computer-aided design (ECAD)
- **E<sub>DEP</sub>** Deposited ionizing energy
- **ELT** Enclosed Layout Transistor. It is a layout technique to reduce the leakage current increase due to the charge trapped in the STIs for devices exposed to ionizing radiation
- **end-cap** Parallel disks centered on the z axis and located in the CMS tracker at z > 130 cm
- **ESD** Electro-Static Discharge
- **FC7** Flexible, µTCA compatible Advanced Mezzanine Card (AMC) for generic data acquisition/control applications
- **FE** Front-End
- **FEC** Forward Error Correction
- **FIFO** First In First Out circuit element
- **FPGA** Field Programmable Gate Array
- **FSM** Finite State Machine
- **GL** Gate Level
- **HCAL** Hadron Calorimeter of the CMS experiment
- **HDL** Hardware Description Language
- **HEP** High Energy Physics
- **high-p<sub>T</sub>** High transverse momentum (p<sub>T</sub>) particle. In this context it refers to particles with $p_T > 2\,\text{GeV}/c$
- **HL-LHC** High Luminosity Large Hadron Collider
- **HLT** High Level Trigger system
- **HPD** Hybrid Pixel Detector
- **integrated luminosity** The integrated luminosity over the operation time $\mathscr{L}$ defines the total amount of data recorded by an experiment
- **IT** Inner Tracker
- **L1** Level-1 trigger system (hardware trigger) of the CMS detector
- **L1 data** Raw sensor image transmitted only when required by an L1 trigger
- **L1 latency** Latency between the transmission of a trigger request and the corresponding event occurrence. It corresponds to the time available for the L1 data processing
- **L1 rate** Average occurrence frequency of the CMS Level-1 trigger request
- **latch-up** A type of short circuit which can occur in an integrated circuit
- **LET** Linear Energy Transfer. The amount of energy that an ionizing particle transfers to the material traversed per unit distance
- **Level-1** Level-1 trigger system (hardware trigger) of the CMS detector
- **LHC** Large Hadron Collider
- **LHCb** Large Hadron Collider beauty
- **low-p<sub>T</sub>** Low transverse momentum (p<sub>T</sub>) particle. In this context it refers to particles with $p_T < 2\,\text{GeV}/c$
- **LpGBT** Low Power GigaBit Transceiver (LpGBT). It is a radiation tolerant serializer/deserializer device that can be used on the front-end electronics of the HL-LHC detectors. This component is foreseen to be used by CMS and ATLAS for their system upgrades
- **LS3** Third long shutdown of the CMS detector to allow the upgrade operations
- **LSB** Least Significant Bit
- **luminosity** The event rate in a particle interaction is defined as $\frac{dN}{dt} = \sigma\ell$, where $\ell$ represents the instantaneous number of interactions per second, called luminosity, while $\sigma$ represents the cross-section of the interaction
- **LVS** Layout Versus Schematic. It checks whether the ASIC layout corresponds to the original schematic or circuit diagram of the design
- **macro-pixel** Pixel with high aspect ratio (in the PS module it is $100 \times 1467\,\mu\text{m}$)
- **material budget** The quantity of material in the tracker volume, which introduces a limiting factor for the detection ratios of particles that may interact with it and compromise what the detector is expected to recognize
- **MIP** Minimum Ionizing Particle
- **MMMC** Multi-Mode Multi-Corner analysis
- **Monte Carlo** A broad class of computational algorithms that rely on repeated random sampling to obtain numerical results
- **MOS** Metal Oxide Semiconductor
- **MPA** Macro Pixel ASIC (MPA). It is the pixel-sensor readout ASIC of the CMS Outer Tracker PS-module
- **MPW** Multi Project Wafer. Because IC fabrication costs are extremely high, it makes sense to share mask and wafer resources to produce designs in low quantities
- **NDR** Non-Default Rules. They are additional parameters set during the CTS step to better control the clock physical implementation
- **nMOS** n-channel Metal Oxide Semiconductor (MOS) device
- **OT** Outer Tracker
- **Outer Tracker** CMS tracker barrel layers and end-cap disks located at $r > 200\,\mathrm{mm}$
- **PCB** Printed Circuit Board
- **PDK** Process Design Kit
- **Phase-2 Upgrade** CMS detector upgrade during HL-LHC LS3
- **pileup** In HEP experiments it represents the average amount of overlapped signals in the event reconstruction; in our case it is the number of proton-proton collisions per BX
- **PLL** Phase-Locked Loop. It is a control system that generates an output signal whose phase is related to the phase of an input signal
- **pMOS** p-channel Metal Oxide Semiconductor (MOS) device
- **pp** proton-proton collisions
- **pseudorapidity** Kinematical variable of a relativistic particle defined as $\eta = -\ln \tan \frac{\theta}{2}$, where $\theta$ is the particle zenith angle referenced to the direction of the crossing beams
- **PS module** p<sub>T</sub>-module that combines a silicon micro-strip sensor with a silicon-pixel sensor
- **p<sub>T</sub>** Particle transverse momentum
- **p<sub>T</sub>-module** Silicon detector modules capable of providing transverse momentum measurements
- **PVT variations** Process Voltage Temperature (PVT) variations
- **Python** An interpreted, high-level, general-purpose programming language
- **Quantus QRC** Quantus Extraction Solution. It is a production-proven signoff extraction tool ideal for advanced nodes
- **RDL** Re-Distribution Layer
- **RTL** Register Transfer Level. It is a design abstraction which models a synchronous digital circuit in terms of the flow of digital signals (data) between hardware registers, and the logical operations performed on those signals
- **SDF** Standard Delay Format (SDF) is an IEEE standard for the representation and interpretation of timing data for use at any stage of an electronic design process
- **SEE** Single Event Effect. Effects caused by one single ionizing particle striking a sensitive node in a micro-electronic device
- **SEL** Single Event Latch-up. Latch-up is a type of short circuit which can occur in an integrated circuit
- **SET** Single Event Transient. Time limited change of logical state caused by one single ionizing particle striking a sensitive node in a micro-electronic device
- **SEU** Single Event Upset. Change of logical state caused by one single ionizing particle striking a sensitive node in a micro-electronic device
- **SLVS** Scalable Low Voltage Signaling. It is a differential signal transmission standard
- **SM** Standard Model in particle physics. It is the theory describing three of the four known fundamental forces (electromagnetic, weak, and strong interactions, and not including the gravitational force) and classifying the known elementary particles
- **SOI** Silicon On Insulator technology
- **SPS** Super Proton Synchrotron
- **SSA** Short Strip ASIC (SSA). It is the micro-strip-sensor readout ASIC of the CMS Outer Tracker PS-module
- **STI** Shallow Trench Isolation. Isolation which prevents electric current leakage between adjacent semiconductor devices
- **strip** Detectors obtained by segmenting the doped side into strips over the full length of the detector
- **stub** High-$p_T$ particle primitives transmitted by $p_T$-modules
- **SystemVerilog** IEEE 1800 standard hardware description and verification language used to model, design and simulate electronic systems
- **TCL** Tool Command Language. It is a high-level, general-purpose, interpreted, dynamic programming language
- **TID** Total Ionizing Dose. The cumulative damage of the semiconductor lattice caused by ionizing radiation over the exposure time
- **TLM** Transaction Level Modeling abstraction
- **TMR** Triple Modular Redundancy. Circuit technique to increase tolerance to radiation related single-event effects
- **ToA** Time of Arrival of a particle
- **ToF** Time of Flight. In this context it refers to the time required by the particles to reach the silicon detector
- **ToT** Time over Threshold method to determine the amplitude of an analog signal. The signal is compared to a threshold and the duration of the output pulse is measured
- **tracker** CMS sub-detector that allows reconstructing the particle trajectory and transverse momentum $p_T$ in the 3.8 T magnetic field provided by the superconducting solenoid
- **tracker volume** Material crossed by a straight line between the origin and the farthest silicon sensor met by the line
- **tracking volume** tracking volume
- **Trigger DAQ** Trigger Data AcQuisition system
- **UBM** Under Bump Metalization
- **UVC** Universal Verification Methodology Verification Component
- **UVM** Universal Verification Methodology. It is a standardized methodology for verifying integrated circuit designs
- **VCD** Value Change Dump
- **Verilog** IEEE 1364 standard hardware description language (HDL) used to model electronic systems
- **Verilog-AMS** A derivative of the Verilog HDL language that includes analog and mixed-signal extensions (AMS)
- **vManager** A Cadence tool for verification planning and test management of complex verification projects
- **V<sub>T</sub>** Threshold voltage

## 1 Introduction

High energy physics (HEP) is a branch of physics that explores the nature of particles and the fundamental interactions necessary to explain their behaviour.
Particles, or their nuclear decay products, were first studied more than 100 years ago, using scintillation screens and photographic films. In that case there was no need for a particle detector, since particles were visible to the human eye [16]. Since then, several technologies have contributed to significant results in the field of high energy physics. In particular, in 1911 C. T. R. Wilson invented the cloud chamber [17], which played a dominant role in particle physics between 1920 and 1950. An energetic charged particle interacts with the supersaturated vapor inside the cloud chamber, leaving behind a trail of ionized particles that act as condensation centers. The resulting small droplets are observable as a "cloud" track lasting for several seconds. Each particle leaves a track with a characteristic shape and can therefore be identified. The cloud chamber allowed the discovery of the positron in 1932 [18] and of the K meson in 1947 [19].

In 1952, Donald A. Glaser invented the bubble chamber [20]. Its working principle is similar to that of the cloud chamber. The cylindrical chamber is filled with a liquid heated to just below its boiling temperature. Exactly when particles enter the chamber, a piston decreases the pressure, and the liquid enters a superheated, metastable phase. The liquid vaporizes around the ionizing track generated by the charged particles, forming visible bubbles. A constant magnetic field is applied to the whole chamber so that charged particles travel along helical paths, which makes it possible to measure the particle momentum. The bigger the chamber, the larger the bubbles that can be seen or photographed. Figure 1.1 shows the decay products of a kaon spiraling in the magnetic field of a bubble chamber [21].

**Figure 1.1.** Photograph of the real track left by a positive kaon. The decay products formed a typical spiral shape in the magnetic field of a bubble chamber [21].
In 1954 the European Organization for Nuclear Research (CERN) was founded thanks to the common effort of 12 European countries. Its objective is to provide a scientific facility for particle physics research. At CERN, in 1973, the Gargamelle bubble chamber detector, filled with a heavy liquid, led to the discovery of weak neutral currents [22] thanks to the beam provided by the Super Proton Synchrotron (SPS). The largest bubble chamber ever built, and also the last one, is the Big European Bubble Chamber (BEBC), with 6.3 million photographs taken and 3000 km of developed film [23]. It allowed the discovery of D mesons [24]. The main limitation of this detector was the large amount of time needed to develop the film and optically scan it for interesting events.

The first electronic particle detector with high statistics and reasonable resolution was the Multi Wire Proportional Chamber (MWPC), developed in 1968 [25]. It consisted of 1000 wires spaced 1 mm apart. As a charged particle passes, the hit wire sends out an electrical signal. This marked the end of bubble chamber detectors, in favor of faster electronic particle detectors.

Another way of building a particle detector is to exploit the properties of semiconductor devices. Charged particles create electron-hole pairs in the depletion region, which drift to the electrodes. The drift current creates the signal, which is amplified by an amplifier connected to each strip. The first working silicon-strip detector was built at CERN in 1983 [26]. In the late eighties, silicon-strip detectors became more and more common for the detection of short-lived particles, thanks to an excellent spatial resolution in the $5\,\mu\mathrm{m}$ range and, in particular, to the introduction of planar technology, which boosted their industrial production. The next breakthrough came with the possibility of integrating detectors and electronics on the same device [27].
At the end of the nineties, pixel detectors also started being used successfully in the experiments. Still today, strip and pixel detectors are the leading technology for particle tracking.

Nowadays, CERN is supported by 23 member states and 8 associated members [28] and hosts the world's largest and most powerful particle accelerator: the Large Hadron Collider (LHC) [29], [30]. Built between 1998 and 2008 in collaboration with over 10000 scientists and engineers from over 100 countries, the LHC sits in a circular tunnel 27 km long, excavated 100 m underground. Superconducting magnets, as well as accelerating structures, boost the energy of the protons along the way up to 7 TeV. Two counter-rotating proton beams, travelling in separate tubes kept at ultrahigh vacuum, are made to collide at a center-of-mass energy of 14 TeV. In the LHC there are four collision points, around which different experiments are set up, as shown in Figure 1.2: the ATLAS [31], CMS [32], LHCb [33] and ALICE [34] detectors.

**Figure 1.2.** The complex of the LHC accelerators [35].

The rate of collisions occurring in a particle accelerator is characterized by the **luminosity**, defined as $\ell = \frac{1}{\sigma} \frac{dN}{dt}$, where $N$ is the number of interactions occurring in a given time $t$ and $\sigma$ is the cross-section of the interaction. A parameter to characterize the performance of a particle accelerator is the integrated luminosity, which gives an idea of the total number of collisions that happened over the operation time $T$. The integrated luminosity can be expressed as:

$$\mathscr{L} = \int_0^T \frac{\ell_0}{1+t/\tau}\,dt,$$

where $\ell_0$ is the initial luminosity and $\tau$ is the characteristic time of the luminosity decay. The LHC reached a luminosity of $2.1 \cdot 10^{34}\,\mathrm{cm}^{-2}\mathrm{s}^{-1}$ during 2018 [36]. The expected integrated luminosity collected by the CMS experiment in its most recent run is around $300\,\mathrm{fb}^{-1}$. The collision rate is 40 MHz and is called the **bunch-crossing (BX) rate**.
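As a hedged illustration of the integrated-luminosity expression above, the following sketch integrates a luminosity profile of the form $\ell_0/(1+t/\tau)$ numerically and compares the result with the closed form $\ell_0 \tau \ln(1+T/\tau)$. The decay time, the fill duration and the function names are illustrative assumptions and not LHC operating parameters; only the peak luminosity and the 40 MHz BX rate are taken from the text.

```python
import math


def integrated_luminosity_closed(l0, tau, t_end):
    """Closed form of the integral of l0 / (1 + t/tau) over [0, t_end]."""
    return l0 * tau * math.log(1.0 + t_end / tau)


def integrated_luminosity_numeric(l0, tau, t_end, steps=100_000):
    """Midpoint-rule integration of the same luminosity profile."""
    dt = t_end / steps
    total = sum(l0 / (1.0 + (i + 0.5) * dt / tau) for i in range(steps))
    return total * dt


L0 = 2.1e34             # cm^-2 s^-1, peak luminosity quoted for 2018
TAU = 10 * 3600.0       # s, hypothetical luminosity decay time (assumption)
T_FILL = 12 * 3600.0    # s, hypothetical fill duration (assumption)
CM2_PER_INV_FB = 1e39   # 1 fb^-1 corresponds to 1e39 cm^-2

closed = integrated_luminosity_closed(L0, TAU, T_FILL)
numeric = integrated_luminosity_numeric(L0, TAU, T_FILL)
print("closed form :", closed / CM2_PER_INV_FB, "fb^-1")
print("numerical   :", numeric / CM2_PER_INV_FB, "fb^-1")

# Bunch spacing implied by the 40 MHz bunch-crossing rate.
bx_period_ns = 1e9 / 40e6
print("BX period   :", bx_period_ns, "ns")
```

The agreement between the two estimates only checks the algebra of the formula; realistic luminosity profiles depend on the machine configuration.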
The LHC was commissioned to observe the missing particle in the Standard Model [37]: the Higgs boson. A new particle was observed at about 125 GeV [38], as theorized in 1964 by P. Higgs, R. Brout and F. Englert [39], [40], and announced by the CMS and ATLAS experiments in 2012. After the discovery of the Higgs boson, which represents a milestone in the history of particle science, new theories need to be validated and proven: the existence of super-symmetry, the nature of dark matter and the existence of extra dimensions. The Higgs boson itself will need to be studied and characterized better [41].

The LHC needs a major upgrade to exploit its full potential, increasing its luminosity by a factor of five up to $5 \cdot 10^{34} \, \mathrm{cm}^{-2} \mathrm{s}^{-1}$, and its integrated luminosity by a factor of ten up to $3000 \, \mathrm{fb}^{-1}$ [42]. The novel machine configuration, called High Luminosity LHC (HL-LHC or HiLumi), will be ready in 2025 after 10 years of development. The HL-LHC upgrade was approved in 2016, representing the highest priority of the European Strategy for particle physics [43], [44]. Consequently, the CMS detector will face a major upgrade in 2025 [45]. In this context, the CMS tracker will be completely replaced by a new one able to work with the expected luminosity of the HL-LHC. The CMS tracker will feature intelligent particle tracking for the first time. The front-end ASICs will be able to locally discriminate interesting physics events before transmitting them to the back-end. This new approach would make event reconstruction possible in the HL-LHC.

This thesis focuses on the study and design of a data concentrator ASIC to be employed in two different readout systems: a two-layer pixel-strip detector readout system and a two-layer strip-strip detector readout system, both for the CMS Outer Tracker.
#### This thesis is structured as follows:

- Chapter 1 (this chapter) is an introduction to HEP experiments, from their history up to today;
- Chapter 2 describes the improved performance of the High Luminosity LHC and the CMS Tracker upgrade requirements for electronics, with particular focus on new tracker features and radiation effects on CMOS electronics;
- Chapter 3 describes the simulation framework developed to study and assist the design of a data concentrator ASIC working in two different modules: the PS module and the 2S module;
- Chapter 4 describes the different steps of the design, from the architecture to the final implementation of the data concentrator ASIC;
- Chapter 5 shows the results of the silicon prototype characterization under different PVT variations and under irradiation, such as TID and SEU effects;
- Chapter 6 concludes on the main research points, providing a brief summary of the achieved objectives together with some recommendations for work on a similar topic.

The explanation of all terminology used in the thesis is collected in the glossary.

# 2 The Challenges of State of the Art CMS Outer Tracker Upgrade

The Compact Muon Solenoid (CMS) experiment has the objective of studying the Standard Model, including the Higgs boson, and searching for new physics at the new energy and luminosity frontiers. The current CMS detector uses a silicon strip tracker composed of 15000 modules with a sensitive area of $200\,\mathrm{m}^2$ and has been designed to work up to an integrated luminosity of $500\,\mathrm{fb}^{-1}$. The LHC is expected to increase its integrated luminosity to $3000\,\mathrm{fb}^{-1}$. This configuration is called High Luminosity LHC (HL-LHC). The CMS experiment will receive a fundamental upgrade of its tracker detector and front-end electronics, featuring higher granularity and readout bandwidth to cope with the large number of pileup events.
The main challenges for the CMS tracker upgrade are described in section 2.2, together with the innovative concept of the $p_T$-module in section 2.4. Section 2.2.2 introduces the requirements and the choice of the silicon detector and the readout electronics. In the last part of the chapter, section 2.8 describes the challenges due to radiation effects on electronics and the strict power requirements for ASICs operating in the innermost regions of the LHC experiments.

#### 2.1 Silicon for particle detection

Silicon is a solid semiconductor material that, thanks to its properties, revolutionized the development of electronics and has many other uses, such as silicon sensors. Thanks to its small energy band gap, 1.12 eV at room temperature, an ionizing particle passing through silicon produces a large number of charge carriers, proportional to the energy loss. The major advantage of silicon is the availability of a mature technology, which allows the simple integration of detector and electronics on the same substrate [46].

The working principle of a silicon sensor (pixel or strip), shown in Figure 2.1, is based on a narrow and highly doped silicon layer on a substrate of the opposite polarity, forming a diode structure that is reverse biased. As charged particles traverse the pixel or the strip, they generate electron-hole pairs. Due to the high bias voltage applied to the back-side plane, the full volume is depleted so that holes drift to the $p^+$ doped layer and electrons drift to the $n^{++}$ layer, or vice versa for an n-type sensor. The drift of these charges in the electric field creates small ionization currents that can be detected and measured.

**Figure 2.1.** n-in-p silicon detector structure.
The average energy loss of a charged particle passing through a medium, silicon or gas, is given by the Bethe-Bloch formula:

$$-\frac{dE}{dx} = 4\pi N_a r_e^2 m_e c^2 z^2 \frac{Z}{A} \frac{1}{\beta^2} \left[ \frac{1}{2} \ln \left( \frac{2m_e c^2 \beta^2 \gamma^2 T_{max}}{I^2} \right) - \beta^2 - \frac{\delta(\gamma)}{2} \right],$$

where:

- $N_a$ is Avogadro's number.
- $r_e$ is the classical electron radius.
- $m_e$ is the mass of an electron.
- $c$ is the speed of light.
- $z$ is the charge of the incident particle.
- $Z$ is the atomic number (proton number) of the medium.
- $A$ is the atomic mass of the medium.
- $\beta$ is given by the ratio $v/c$.
- $\gamma$ is given by the formula $1/\sqrt{1-\beta^2}$.
- $T_{max}$ is the maximum kinetic energy that can be transferred to a free electron in a single collision.
- $I$ is the mean excitation energy.
- $\delta$ is the density effect correction.

The minimum of the deposited energy in a given medium is found for $\beta\gamma\approx 3$. A detector has to be designed to detect the Minimum Ionizing Particle (MIP), which deposits the minimum energy. For every 3.6 eV released by a particle crossing silicon, one electron-hole pair is produced. Silicon has a better intrinsic energy resolution than other materials: for instance, in a gas detector an energy loss of 30 eV is needed to create one electron-hole pair. In a silicon detector, the number of electron-hole pairs is high due to the high silicon density of $2.33\,\mathrm{g\cdot cm^{-3}}$; e.g., the average energy loss of $390\,\mathrm{eV/\mu m}$ creates $\sim 108$ electron-hole pairs per micrometre. Therefore, the silicon detector thickness is the result of a trade-off between energy resolution and the capability to produce a signal large enough to be measured. The charges are collected in a few ns on the doped strips (or pixels) and then induced, by capacitive coupling, on the aluminum readout strips (or pixels).
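To make the charge-yield arithmetic concrete, the sketch below estimates the signal generated by a MIP using two figures quoted above: 3.6 eV per electron-hole pair and roughly 108 pairs per micrometre. The sensor thickness of 300 µm and the function names are assumptions chosen only for illustration. The sketch also inverts $\gamma = 1/\sqrt{1-\beta^2}$ to obtain the velocity at the ionization minimum $\beta\gamma \approx 3$.

```python
import math

EV_PER_PAIR = 3.6                   # eV per electron-hole pair in silicon (from the text)
PAIRS_PER_UM = 108                  # approximate MIP yield per micrometre (from the text)
ELECTRON_CHARGE_C = 1.602176634e-19


def mip_signal(thickness_um):
    """Return (pairs, charge in fC, deposited energy in keV) for a MIP."""
    pairs = PAIRS_PER_UM * thickness_um
    charge_fc = pairs * ELECTRON_CHARGE_C * 1e15
    energy_kev = pairs * EV_PER_PAIR / 1e3
    return pairs, charge_fc, energy_kev


def beta_from_beta_gamma(beta_gamma):
    """Velocity ratio v/c for a given beta*gamma, with gamma = 1/sqrt(1 - beta^2)."""
    return beta_gamma / math.sqrt(1.0 + beta_gamma ** 2)


# 300 um is an assumed sensor thickness, not a value from the text.
pairs, charge_fc, energy_kev = mip_signal(300)
beta_mip = beta_from_beta_gamma(3.0)
print(pairs, "pairs,", round(charge_fc, 2), "fC,", round(energy_kev, 1), "keV")
print("beta at the ionization minimum:", round(beta_mip, 4))
```

A signal of a few fC per MIP is small, which is one way to see why the thickness cannot be reduced arbitrarily without making the signal hard to measure.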
If a thin layer, a few $\mu m$ thick, of $SiO_2$ or $Si_3N_4$ covers the doped strips or pixels, it prevents leakage currents from flowing through the electronics, and the silicon detector is called AC-coupled. On the other hand, if the aluminum contact is directly connected to the doped strips or pixels, the silicon detector is called DC-coupled [47], [48]. The aluminum contact electrically connects strips or pixels to the readout electronics. Strip aluminum contacts can be easily wire-bonded to the readout electronics because they reach the end of the sensor. On the other hand, pixel aluminum contacts, which feature a two-dimensional arrangement, exploit a bump-bonding technology, called C4, in which solder bumps are placed directly onto the chip pads. This kind of pixel detector is called a Hybrid Pixel Detector (HPD) [49]. In an HPD the pixel sensor and the readout electronics are produced separately, but using the same technology: standard Complementary Metal-Oxide-Semiconductor (CMOS) technology, which provides microscopic structuring thanks to industrial lithography processes. Silicon sensors combine high precision and resolution with a high readout speed and are therefore well suited to HEP experiments [50].

#### 2.2 The CMS experiment upgrade for High Luminosity

The CMS detector has a cylindrical shape and is 21.6 m long with an outer radius of 7.5 m. It is built around one of the LHC collision points and is composed of two main regions: the barrel in the central region (subdetectors arranged along the radius) and the endcaps in the two lateral regions (subdetectors arranged along the z coordinate). As shown in the cutaway diagram in Figure 2.2, the tracker is located at the centre of the detector and is immersed in a uniform magnetic field of 3.8 T provided by a superconducting solenoid. The tracker is able to detect the trajectory and the transverse momentum $p_T$ of secondary particles, where $p_T$ is a physical quantity linked to the speed and the mass of the particle.
Both the electromagnetic calorimeter (ECAL) and the hadronic calorimeter (HCAL), located farther away from the collision point but still within the solenoid volume, are able to measure the particle energy. Muon chambers are located outside the solenoid volume and are able to detect muons, particles that can penetrate deep in matter. A more detailed description of the detector is reported in the CMS collaboration report [32].

**Figure 2.2.** Cutaway diagram of CMS detector [51].

The coordinate system adopted by CMS is shown in Figure 2.3. The origin of the coordinate system is located at the collision point. The beam direction defines the *z*-axis, while the *x-y* plane is transverse to the beam. The positive *x*-axis points from the collision point to the centre of the LHC ring, whereas the *y*-axis points upwards. The transverse momentum $\mathbf{p_T}$, the transverse energy $\mathbf{E_T}$, the missing transverse energy $\mathbf{E_T^{miss}}$, and other transverse variables are defined in the *x-y* plane, called the transverse plane. Due to the shape of the CMS detector, a cylindrical coordinate system can also be adopted. In this context, given a vector r, its modulus is defined as $|r| = \sqrt{\eta^2 + \phi^2}$, where $\eta$ is the **pseudorapidity** and $\phi$ is the azimuthal angle. The pseudorapidity is commonly used as a spatial coordinate and describes the angle of a particle relative to the beam axis following the relation $\eta = -\ln(\tan(\theta/2))$, where $\theta$ is the polar angle in the r-z plane measured from the beam axis. On the other hand, the azimuthal angle $\phi$ is measured around the beam axis, with $\phi = 0$ along the positive x-axis and increasing clockwise [52].

The CMS experiment will require a substantial upgrade of its detector to handle the increased integrated luminosity of the HL-LHC [45], up to $3000\,\mathrm{fb}^{-1}$, delivered at an instantaneous luminosity of $5\cdot10^{34}\,\mathrm{cm}^{-2}\mathrm{s}^{-1}$. Higher luminosity translates into:
- higher radiation damage to the electronics;
- a higher number of collisions, which could eventually cause an overlap of signals coming from collisions that occurred in the same BX, or in consecutive BXs in the case of low energy particles looping in the high magnetic field.

**Figure 2.3.** Coordinate system used by CMS experiment at the LHC [52].

The amount of signals generated by multiple pp collisions in the same BX is defined as **pileup**. The CMS experiment employs a particle flow event reconstruction [53] that combines the information of all the subdetectors: charged-particle trajectories revealed by the **tracker**; neutral-particle energy deposition measured by the **calorimeters**; muons, able to penetrate deep in matter, revealed by the **muon system** [54]. An increased granularity and a higher resolution are necessary to keep acceptable detector performance in the extreme pileup conditions.

**Figure 2.4.** The pseudorapidity function.

**Figure 2.5.** CMS silicon tracker currently installed.

#### 2.2.1 The CMS tracker

The CMS silicon tracker is the subdetector closest to the collision point [55]. It consists of two parts, the Inner Tracker (IT) and the Outer Tracker (OT). The current IT is based on silicon pixel detectors that are arranged in multiple layers, providing true 2D points with 10–15 µm resolution. Farther away from the collision point, where the flux of particles is reduced, silicon strip sensors are used. More than 24000 micro-strip sensors with a total of 10 million channels are arranged in multiple layers covering an area of ~210 m<sup>2</sup> [56]. Modules with different geometries and strip pitches ranging from $80\,\mu m$ to $205\,\mu m$ are currently used to achieve acceptable resolution. The signals coming from the silicon strips of a module are read by several strip readout ASICs, called APV25, implemented in 0.25 µm CMOS technology [57]. Each APV25 can sample up to 128 channels at 40 MHz.
The data are sent out only upon the reception of a trigger signal. The back-end of the CMS experiment decides whether to send a trigger according to the continuous data coming from the calorimeters and the muon system. The maximum trigger rate in the current CMS detector is limited to 100 kHz and the latency is 3.2 µs [55]. The tracker, together with information from the calorimeters and the muon system, allows the precise reconstruction of the trajectory of interesting particles through their bending in the magnetic field.

# 2.2.2 Silicon tracker upgrade

By 2025 the HL-LHC will increase its integrated luminosity to 250 fb<sup>-1</sup> per year for a further 10 years of operation. Under these conditions, both the event pileup and the cumulative radiation effects will increase substantially. The current 50 pileup events will increase to 250 in the new HL-LHC. The current CMS tracker cannot guarantee acceptable performance in the new HL-LHC environment and will need to be replaced by a new CMS silicon tracker with enhanced functionalities and radiation robustness to maintain high performance in the recognition of particle trajectories [45]. Cumulative radiation damage concerns all the electronics, and in particular the pixel sensors, causing a reduced spatial resolution and eventually a reduced hit efficiency. The new CMOS technologies allow for higher transistor density and more complex ASICs, able to offer a higher number of functionalities.

At the HL-LHC, the nominal pileup will be 5 times higher than in the current LHC; consequently, the data provided by the calorimeters alone do not guarantee an acceptable event selection rate. For this reason it is fundamental to include information at the level of the CMS Tracker. For the first time, the CMS Tracker will be able to send out relevant information for the trigger decision taken by the experiment back-end. Moreover, the full event data will be stored locally for a maximum of 12.5 µs and the trigger rate will be increased to 750 kHz.
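The latency and trigger-rate figures above translate directly into front-end buffering requirements. The following sketch converts them into numbers of bunch crossings, assuming the 40 MHz BX rate introduced in Chapter 1. The function names are illustrative and are not part of any CMS software.

```python
BX_RATE_HZ = 40e6  # bunch-crossing rate


def latency_in_bx(latency_us):
    """Number of bunch crossings that elapse during the trigger latency."""
    return round(latency_us * 1e-6 * BX_RATE_HZ)


def mean_bx_between_triggers(trigger_rate_hz):
    """Average number of bunch crossings between two accepted triggers."""
    return BX_RATE_HZ / trigger_rate_hz


current_depth = latency_in_bx(3.2)      # current detector latency
upgrade_depth = latency_in_bx(12.5)     # HL-LHC latency
current_spacing = mean_bx_between_triggers(100e3)
upgrade_spacing = mean_bx_between_triggers(750e3)

print("events to buffer, current:", current_depth)
print("events to buffer, upgrade:", upgrade_depth)
print("mean BX between triggers, current:", current_spacing)
print("mean BX between triggers, upgrade:", round(upgrade_spacing, 1))
```

Under these figures, the upgraded front-end must hold roughly four times as many bunch crossings as the current one, while being read out on average about every 53 bunch crossings instead of every 400.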
The new tracker will consist of an Inner Tracker (IT) based on silicon pixel modules and an Outer Tracker (OT) made from silicon modules with strip and macro-pixel sensors. The main requirements for the tracker upgrade are:

- **Increased granularity.** In order to guarantee efficient tracking performance with a high level of pileup, up to 300, the channel occupancy for the IT should be below the per mille level, while that for the OT should be below the per cent level (less than 3 %).
- **Robust pattern recognition.** Track finding under pileup conditions becomes increasingly difficult and time consuming.
- **Local storage and increased latency.** The higher number of collisions requires a local storage of the full event data for a maximum of 12.5 µs. In the current CMS detector the maximum latency is 3.2 µs. For the first time, data from the tracker will be used for the trigger decision. The increased latency is required by the CMS experiment back-end to evaluate more data than before and take the trigger decision.
- **Higher trigger rate.** Due to the higher average number of collisions in the HL-LHC, the trigger rate needs to be increased from the current 100 kHz to 750 kHz.
- **Improved two-track separation.** The current CMS tracker is limited in track finding performance in highly energetic jets, due to hit merging in the pixel detector. In order to handle the large number of collisions, two-track separation needs to be improved.
- **Increased bandwidth and data compression.** Higher bandwidth, data compression algorithms and higher data rates are required to handle a substantial increase in the amount of data, allowing the CMS experiment to be efficient at the expected luminosity of $5 \cdot 10^{34}$ cm<sup>-2</sup>s<sup>-1</sup>.
- **Reduced material budget in the tracking volume.** The performance of the current CMS tracker is affected by the large amount of material that interacts with the particles. Therefore, in the future tracker, called the Phase-2 tracker, the material budget has been reduced.
The material is evaluated in units of nuclear radiation lengths, defined as the characteristic length that describes the energy decay of a beam of electrons. Both the material budget of the current CMS tracker and that of the Phase-2 tracker are shown in Figure 2.6. These simulations have been performed using the tkLayout tool [58]. The material budget is calculated as a function of the pseudorapidity $(\eta)$. The performance of the current CMS calorimeters is highly affected by the large amount of material in the CMS tracker, in particular for values of $|\eta|$ around 1.5 (left in Figure 2.6). The large amount of material causes particle energy loss and scattering, which affect the tracking resolution. The Phase-2 tracker material budget (right in Figure 2.6) is expected to be significantly lower, improving the detector performance [2]. Due to the optimal operational temperature of the silicon sensors of −30°C [59], a cooling system is required. The new cooling system will be based on CO2 two-phase cooling to reduce the amount of passive material in the tracking volume [60]. As will be highlighted in Section 2.6, the maximum power consumption for the ASIC discussed in Chapter 4 is 250 mW in one module type and 312.5 mW in the other. An excess in power consumption would overload the power converters and the cooling structure, increasing the temperature of the module.

**Figure 2.6.** Material budget in the current (left) and future (right) CMS tracker detector. The material in front of the Inner Tracker is shown in brown, that inside the IT tracking volume in yellow, that between IT and OT in green, and that inside the Outer Tracker tracking volume in blue [2].

- **Radiation tolerance.** The new CMS detector has to work efficiently up to a target integrated luminosity of 3000 fb<sup>-1</sup>. The Outer Tracker (OT) of the Phase-2 tracker is required to be fully operational for 10 years without maintenance.
On the other hand, the Inner Tracker (IT) will be more accessible, giving the option to extract the IT and replace those modules that have accumulated too much radiation damage. FLUKA simulations have been performed to estimate the radiation damage in different detector regions, which is about one order of magnitude higher than the one used to design the current CMS tracker [61], [62]. The integrated particle fluence corresponding to a total of 3000 fb<sup>-1</sup> is shown in Figure 2.7. The maximum value in the innermost regions is about $2.3 \cdot 10^{16} \, n_{eq}/cm^2$, corresponding to a Total Ionizing Dose (TID) accumulated in 10 years of 12 MGy (1200 Mrad), as shown in Figure 2.8. Both fluence and TID decrease along the radial direction moving away from the beam axis. The OT modules are located at distances from the beam axis ranging from about 20 cm to 120 cm, collecting around 100 Mrad in the regions closer to the beam axis. Radiation hardening techniques need to be employed in the development and design of custom readout electronics [2].

**Figure 2.7.** Integrated particle fluence for the Phase-2 tracker corresponding to 3000 fb<sup>-1</sup> of proton-proton collisions at 14 TeV, obtained with FLUKA [2].

**Figure 2.8.** Total Ionizing Dose for the Phase-2 tracker corresponding to 3000 fb<sup>-1</sup> of proton-proton collisions at 14 TeV, obtained with FLUKA [2].

# 2.3 CMS trigger system at HL-LHC

The current CMS detector is designed to measure precisely the physical properties of leptons, photons and jets, together with other particles, in proton-proton (pp) collisions at the LHC. Protons collide at a center-of-mass energy of 14 TeV and an instantaneous luminosity of $10^{34}\,\mathrm{cm^{-2}s^{-1}}$ with a rate larger than 1 GHz. Out of millions of collisions only $\sim$ 20 are interesting for the CMS physics program. At the LHC the proton beams cross each other at a frequency of 40 MHz.
Thus, due to the limited event storage available for offline analysis, the CMS detector delegates to a Trigger Data Acquisition (Trigger DAQ) system the work of selecting events of potential physics interest for new discoveries. The CMS Trigger DAQ system uses two levels of event selection, as shown in Figure 2.9. The first level (L1) of the CMS trigger is implemented in custom hardware with a fixed latency. In the current CMS detector, within $4\,\mu s$ of a collision the L1 trigger system decides whether an event is selected or not, using only information from the calorimeter and muon detectors. This step reduces the total event rate of $40\,\mathrm{MHz}$ to an interesting event rate of $100\,\mathrm{kHz}$. A second level trigger, called High-Level Trigger (HLT), composed of an array of commercially available computers running high-level physics algorithms, brings the rate of accepted interesting events down to $100\,\mathrm{Hz}$ [63], [64].

**Figure 2.9.** Data flow in CMS Trigger DAQ system [64].

In the HL-LHC the increase of pp collisions from 20 to 200 would make the current Trigger DAQ system not efficient enough. The large number of detected hits would produce random combinations of hits, creating a combinatorial background that would eventually affect the event reconstruction. A solution could be to increase the L1 trigger rate, but it has been demonstrated that this would not be enough. Therefore, the use of tracker information in real time for the L1 trigger decision is fundamental to reconstruct events efficiently, substantially improving the rejection of combinatorial background [65], [66]. An average L1 trigger rate of 750 kHz with a maximum latency of 12.5 $\mu$s has been found to be optimal to reconstruct events efficiently [45].

# 2.4 Innovative approach: the $p_T$ module concept

The new HL-LHC will be characterized by an increased number of collisions that will generate a large amount of hit data at each Bunch Crossing (BX).
The new tracker should provide, at each BX, real-time data relevant for the L1 trigger decision while respecting the power budget and bandwidth limitations. As explained in Section 2.2.2, the material budget in the tracker should be reduced so as not to affect calorimeter performance. Therefore, an increase in power cables and transmission lines in the tracker would not be a viable solution. The L1 trigger should use a limited set of data from the tracker. Figure 2.10 (left) shows the number of tracks per event as a function of $p_T$. In other words, it shows the $p_T$ spectrum, normalised per BX, for a pileup of 400.

**Figure 2.10.** The $p_T$ spectrum, averaged per BX, for charged particles that generate hit data at a radius of 25 cm for a pileup of 400 pp collisions per BX (left). The fraction of tracks with $p_T$ less than the given $p_T$ (right) [67].

It is important to notice that particles with a transverse momentum $p_T < 0.7$ GeV/c are not of interest for the trigger decision since they do not reach other subdetectors due to the bending power of the 3.8 T magnetic field [67]. Figure 2.10 (right) shows that the fraction of tracks with $p_T$ less than 1 GeV/c is around 85%. The L1 trigger decision and event reconstruction require the development of an "intelligent" module, which exploits the effect of the strong magnetic field on particle trajectories thanks to two closely spaced layers made of silicon strip or pixel sensors with a pitch of around 100 $\mu m$. This is referred to as a $p_T$-module, shown in Figure 2.11.

**Figure 2.11.** Illustration of the $p_T$-module concept [2].

The module performs a binary readout, providing one bit of information per channel to indicate whether the charge deposited by the particle passing through the silicon sensor is above the threshold or not. A particle hitting the module would create one cluster per layer (black squares in Figure 2.11).
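The bending argument can be illustrated with a small geometric sketch. It uses the standard relation $R\,[\mathrm{m}] = p_T\,[\mathrm{GeV}/c] / (0.3 \cdot B\,[\mathrm{T}])$ for a singly charged particle in a uniform field, and approximates the displacement between the two sensor layers as $d \tan\alpha$ with $\sin\alpha = r/(2R)$ for a track originating on the beam axis. The 3.8 T field, the 25 cm radius and the 0.7 GeV/c value come from the text; the sensor spacing of 1.6 mm, the 2 GeV/c acceptance threshold and all function names are assumptions chosen only for illustration.

```python
import math

B_FIELD_T = 3.8  # solenoid field quoted in the text


def curvature_radius_m(pt_gev):
    """Radius of curvature in metres for a singly charged particle."""
    return pt_gev / (0.3 * B_FIELD_T)


def layer_offset_um(pt_gev, radius_m, spacing_mm):
    """Approximate displacement between the hits in the two sensor layers."""
    sin_alpha = radius_m / (2.0 * curvature_radius_m(pt_gev))
    if sin_alpha >= 1.0:
        raise ValueError("particle curls up before reaching this radius")
    tan_alpha = sin_alpha / math.sqrt(1.0 - sin_alpha ** 2)
    return spacing_mm * 1000.0 * tan_alpha


def is_stub(inner_cluster_um, outer_cluster_um, window_um):
    """Accept a cluster pair when its displacement fits the window."""
    return abs(outer_cluster_um - inner_cluster_um) <= window_um


RADIUS_M = 0.25          # module radius, as in Figure 2.10
SPACING_MM = 1.6         # assumed sensor spacing (not from the text)
PT_THRESHOLD_GEV = 2.0   # assumed acceptance threshold (not from the text)

window = layer_offset_um(PT_THRESHOLD_GEV, RADIUS_M, SPACING_MM)
print("acceptance window [um]:", round(window, 1))
for pt in (0.7, 2.0, 10.0):
    offset = layer_offset_um(pt, RADIUS_M, SPACING_MM)
    print(pt, "GeV/c -> offset", round(offset, 1), "um, stub:",
          is_stub(0.0, offset, window))
print("2R at 0.7 GeV/c [m]:", round(2 * curvature_radius_m(0.7), 3))
```

With these assumed values the acceptance window is close to one strip pitch, and a 0.7 GeV/c particle has a curvature diameter of roughly 1.2 m, which is in line with the statement that such particles do not reach the outer subdetectors. A real module would additionally account for sensor tilt, cluster width and the position of the module along z.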
The particle energy and transverse momentum are inversely proportional to the curvature of the particle trajectory. The front-end electronics on the module must be able to locally select high-$p_T$ particles, which, thanks to their high energy, are less affected by the strong magnetic field of 3.8 T. On the other hand, low-$p_T$ particles, falling below a certain threshold, are locally rejected. An accepted pair of clusters is called a **stub**, and this is the information that is sent out at every BX to the Trigger DAQ system to make the L1 trigger decision [68]. At the same time, the front-end electronics needs to store the full event data in a memory, from which they are read out upon the reception of an L1 trigger. The maximum time the full event data remain in the memory is 12.5 $\mu$s, after which they are discarded [45]. The ultimate objective of the $p_T$ threshold is to reduce the data rate processed for the L1 trigger decision while maintaining an adequate resolution. With a threshold set to 2 GeV/c and an L1 trigger rate of 750 kHz, the data rate is reduced by a factor of 10 [69].

#### 2.5 CMS tracker structure

The CMS Tracker is the subdetector closest to the collision point, extending up to 1.2 m radially from the beam axis. As seen in section 2.2.2, the radiation levels are higher in the innermost regions and gradually decrease for modules at larger radii. Moreover, modules closer to the collision point require higher granularity and resolution. According to the distance from the beam axis, the CMS tracker is composed of two parts:

- The Inner Tracker (IT) extends between 3 cm and 20 cm from the beam axis. At this distance, it is impossible to detect the bending of particle trajectories due to the magnetic field or, in other words, it is not possible to locally discriminate interesting high-p<sub>T</sub> particles. Therefore, the IT does not participate in the L1 trigger decision. The main requirement for the module that will be developed for this region is radiation hardness.
As seen in Section 2.2.2, the TID collected at about 3 cm from the beam axis over 10 years of operation will be 1 Grad.
- Outer Tracker (OT) extends between 20 cm and 120 cm from the beam axis. The distance from the collision point allows exploiting the p<sub>T</sub>-module concept. Moreover, the TID expected over 10 years of operation is 100 Mrad, which is still challenging.

# 2.6 The p<sub>T</sub> modules in the CMS Outer Tracker

The CMS Outer Tracker design, presented in Figure 2.12, has been optimized in terms of performance and material used. Taking the collision point as the origin (0,0), six cylindrical barrel layers of modules are used for $|z| < 1200 \, \text{mm}$, and five "end-cap" double-discs are used for $|z| > 1200 \, \text{mm}$. This arrangement of modules allows an efficient and robust event reconstruction [70], [71]. Moreover, in order to respect the material budget requirement, two different $p_T$-modules are being developed for the Outer Tracker:

- **Pixel-Strip (PS) modules**, in blue in Figure 2.12: 7084 modules placed at radii between 20 cm and 60 cm, covering an area of $60 \,\mathrm{m}^2$. They consist of a pixel layer and a strip layer and provide the particle trajectories in the $\phi$ and z coordinates.
- **Strip-Strip (2S) modules**, in red in Figure 2.12: 8424 modules placed at radii larger than 60 cm, covering an area of $150 \,\mathrm{m}^2$. Since they have two strip layers, they only provide the particle trajectories in the $\phi$ coordinate.

**Figure 2.12.** Tracker layout section (one quarter) and $p_T$-module arrangement.

PS modules need to offer higher resolution; therefore, a precise measurement of the z coordinate is achieved thanks to a pixelated sensor. However, a pixel sensor cannot be used everywhere in the tracker due to its higher power consumption and, consequently, the larger amount of material needed for cooling. ${\rm CO_2}$ two-phase cooling will be used to remove heat from electronics and sensors.
This technology helps to reduce the amount of passive material in the tracking volume. From the thermal point of view, the tracking system cannot operate at room temperature, since the damage induced by traversing particles would render it inoperable earlier than the expected 10 years of operation: electric currents through the sensors increase linearly with radiation damage. Fortunately, these currents also depend exponentially on the temperature and can be largely reduced by running at low temperatures. The design requirement is to achieve a sensor temperature of $-20^{\circ}{\rm C}$ or lower with a coolant temperature of $-30^{\circ}{\rm C}$ for the innermost modules. The solution in Figure 2.12 is the result of many studies and represents the optimum in terms of tracking performance, cost of the silicon and material in the tracking volume.

#### 2.6.1 2S-module structure

The **Strip-Strip (2S) modules** are composed of two micro-strip sensors of $10 \times 10 \,\mathrm{cm}^2$. Each layer contains 1016 strips, 5 cm long and spaced 90 µm apart, for a total sensitive area of $\sim 150 \,\mathrm{m}^2$, as shown in Figure 2.13. The connectivity between the sensor and the 2S Front-End hybrid [72] is implemented via wirebonds at one of the sensor extremities. Sixteen strip readout ASICs, namely CMS Binary Chips (CBCs) [73], transmit the reconstructed signal to two Concentrator Integrated Circuits (CICs), which compress and format the data packet and transmit it to the tracker back-end via a 5 Gb/s transceiver (LpGBT [74]) and an optical converter (VTRx+ [75]). A DC/DC converter [76] mounted on the module assembly provides the power to the whole module electronics.

**Figure 2.13.** Exploded view of the 2S module structure (top). Section of the hybrid showing the connection between the two sensor layers and the readout electronics (bottom).
#### 2.6.2 PS-module structure

The **Pixel-Strip (PS) modules** are composed of two sensor layers of $5 \times 10\,\mathrm{cm}^2$, as shown in Figure 2.14: the outer one is segmented in 1920 strips, 2.5 cm long and spaced 100 µm apart, while the inner one is segmented in 30720 "macro-pixels" of size $100\,\mu\mathrm{m} \times 1.5\,\mathrm{mm}$. The pixelated sensor provides precise measurements of the z coordinate, information used for the L1 trigger decision. The PS module uses wirebonds only for the connection between the strip sensor and the hybrid. For the MPA ASIC that reads out the macro-pixels, a different technology needs to be used: the pixel size has been chosen to allow the use of flip-chip bump-bonding, a standard industrial process that represents a viable and affordable solution for large-scale production. Sixteen Short Strip ASICs (SSAs) and 16 Macro Pixel ASICs (MPAs) read out the strip and pixel sensors, transmitting all the data to two Concentrator Integrated Circuits (CICs), which compress and format the data packet and transmit it to the tracker back-end via a 10 Gb/s transceiver (LpGBT [74]) and an optical converter (VTRx+ [75]). Two DC/DC converters [76] are mounted on the module assembly and provide the power to the whole module electronics. The auxiliary electronics integrated on the module assembly, such as the DC/DC converters [76] and the transceiver (LpGBT [74]), are also bump-bonded on the hybrid.

**Figure 2.14.** Exploded view of the PS module structure (top). Section of the hybrid showing the connection between the two sensor layers and the readout electronics (bottom).

# 2.6.3 Charge detection in the analog front-end electronics

The MPA, SSA and CBC ASICs each include an analog Front-End (FE) to read out data from their sensors. In fact, they use the same components, with slightly different specifications. In order to understand all the functions of a readout ASIC, a general description of the analog FE is provided.
The design of an analog FE represents one of the biggest challenges in the construction of a strip (SSA and CBC ASICs) or pixel (MPA ASIC) detector readout ASIC. As a general example, the following paragraphs give a complete description of the analog FE, from the pixel sensor to the readout of the signal generated by particle hits. A small octagonal bump pad is used to connect the pixel sensor directly to the FE electronics. Each pixel sensor can be modelled with its own detector capacitance ($C_{det}$), obtained by summing the capacitance to the backside of the sensor and the capacitance to the neighbouring pixels, as shown in Figure 2.15.

Figure 2.15. Capacitance model of a pixel sensor.

Figure 2.16 shows the complete structure of an analog FE [49], composed of:

- The **Charge-Sensitive Amplifier** (CSA) is a voltage amplifier with high open-loop gain A, able to convert an input charge Q<sub>in</sub> to a voltage V<sub>out</sub>. Normally, it has a capacitive feedback C<sub>f</sub>. In the ideal case the amplifier acts as an integrator, giving:

$$V_{out} = \frac{Q_{in}}{C_f}$$

In practice, a small residual voltage V<sub>in</sub> remains at the input and corresponds to:

$$V_{in} = \frac{Q_{in}}{C_{in} + (1+A)C_f},$$

where $C_{in}$ is the input capacitance, obtained by summing the detector capacitance $C_{det}$ and the preamplifier input capacitance $C_{amp}$;
- The **Feedback circuit** is required to set the DC operating point of the Charge-Sensitive Amplifier and to remove signal charges from the input node (or from C<sub>f</sub> after the dynamic response of the amplifier), so that the CSA output voltage returns to its original value;
- The **Leakage Compensation Circuit** is usually implemented to sink all or a significant fraction of the leakage current;
- The **Shaper** is simply a band-pass filter that limits the bandwidth of the CSA output signal.
It is useful to cut the low- and high-frequency noise contributions introduced by the sensor leakage and the input device;
- The **Discriminator** is an electronic block that compares the shaper output to a threshold value. Its output is 'high' as long as the shaper output is above the threshold and goes 'low' when the shaper output falls below it. Thus, the readout is done with only one bit of resolution. The threshold value is globally distributed to all the pixels, but each pixel is provided with a Trimming DAC able to adjust the threshold locally;
- The **Test Charge Injection** makes it possible to verify the correct operation of the FE electronics by applying a known voltage step to a well-defined calibration capacitor C<sub>inj</sub>.

The signal at the output of the analog FE is one bit of information, ready to be further processed on-chip in the digital domain before being transmitted out.

Figure 2.16. Block diagram of a generic analog FE circuit.

#### 2.6.4 Digital readout system

The complexity of the digital logic strongly depends on the technology node used and on the readout rate of the target application. For medical applications the readout rates are very low, of the order of 10-100 frames per second. Generally, medical and X-ray imaging applications implement a readout system based on synchronous or asynchronous per-pixel counters that count the number of hits in a specific time window [77], [78]. The ASIC periphery periodically scans the pixel array, incrementing the counters and transmitting data packets [79]. As the technology node advances, digital logic allows more features to be implemented directly on-chip. In some cases, physics experiments that require a sufficiently low readout rate implement a "trigger-less" detector readout system able to synchronously transmit the full pixel image, as raw data or encoded information, at the event rate [80]. Compression or suppression techniques can be employed to reduce the readout bandwidth.
Among the suppression techniques, one of the most widely used is zero-suppression: it consists in creating an encoded frame containing only the pixels that actually detected a signal above the set discriminator threshold. Some examples of readout ASICs for trigger-less detectors are CLICpix [81], Timepix [82], Velopix [83] and ToPix [84]. As already discussed in Section 2.3, the amount of data per BX at the level of the CMS Tracker is too high to be fully transmitted with a trigger-less approach. Moreover, for the first time, the future CMS upgrade will use data from the tracker for the L1 trigger decision. Therefore, readout ASICs such as the MPA, SSA and CBC, beyond the analog FE described in Section 2.6.3, implement on-chip digital processing logic to perform an intelligent discrimination at the module level, filtering out signals related to particles that are not interesting for the event reconstruction. All three ASICs should implement features compliant with the $p_T$-module concept explained in Section 2.4.

# 2.6.5 Data aggregation

The readout ASICs presented in Section 2.6.4 transmit not only real-time information, but also the full event upon reception of an L1 trigger. In order to use the bandwidth of the links more efficiently and to reduce the total number of links needed to read out the whole detector, it is necessary to compress and merge the data coming from a group of readout ASICs. The aggregation and compression of data are performed by a data concentrator ASIC mounted directly on the hybrid, called Concentrator Integrated Circuit (CIC). It reads, aggregates and merges the output data formats of the MPAs or CBCs. The result of the compression is a new data packet with a BX counter for the real-time information and an Event counter for the event information. While compressing data, the CIC is able to monitor the individual readout ASICs and to transmit status flags for each of them independently.
The thesis will focus on the design and development of the CIC in the context of the CMS Outer Tracker Upgrade.

#### 2.7 PS module and 2S module studies

For the CMS Outer Tracker Upgrade [2] two different $p_T$-modules will be used, namely the PS module and the 2S module. As already discussed in Section 2.4 (see Figure 2.11), both modules are composed of two closely spaced (a few mm) silicon sensors. The particle trajectories are bent by the high magnetic field (3.8 T). Correlating the information from the two layers allows evaluating the transverse momentum ($p_T$) of the incident particle. High-$p_T$ particles, above a certain threshold, are more interesting for the research scope of the CMS experiment. The pairs of hits in the two sensors of a module, called stubs, are sent out synchronously at 40 MHz to be analyzed, while the full event data are kept in a memory for 12.6 $\mu$s waiting for the Level-1 (L1) trigger decision. In general, all the ASICs in a module receive the clock and fast commands (needed for chip synchronization) from the Low-power GigaBit Transceiver (LpGBT) ASIC [74], which operates as a serializer at the module output and sends out data via the optical fiber. Figure 2.17 shows the expected stub rate in the CMS Outer Tracker obtained via physics Monte Carlo simulations. PS modules or 2S modules are employed according to the distance from the collision point. Figure 2.18 shows the cross section of the PS module. The readout of the two sensor layers is done by 16 SSA ASICs [4] and 16 MPA ASICs [3]. In the 2S module, on the other hand, the readout of the two strip layers is done by 16 CBC ASICs [73].

Figure 2.17. Stub rate in CMS Outer Tracker [2].

Figure 2.18. PS module cross section.

After the analog front-end, all three front-end ASICs present mainly two independent data paths: one for stub data and one for L1 data.
Stub data are sent out synchronously at every Bunch Crossing (BX), while L1 data, the full event image at a specific BX, are sent out asynchronously, only upon reception of an L1 trigger, which follows a Poisson distribution and has an average nominal frequency of 750 kHz. In the 2S module the CBC is able to read out both sensor layers directly and create a stub. In the PS module the SSA reads out the strip layer and sends the coordinates of the hits to the MPA, which forms a coincidence matrix between the strips read out by the SSA and its own pixels and creates the stubs that will be transmitted out. Figure 2.19 shows a block diagram of the MPA-SSA architecture, in which one can easily recognize the two independent data paths in the digital domain: the stub data path in orange and the L1 data path in blue.

Figure 2.19. MPA-SSA architectures.

The MPA-SSA architecture is outside the scope of this thesis (for more details see [15]). However, the UVM simulation environment, which will be introduced later in this chapter, has also assisted the development and study of these two ASICs. Two different technologies have been selected for the three front-end ASICs: 130 nm CMOS for the CBC and 65 nm for the MPA and SSA. These choices were driven by the need to optimize the required performance in terms of speed, power consumption and density, while limiting the development and production risks and costs [2].

# 2.7.1 Data bandwidth estimation for $p_T$-modules

The foreseen Phase-2 upgrades at the LHC, discussed in Chapter 2, present very challenging requirements for the front-end readout electronics of the CMS Outer Tracker detector. High data rates, in combination with the use of a novel technique for locally rejecting low transverse momentum particles and with strict low power consumption constraints, require the implementation of an optimized readout architecture and specific interconnect synchronization schemes for its components.
The innovative approach of $p_T$-modules reduces the amount of data to be sent out, but not enough to fulfill the bandwidth requirements. A Concentrator ASIC performing data compression therefore needs to be implemented, without affecting the particle recognition efficiency. Figures 2.20 and 2.21 show that, with a Concentrator Integrated Circuit (CIC) ASIC, both the PS module and the 2S module achieve a reduction of the output data. Eight MPA ASICs have a total output bandwidth of 15.36 Gbps; with a CIC on the PS module, the data are compressed by a factor above 3. Similarly, eight CBC ASICs have a total output bandwidth of 15.36 Gbps, and the CIC ASIC on the 2S module compresses the data by a factor of 8.

Figure 2.20. PS module structure with ASICs bandwidth.

Figure 2.21. 2S module structure with ASICs bandwidth.

# 2.7.2 Concentrator ASIC in $p_T$-modules

Implementing a CIC ASIC can thus substantially reduce the data bandwidth in both modules, creating a compact data packet at their output. The CIC has two independent paths, called the stub data path and the L1 data path, and in both modules receives a total of 48 input lines operating at 320 MHz:

- 40 lines (5 from each FE chip) for stub data, sent out synchronously at each Bunch Crossing (BX) for the L1 trigger decision. The CIC has to sort the stub data according to the stub bend, collecting stubs from 8 front-end ASICs (MPA or CBC) over 8 BXs.
- 8 lines (1 from each FE chip) for full sensor raw data, sent only when an L1 trigger is received. The CIC stores the L1 words coming from 8 front-end ASICs over 8 BXs in 8 FIFOs and sends out the full packet of L1 raw data when ready. The nominal L1 trigger frequency is 750 kHz.

Figure 2.22 shows the total number of input and output lines chosen for the CIC ASIC.

Figure 2.22. CIC input/output lines.
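The line counts above can be tied back to the 15.36 Gbps figure with a few lines of arithmetic. The Python sketch below is only a consistency check, not part of the CIC design flow: the constant and function names are hypothetical, and it assumes that each 320 MHz line carries one bit per clock cycle, i.e. 320 Mb/s.

```python
# Illustrative consistency check of the CIC input bandwidth.
# Assumption: each 320 MHz input line carries one bit per clock cycle (320 Mb/s).

FE_CHIPS_PER_CIC = 8     # front-end ASICs (MPA or CBC) served by one CIC
STUB_LINES_PER_FE = 5    # stub data lines per front-end ASIC
L1_LINES_PER_FE = 1      # L1 raw data lines per front-end ASIC
LINE_RATE_MBPS = 320     # assumed bit rate of one input line


def cic_input_lines() -> int:
    """Total number of input lines received by one CIC."""
    return FE_CHIPS_PER_CIC * (STUB_LINES_PER_FE + L1_LINES_PER_FE)


def cic_input_bandwidth_gbps() -> float:
    """Aggregate input bandwidth of one CIC in Gb/s."""
    return cic_input_lines() * LINE_RATE_MBPS / 1000.0


def output_bandwidth_gbps(compression_factor: float) -> float:
    """Output bandwidth implied by a given compression factor."""
    return cic_input_bandwidth_gbps() / compression_factor


if __name__ == "__main__":
    print(cic_input_lines(), "lines,", cic_input_bandwidth_gbps(), "Gb/s in")
    print("2S module, factor 8:", output_bandwidth_gbps(8), "Gb/s out")
```

With these assumptions the 48 lines add up to 15.36 Gb/s, matching the total quoted for eight front-end ASICs, and a compression factor of 8 corresponds to 1.92 Gb/s at the CIC output.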
The CIC ASIC is mostly a digital chip. Therefore, a UVM simulation environment is very useful to assist the development and the studies that allow optimizing the CIC architecture, maintaining a good readout chain efficiency while reducing the amount of data transmitted out of a module. The development of the UVM framework will be presented in Chapter 3.

# 2.8 Radiation effects on CMOS technology

Commercial electronic devices are not qualified to work in a high-radiation environment and are not suitable for specific applications in aerospace, medicine and High Energy Physics (HEP). A high-energy particle or high-energy electromagnetic radiation can cause a parametric degradation and, eventually, a functional failure of the electronic device [85]. In LHC experiments, high-energy primary and secondary particles represent the main threat to the electronics. Radiation effects on CMOS circuits depend on the charge, energy and mass of the particle and on the CMOS technology itself. All radiation effects can be grouped in two main categories:

- Cumulative effects, such as Total Ionizing Dose (TID) and Displacement Damage (DD), develop gradually during the whole lifetime of the electronic device, causing a slow degradation of device parameters and ultimately a functional failure;
- Non-cumulative effects, such as Single Event Effects (SEE), do not affect device parameters, but are a consequence of the energy deposited by one single particle traversing the sensitive area of the electronic device, causing a flip in a memory cell or a latch-up. The functional failure can be temporary or permanent, depending on the specific circuit component.

# 2.8.1 Cumulative effects: Total Ionizing Dose and Displacement Damage

Radiation-induced cumulative effects in electronic devices can be classified in two subgroups: Displacement Damage (DD) effects and Total Ionizing Dose (TID) effects. DD effects are very common in space [86] and HEP applications [87].
When particles such as electrons, hadrons (neutrons, protons, etc.), heavy ions and $\gamma$-rays hit the sensitive device, they can cause a lattice defect by displacing an atom from its original position in the crystalline structure. However, it has been demonstrated that CMOS technology is insensitive to DD effects. Thus, a more detailed description of DD effects is outside the scope of this thesis but can be found in [88], [89]. On the other hand, CMOS technology is highly sensitive to TID effects, the most common cumulative effects, which are due to the radiation dose (measured in rad) deposited in the electronic device in the form of ionization energy. Figure 2.23 shows the main radiation effects due to ionization energy.

Figure 2.23. Effect of ionising radiation in MOS devices.

The incident ionizing radiation creates electron-hole pairs in the device. The recombination rate is very high in the gate (polysilicon or metal) and in the substrate, because they are characterized by low resistance. In the $SiO_2$, on the other hand, only a fraction of the pairs recombine, very rapidly, just after their formation. The electron-hole pairs that do not recombine are separated in the silicon oxide by the electric field. Assuming that a positive bias is applied to the gate, the electrons, characterized by high mobility, escape very quickly (a few ps for a thickness of some nm) from the gate oxide to the gate itself. The holes, on the contrary, can only move towards the $SiO_2$-Si interface, where they are trapped, causing an accumulation of positive charge. The amount of accumulated charge is proportional to the number of defects in the $SiO_2$. Thus, it is fundamental to employ a technology with a very high quality gate oxide in order to be more robust to hole accumulation. Holes can be trapped very quickly and can be de-trapped by annealing.
However, in the most recent CMOS technologies the $SiO_2$ gate oxide is so thin that other effects, depending on the quality of the Shallow Trench Isolation (STI) oxides and spacers, have a dominant contribution. The STI, employed to isolate CMOS transistors from each other, experiences the same charge accumulation problems presented for the gate oxide. In particular, due to radiation exposure, holes can accumulate at the interface between STI and substrate, forming interface states. The accumulation of positive charge in the STI can induce parasitic currents at the channel edges and between neighbouring transistors. The combination of the cumulative effects presented so far alters the expected behaviour of a MOS transistor, degrading the performance of both pMOS and nMOS transistors. In nMOS transistors, trapped holes tend to decrease the threshold voltage, while charge trapped in the interface states tends to increase it. Since these two effects have different dynamics, the threshold voltage decreases at the beginning of the irradiation; then the slow formation of interface states prevails and the threshold voltage increases. This phenomenon is called the rebound effect. For pMOS transistors, instead, both the trapped holes and the formation of interface states tend to increase the threshold voltage in absolute value.

Other effects are:

- the increase in leakage currents in nMOS transistors due to the accumulation of positive charge in the STI, which can cause lateral parasitic channels below the STI itself [90], [91];
- the degradation of both nMOS and pMOS mobility due to radiation-induced interface traps. Consequently, the transistor transconductance also decreases [92], [93].

A more detailed and complete description of TID-related effects in modern CMOS technologies can be found in [85].
# 2.8.2 Single Event Effects

Single Event Effects (SEE) are non-cumulative radiation-induced effects caused by a single high-energy particle, such as a heavy ion or, in this specific application, a proton, passing through the sensitive area of the device. When an ionizing particle hits the silicon in the proximity of a p-n junction, the generated electron-hole pairs can be separated and collected, provoking a current spike [94]. Each technology node has a different threshold energy needed to flip the state of a node. In CMOS technology, mainly three different effects can appear:

- **Single Event Upset** (SEU) is an instantaneous, reversible modification of the logic state of an elementary memory cell. The most recent technologies are more sensitive to SEUs because the threshold energy required to flip a node has become smaller. In general SEUs are not destructive, but they need to be taken into account when designing a digital circuit, especially when they cause functional errors;
- **Single Event Transient** (SET) is a spike on a net or on a combinatorial logic gate. The circuit should be robust to SETs, especially on clock and reset lines;
- **Single Event Latch-Up** (SEL) occurs when a high-energy particle causes a short circuit between the power lines, permanently damaging the device. In particular, circuits are more sensitive to SEL when they contain parasitic p-n-p-n structures [95].

In general, the rule of thumb is the following: the smaller the device or technology node, the smaller the energy required to flip its state. On the other hand, the larger the energy of the incident particle, the larger the probability that it causes a Single Event Upset (SEU) or Single Event Transient (SET). The energy deposited per unit path length by an ionizing particle traversing the device is called Linear Energy Transfer (LET), measured in MeV·cm<sup>2</sup>/mg.
In general, SEEs can affect memory elements, such as flip-flops or latches, leading to a Single Event Upset (SEU), or combinatorial elements, such as standard cells or nets in the design, leading to a Single Event Transient (SET). If the memory element is used in the control path, or in a path critical for the creation of the header of the output packet, it is fundamental to make it robust against SEUs. On the other hand, if the memory element is used to store data, radiation hardening techniques may not be applicable because of the limited power budget. In this latter case an SEU will corrupt the data of the given packet, and one should calculate the expected inefficiencies at different LETs. In the second case, in which a charged particle hits a combinatorial element, a glitch (namely an SET) of a length proportional to the injected charge can propagate and cause problems if it is sampled by a flip-flop. SEU hardness is one of the biggest challenges for all digital circuits used in detectors for HEP applications.

# **3** p<sub>T</sub>-modules studies and system level simulation environment

The CMS Outer Tracker detector consists of 2S modules and PS modules deployed in large numbers and featuring an optimum efficiency in particle recognition. Both modules share a high complexity due to the high data rate combined with the use of a novel technique to locally reject low transverse momentum particles, while respecting a strict low power consumption constraint. The development of the ASICs needs to be optimized in terms of readout architecture and of specific interconnect synchronization schemes for their components. The complexity of the ASICs, the alignment procedures and the timing control of some signals on the hybrid for proper chip-to-chip communication require the development of an efficient verification framework. Nowadays, the verification of a hardware design takes a large part of the overall design effort.
In this chapter, I will describe the verification framework, based on SystemVerilog and the UVM methodology, employed to study, develop and implement the full chain of ASICs from the analog front-end circuit to the optical fiber at the output of the module. Physics events coming from Monte Carlo simulations allow identifying the best architecture at system level to minimize the bandwidth together with the power consumption, with particular focus on the performance and sizing of the Concentrator Integrated Circuit (CIC), which is employed in both modules. In particular, the proposed architecture presents an efficiency of > 98% in particle selection and a data reduction from $\sim 30\, {\rm Gb\, s^{-1}cm^{-2}}$ to $0.7\, {\rm Gb\, s^{-1}cm^{-2}}$, while limiting the total power consumption to 250 mW for the PS module or 312 mW for the 2S module. The Concentrator Integrated Circuit (CIC) has been designed to have a maximum chain inefficiency in the transmission of high-p<sub>T</sub> particle information of < 2% and to be configurable to work in two different modules and at different locations in the CMS Outer Tracker. At the same time, the full event readout architecture is sized in the different ASICs, and ultimately in the CIC, to guarantee no loss of synchronization in case of a FIFO overflow occurring in high pileup conditions.

Publications related to this chapter: [1], [5], [6], [10], [15].

# 3.1 Advantages of a UVM simulation environment

MOS technology scaling pursues increased integration density, higher speed of operation and lower dynamic power dissipation, encouraging the use of digital logic whenever possible. Nowadays, chip functionalities are preferably implemented in the digital domain, and ASICs become denser, more complex and, consequently, harder to verify. For these reasons, there is a need for a verification environment that allows simulating, developing and verifying digital ASICs.
Digital ASICs can be described in Verilog, a synthesizable Hardware Description Language (HDL) that allows a description at two different levels:

- Gate Level (GL), which directly describes the logic gates and the interconnections between them.
- Register Transfer Level (RTL), which models the logic at a higher level, describing the architecture via the data flow between registers.

A verification environment does not need to be synthesizable and, therefore, can be much more powerful and can be described at a higher level of abstraction. The SystemVerilog language provides all the Verilog features plus additional ones that are very helpful in ASIC design verification, such as:

- **Classes** allow organizing the testbench functionalities into sub-components that can be reused or extended.
- **Constrained random stimulus generation** gives the possibility to reach as many states as possible in the design.
- **Interfaces** provide efficient communication between classes and the Design Under Verification (DUV).
- **Transaction Level Modeling** (TLM) provides a higher level of abstraction than an RTL description and is extensively used in testbenches to obtain better simulation performance in terms of run-time.
- **Synchronization methods**, such as semaphores, allow controlling the synchronization between the DUV and the testbench components.
- **Assertions** provide methods for checking temporal and/or functional properties in the design.
- **Code and functional coverage** measure the progress of all tests in fulfilling a verification plan.

On top of SystemVerilog, the Universal Verification Methodology (UVM) is a methodology for functional verification based on a set of standardized SystemVerilog libraries. UVM provides some additional features, such as:

- **Phases** give the testbench a precise organization and an order of execution for multiple functions.
- **UVM base class** allows, via 'uvm\_object', creating a new component with an already defined set of methods.
- **Object factory** provides a common way to register and create components, giving the possibility to fully configure each component at run-time.
- **Configuration database** is a common global database that allows configuring tests in a simple way and independently.

The UVM testbench is organized in such a way that it can easily be reused. Moreover, it allows instantiating, connecting and simulating several ASICs working together with extreme accuracy.

# 3.2 UVM simulation environment for $p_T$-modules

A UVM simulation environment is applicable to any size of design but, thanks to its debug features, high re-usability and efficiency, it is particularly convenient for a large design such as the one presented in this chapter. Figure 3.1 shows a general block diagram of the developed testbench.

**Figure 3.1.** General block diagram of the UVM framework.

In general, a UVM framework is composed of different building blocks:

- **Design Under Verification (DUV)** is the single ASIC or the set of ASICs to verify. It can be described at RTL, at GL after synthesis, or with the final netlist including parasitics and back-annotated delays.
- **Stimuli generation** is the block that provides stimuli to the DUV and to the Reference model according to the specific format, exploiting SystemVerilog randomization constraints.
- **Interface components** are modules able to connect the input stimuli to the DUV and the outputs of the DUV to the testbench to be decoded.
- **Testbench top** allows instantiating the DUV and connecting it to the testbench framework via interfaces.
- **Reference model** is a golden model that predicts the correct results from the provided stimulus. It can be described in SystemVerilog and it does not need to be synthesizable.
- **Testbench environment** is UVM specific and contains multiple, reusable verification components.
It allows defining the order of the UVM build phases and describes the connection of the input stimuli to both the reference model and the DUV.
- **Scoreboard** provides the results of the comparison between the DUV outputs and the expected outputs of the reference model.
- **Test case** is the test chosen from a library of test cases developed for the specific DUV. Only one test is applied per simulation, so that each test has its own scoreboard and coverage.

Given the high complexity and size of the design, the UVM simulation environment allows modeling the behaviour of the ASIC (or ASICs) at TLM level, which reduces the simulation time, while exploiting all the features explained in Section 3.1. The following subsections explain in detail the implementation of each building block of the UVM simulation environment.

# 3.2.1 UVM active Agent for stimuli generation

Every input stimulus is generated at TLM level using a specific UVM Verification Component (UVC), called a UVM Agent. A UVM Agent, represented in Figure 3.2, generally contains three main components:

- **Sequencer** provides stimuli in an abstract form, registered as transactions at TLM level, called sequence items.
- **Driver** connects the Sequencer to the interface and applies the sequence items to the interface according to the interface protocol.
- **Monitor** observes the activity on the interface, collecting coverage metrics about which inputs have been applied to the DUV.

**Figure 3.2.** UVM Agent for stimuli generation.

The agent described so far is called an active agent because it actively drives stimuli into the DUV. The DUV requires proper stimuli generated at the testbench level. The DUV is a particle readout chain and, in this specific case, receives:

- **External clock and fast command**: the testbench provides a clock to all ASICs together with a line for fast commands.
The fast command line allows sending specific commands to all the modules in the experiment in order to re-synchronize them.
- **I<sup>2</sup>C configuration signals**: the testbench includes an I<sup>2</sup>C master that communicates with all the ASICs via the I<sup>2</sup>C protocol.
- **Particle hits**: they emulate the particles hitting the double sensor layer. Particle trajectories can be generated using constrained random values or provided directly by Monte Carlo physics simulations from CMS. Simulating with Monte Carlo input data is particularly valuable because it reproduces the conditions expected in the target application.

The last kind of stimulus is the most complex to generate because it needs to reproduce precisely the products of particle collisions. In particular, it is important to control independently the average number of high-$p_T$ particles and low-$p_T$ particles: only the former generate the interesting tracks, called stubs, while the latter introduce inefficiencies in the readout that need to be evaluated. Stimuli are transferred via an interface to the DUV and, at the same time, to the Reference model.

# 3.2.2 UVM passive Agent for DUV output

At the output of a DUV one can reuse the UVM Agent as a building block, disabling both the sequencer and the driver. Such a UVM Agent, shown in Figure 3.3, is called passive and is used when the only purpose is to collect the DUV activity and to monitor which features of the ASIC have been exercised.

**Figure 3.3.** UVM Agent for stimuli generation.

#### 3.2.3 Reference model in TLM

The Reference model is the most delicate building block to create. It is a golden model of the ASIC functionality described at the TLM level of abstraction. All the stimuli, such as slow and fast control and the emulated particle hits, are received by the DUV and also act on the reference model.
All the functionalities of a $p_T$-module are summarized in about 1000 lines of code at a high level of abstraction. The model needs to be as accurate as possible because all the outputs of the DUV are compared with the outputs of the reference model to extract the ASIC efficiencies and to evaluate different design architectures. The developed reference model is based on transmitted data packets and provides reference outputs upon reception of the generated particle hit stimuli.

# 3.2.4 UVM Scoreboard component

The scoreboard is the UVC that compares two items in a UVM simulation environment. In this specific case, it is used to compare the expected data packets from the reference model against the actual outputs of the DUV. Since the testbench has been developed for multiple ASICs working together, a dedicated scoreboard has been developed for each ASIC. Thanks to this approach it is possible not only to find mismatches at run time between the DUV and the Reference model, but also to know whether these mismatches are related to an ASIC hardware limitation (for instance, the ASIC reaching bandwidth saturation or the transmitting FIFOs being full) or to a bug in the RTL code. In the latter case the failing ASIC can be identified thanks to the display of the output data from the Reference model and the DUV. Moreover, a report for each ASIC is created at the end of the simulation, summarizing the total number of data packets processed by the DUV. In this way the single ASIC efficiency and the total readout chain efficiency for particle recognition can be computed [6]. Figure 3.4 shows, as a general example, the scoreboard summary at the end of a simulation. In this specific case, stub data are collected and, while the SSA and MPA are fully efficient, the CIC ASIC shows some bandwidth limitations.

# 3.2.5 Library of test cases

The UVM simulation environment is a mosaic of several components and blocks connected to the DUV.
This reusable and complex organization allows stimulating the DUV via a test case chosen from a carefully designed library. A test case is a separate class registered in the UVM factory that configures specific stimuli in the UVM framework and applies them to the DUV and to the reference model. Outputs are compared at run-time and the final statistics are available via the scoreboard. Several test cases have been developed to verify different functionalities of the DUV and to identify the main limitations linked to a specific architecture. The objective of the test cases is to exercise as much logic as possible in order to reach the highest coverage. Nowadays, due to the high complexity of ASICs, verification is metric driven, meaning that only tests that actively increase the overall coverage are launched.

**Figure 3.4.** UVM Scoreboard summary for stub data collected at the output of each ASIC of the readout chain.

# 3.2.6 Code and functional coverage

Each test case from the library can be launched several times using a random seed for the simulation. The result of the simulation is written in a report at run-time together with the final scoreboard. Handling a large number of tests and many runs for each of them is complex and time consuming. Therefore, the vManager verification platform has been used to keep track of the tests. This platform has a graphical interface and is well integrated with UVM, especially with the UVM report macros. The vManager tool parses the reports of each test looking for these macros (such as uvm\_info, uvm\_error, uvm\_fatal, etc.). Therefore, a UVM simulation environment exploiting these report macros allows the vManager tool to easily classify a test as passed or failed. Moreover, for each passed test the coverage of the selected DUV is collected. With many tests, it is essential to merge their coverage to check that all the functionalities of the DUV have been exercised.
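The pass/fail classification and coverage merging described above can be illustrated with a small Python sketch. It is not a description of how vManager works internally: the log line layout (a severity keyword followed later by ` @ <time>`), the function names and the representation of coverage as a set of bin names are all assumptions made for illustration.

```python
import re

# Assumed layout of a UVM report line, e.g.
#   "UVM_ERROR tb.sv(42) @ 100: scoreboard [MISMATCH] ..."
# Requiring " @ " skips end-of-test summary lines such as "UVM_ERROR :    0".
SEVERITY_RE = re.compile(r"^(UVM_INFO|UVM_WARNING|UVM_ERROR|UVM_FATAL)\b.* @ ")


def classify_log(log_text):
    """Count report messages per severity and derive a pass/fail verdict."""
    counts = {"UVM_INFO": 0, "UVM_WARNING": 0, "UVM_ERROR": 0, "UVM_FATAL": 0}
    for line in log_text.splitlines():
        match = SEVERITY_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    # Hypothetical rule: a run passes when it reports no error and no fatal.
    passed = counts["UVM_ERROR"] == 0 and counts["UVM_FATAL"] == 0
    return passed, counts


def merge_coverage(runs):
    """Union of the covered bins, taken over the passed runs only."""
    merged = set()
    for passed, covered_bins in runs:
        if passed:
            merged |= set(covered_bins)
    return merged
```

Coverage from failed runs is ignored in `merge_coverage`, mirroring the statement that coverage is collected for each passed test.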
# 3.3 CIC ASIC studies for two different $p_T$-modules

The developed UVM simulation environment can be configured at compilation time to test two different readout chains: half of a PS module readout chain, consisting of 8 SSAs, 8 MPAs and 1 CIC ASIC, or half of a 2S module readout chain, consisting of 8 CBCs and 1 CIC ASIC. Simulating half of a module is sufficient because the module is completely symmetric, so the results also hold for the other half. The objective is to find the optimum CIC ASIC architecture with respect to bandwidth, power consumption, SEU robustness and compatibility with both systems. The CIC architecture has been developed in the context of the full readout chains to prove efficient communication with the other ASICs. Figure 3.5 shows the DUV used to develop the PS module readout chain. In this case the RTL models of the SSA and MPA have been used, which allowed more complete studies of the chain.

**Figure 3.5.** DUV for PS module.

On the other hand, Figure 3.6 shows the DUV used to develop the CIC architecture for the 2S module. In this case the CBC RTL code was not available, so an emulator has been developed in TLM, together with a customized driver that passes data via the interface to the input of the CIC.

**Figure 3.6.** DUV for 2S module.

The CIC ASIC is highly configurable: its output bandwidth is programmable according to the position of the $p_T$-module within the CMS Outer Tracker. For instance, the output data lines can operate at a frequency of 320 MHz or 640 MHz per line. In addition, for stub data only, the CIC can send out packets on 5 or 6 lines, and can either transmit the information related to the particle curvature in the CMS magnetic field or instead use the available bandwidth to provide a larger number of stub coordinates. This approach has been chosen because it allows optimizing the power consumption at the CMS Outer Tracker level, reducing the overall power budget and trimming locally, per module, the required bandwidth.
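As a rough illustration of the configuration space just described, the following Python sketch enumerates the raw stub output bandwidth for each combination of line count and line frequency. It assumes that each line carries one bit per clock period (so 320 MHz corresponds to 320 Mb/s), ignores packet headers and the separate L1 data line, and uses hypothetical function and variable names.

```python
from itertools import product


def stub_output_bandwidth_mbps(n_lines, line_rate_mbps):
    """Raw stub bandwidth, assuming one bit per line per clock period."""
    return n_lines * line_rate_mbps


# The four stub output configurations mentioned in the text.
STUB_CONFIGS = {
    (n_lines, rate): stub_output_bandwidth_mbps(n_lines, rate)
    for n_lines, rate in product((5, 6), (320, 640))
}

for (n_lines, rate), bandwidth in sorted(STUB_CONFIGS.items()):
    print(f"{n_lines} lines at {rate} MHz: {bandwidth} Mb/s")
```

Under these assumptions the raw stub bandwidth spans from 1600 Mb/s (5 lines at 320 MHz) to 3840 Mb/s (6 lines at 640 MHz).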
The PS-module, being closer to the collision point, is exposed to higher radiation doses. Moreover, it implements a pixel layer; consequently, it requires a larger output bandwidth and has stricter power consumption constraints. For these reasons, operation in the PS module represents the worst case scenario for the CIC ASIC.

# 3.3.1 Stub occupancy and efficiency studies for PS-module

The UVM simulation environment allows performing efficiency studies on different ASIC architectures or configurations. Figure 3.7 shows the stub transmission efficiency at different levels of the PS-module readout chain. The green area represents the expected stub occupancy in different locations in the CMS Outer Tracker, as explained in Section 2.7. The blue line represents the MPA efficiency in forming and transmitting stubs, which is above 98 % up to a stub occupancy of 8 stubs per BX per module. At the level of the CIC the number of stubs is much larger because it collects data from 8 MPA ASICs over 8 BXs. The orange and green lines represent the CIC efficiency in transmitting stubs when two different output frequencies are used, respectively 320 MHz and 640 MHz. At 320 MHz, the efficiency drop seen in simulation is around 40 % at the maximum expected stub occupancy. On the other hand, the test case performed by doubling the CIC output bus frequency to 640 MHz, and consequently doubling the bandwidth, shows that the readout chain works as expected with an efficiency close to 100 %. It follows that most of the PS-modules will use the 640 MHz output frequency, consuming slightly more power but without being limited in bandwidth. Therefore, depending on the position of the PS module in the Outer Tracker, the CIC will need to be configured for the best trade-off between power budget and data bandwidth [6].

**Figure 3.7.** PS module efficiency at the MPA output and CIC output for two different output frequencies [6].
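The mechanism behind the efficiency drop, a fixed number of stubs that fit in one output packet versus a fluctuating number of stubs produced, can be sketched with a toy Monte Carlo in Python. This is not the UVM environment and does not reproduce Figure 3.7: the Poisson model, the per-window capacities (20 and 40 stubs) and all names are hypothetical values chosen only to show the trend when the capacity is doubled.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed integer (Knuth's method, small means only)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def stub_efficiency(mean_stubs_per_bx, capacity_per_window,
                    n_windows=20_000, window_bx=8, seed=0):
    """Fraction of produced stubs that fit in the output packets.

    Each window of `window_bx` bunch crossings produces a Poisson number of
    stubs; at most `capacity_per_window` of them can be transmitted.
    """
    rng = random.Random(seed)
    produced = sent = 0
    for _ in range(n_windows):
        n_stubs = poisson(rng, mean_stubs_per_bx * window_bx)
        produced += n_stubs
        sent += min(n_stubs, capacity_per_window)
    return sent / produced if produced else 1.0


# Hypothetical capacities: doubling the output frequency doubles the capacity.
eff_low = stub_efficiency(4.0, capacity_per_window=20)
eff_high = stub_efficiency(4.0, capacity_per_window=40)
```

With the same random seed, `eff_high` is never below `eff_low`, which is the qualitative behaviour reported for the 640 MHz configuration.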
#### 3.3.2 L1 data studies for PS-module

The CMS Outer Tracker readout electronics, especially in the innermost PS modules closer to the collision point, will have to cope with 300 pileup events per BX at the HL-LHC. A first level (L1) trigger rate of 750 kHz and a latency of 12.6 µs are required for efficient full event selection [2]. Upon reception of a L1 trigger from the Trigger DAQ, on one PS-module 16 MPA ASICs combine and collect data from 16 SSA ASICs and provide the zero-suppressed full event data to two CIC ASICs. The input event rate is 32 Gbps, while the output bandwidth per module is 1.28 Gbps. The limited output bandwidth and the random nature of the trigger arrival times require temporary on-chip storage on the MPA and SSA ASICs. The block diagram in Figure 3.8 gives a high-level picture of the front-end readout chain. The main points taken into consideration during the implementation have been:

- buffer inefficiencies below $10^{-6}$ for a pileup of 300 and a trigger rate of 1 MHz;
- SEU robustness at chip level;
- SEU robustness and synchronization at multichip level.

**Figure 3.8.** CMS Outer Tracker half-module readout chain.

The UVM simulation environment allowed simulating the worst case scenario with a pileup of 300, a L1 trigger rate of 1 MHz and SEU injection at each BX. Figure 3.9 reports the CIC latency, measured in BXs, in the PS module for nominal conditions: an occupancy of 200 pileup events and a trigger rate of 750 kHz. Figure 3.10 and Figure 3.11 show the latency at the same nominal conditions due, respectively, to the MPA and to the full chain: 8 SSA ASICs, 8 MPA ASICs and 1 CIC.

#### 3.3.2.1 FIFO sizing methodology

Queueing theory, as described in [96], relates the buffer inefficiency to the input/output (I/O) rate ratio, also called service rate (R), for different buffer sizes (1, 2, 4, 8, 16 and 32 buffer stages), as shown in Figure 3.12.
When the I/O rate ratio is fixed by the system architecture, as in the case of the PS module and the 2S module, the buffer sizing depends only on the acceptable system inefficiency. This methodology has been employed for the sizing of each FIFO in both readout chains to obtain an overall inefficiency below $10^{-6}$.

**Figure 3.9.** CIC latency for L1 data packets measured in BXs at the nominal trigger rate frequency of 750 kHz.

**Figure 3.10.** MPA latency for L1 data packets measured in BXs at the nominal trigger rate frequency of 750 kHz.

**Figure 3.11.** Full PS-module chain latency for L1 data packets measured in BXs at the nominal trigger rate frequency of 750 kHz.

#### 3.3.2.2 MPA-SSA buffer sizes and data rate studies

The FIFO architecture for the MPA and SSA has been studied, analyzed and optimized during the CIC development. Figure 3.13 shows a block diagram of the FIFOs implemented on the SSA (in orange) and on the MPA (white) in order to keep losses below $10^{-6}$.

**Figure 3.12.** Buffer inefficiency with respect to I/O rate ratio.

**Figure 3.13.** FIFO-based architecture of the front-end ASICs.

Upon reception of a L1 trigger, at a rate of approximately 1 MHz, the SSA ASIC needs to store several events in a FIFO before sending to the corresponding MPA a fixed size data packet containing the full raw data event. For the SSA FIFO the input rate is the trigger rate, while the output rate is the frequency of the data packet transmission, which is given by construction by one serial data line operating at 320 MHz divided by the size of an event packet (192 bits). The service rate (R) is given by the following formula:

$$R = \frac{\text{Input Rate}}{\text{Output Rate}} = \frac{1 \text{ MHz} \times 192 \text{ bits}}{320 \text{ MHz} \times 1 \text{ line}} = 0.6$$

Using Figure 3.12, a FIFO length of 16 is required in order to have a buffer inefficiency below $10^{-6}$.
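The service rate calculation above, together with a toy estimate of the loss of a finite FIFO, can be sketched in Python. The function names, the model (Poisson trigger arrivals, fixed read-out time per packet) and the convention that the packet being read out still occupies a FIFO slot are assumptions made for illustration; they are not taken from [96] or from the ASIC implementation. The small event counts used here cannot resolve inefficiencies at the $10^{-6}$ level, so the sketch only shows the trend with the FIFO depth.

```python
import random
from collections import deque


def service_rate_ratio(trigger_rate_hz, packet_bits, line_rate_bps, n_lines=1):
    """I/O rate ratio R = input rate / output rate."""
    return (trigger_rate_hz * packet_bits) / (line_rate_bps * n_lines)


def simulate_fifo_loss(ratio, depth, n_events=200_000, seed=1):
    """Fraction of events dropped by a FIFO with `depth` slots.

    Arrivals are Poisson distributed; every packet takes one time unit
    to be read out, so the arrival rate per time unit equals `ratio`.
    """
    rng = random.Random(seed)
    service_time = 1.0
    now = 0.0
    departures = deque()  # departure times of the packets still stored
    lost = 0
    for _ in range(n_events):
        now += rng.expovariate(ratio / service_time)
        # Free the slots of packets that have already been read out.
        while departures and departures[0] <= now:
            departures.popleft()
        if len(departures) >= depth:
            lost += 1  # FIFO full: the event is dropped
            continue
        start = departures[-1] if departures else now
        departures.append(start + service_time)
    return lost / n_events


# SSA numbers from the text: 1 MHz triggers, 192-bit packets, one 320 MHz line.
R_SSA = service_rate_ratio(1e6, 192, 320e6)  # 0.6
```

Calling `simulate_fifo_loss(R_SSA, depth)` for increasing depths shows the loss fraction falling quickly with the number of stages, which is the qualitative behaviour summarized in Figure 3.12.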
The same applies to the Raw Pixel FIFO in the MPA ASIC, but in this specific case the input rate depends on the cluster occupancy. Therefore, one must look at the cluster occupancy at pileup 300 to size the FIFO for the worst case conditions. The Monte Carlo simulation of minimum bias events at pileup 300 over the CMS Outer Tracker barrel geometry [97] allowed extrapolating the expected cluster occupancy for the innermost barrel layer, as shown in Figure 3.14.

**Figure 3.14.** Cluster occupancy at pileup 300 in the innermost barrel layer (cylinder external surface).

The highest cluster occupancy per module, where one module features two sensors on top of each other, is 110 clusters, but only one layer is read out by the MPAs. The maximum cluster occupancy is thus 55 clusters per 16 MPAs, i.e. about four clusters per MPA. Therefore, the input rate is given by the trigger rate, approximately 1 MHz, multiplied by the cluster occupancy per MPA. On the other hand, the output rate is given by the frequency of the Pixel Encoder, which encodes one cluster per cycle at 40 MHz. Thus, assuming a worst case local cluster occupancy of 16 clusters per MPA, the service rate (*R*) is given by the following formula:

$$R = \frac{\text{Input Rate}}{\text{Output Rate}} = \frac{1 \text{ MHz} \times 16 \text{ clusters}}{40 \text{ MHz} \times 1 \text{ cluster}} = 0.4$$

Using again the graph in Figure 3.12, a FIFO length of 8 is required in order to have a buffer inefficiency below $10^{-6}$. The same approach can be used for all the other FIFOs in the system. All the FIFOs are implemented as random access memories using library latches and are read using pointers, which saves power. Master Counters are used to count the number of received triggers.
These counters are triplicated to guarantee a robust synchronization between different ASICs, even in the presence of SEUs, and the Finite State Machines (FSMs), which are also triplicated, rely on them to match the correct data packet to be sent out.

#### 3.3.2.3 CIC buffer sizes and data rate studies

The same approach described in Section 3.3.2.2 has been used for the CIC ASIC. Figure 3.15 shows a block diagram of how the CIC should be designed.

**Figure 3.15.** FIFO-based architecture of the Data Concentrator ASIC.

Since the front-end ASICs might lose some packets, the CIC implements an additional FSM that performs additional checks on the received packets and also handles out-of-sync front-end ASICs. The CIC should implement a flag for each front-end ASIC in order to inform the back-end of any problem with the corresponding front-end.

After developing the analytical approach, the full readout chain has been simulated to take into consideration all the possible effects due to a complex and long chain of FIFOs. Simulating 25 ms with a pileup of 300 and a Poisson distributed trigger rate of 1 MHz, a buffer inefficiency below $5 \times 10^{-6}$ has been found. This value is slightly higher than the calculated one, due to the complexity of the system.

#### 3.3.3 SEU simulation

The UVM simulation environment presented in this chapter allows performing SEU simulations, but only on the final netlist and not on the RTL. The developed procedure consists of extracting all registers, latches, clock nets, reset nets and all critical paths from the implementation tool (Innovus in this case). Once the lists are available, they can be elaborated together with the final netlist. During simulation, it is then possible to deposit a '1' or a '0' on a path via the UVM simulation environment and observe the effect at chip level and at system level. Since the simulation tool is unaware of the standard cell placement, no more than one SEU/SET can be injected per BX.
Due to the high number of flip-flops and nets in the CIC ASIC, the simulation run time is extremely long. For these reasons, an additional intermediate step has been added to create four general subcategories, e.g. group A, group B and group C for triplicated registers/latches and group N for data registers. The UVM simulation framework can then handle the injection into multiple registers/latches belonging to the same group in the same BX. On the other hand, when injecting on the data path it is better to keep a low SEU rate in order to study the effects of SEUs on the ASIC efficiency. Finally, when injecting on reset nets or general nets in the chips it is better to inject at most one SEU per BX to check that there are no mistakes in the reset triplication or in the simplification of nets.

# 3.4 LpGBT communication for $p_T$-modules

As a last step, a black box model of the LpGBT with the related FPGA has been used to verify the communication protocols, emulate the delays on the hybrid, and verify the full $p_T$-module readout chains together with the dedicated FPGA. Figure 3.16 depicts the full testbench including the LpGBT in the DUV. The LpGBT is the ASIC that provides clock and fast commands to all the other ASICs in the modules. It also handles the data packets coming from the CIC, applies Forward Error Correction (FEC) techniques, such as FEC5 and FEC12, for data robustness and sends the packets out through an optical link. An FPGA in the back-end then decodes the received stream and reconstructs the data packet information. The verification of the correct data at the output of the LpGBT is performed with SystemVerilog assertions and UVM report macros, which assisted the debug steps.

# 3.5 UVM simulation environment features and performances

The developed UVM simulation environment allows performing clock-cycle accurate behavioral simulations with multiple ASICs. The framework is common to all tests and the user can change the stimuli applied to the ASICs by selecting different test cases.
In the target application, correct synchronization among all the ASICs is fundamental. The developed framework allows detecting synchronization issues and sending a synchronization request via a fast command signal to all the ASICs to recover synchronization.

At the CIC ASIC level, correct sampling of the input data is fundamental for data integrity. When the CIC receives a data stream, it has to sample it in the middle of the eye diagram to read the data correctly. This is solved in the CIC by using phase aligners surrounded by digital control circuitry. In order to align the incoming data with the internal CIC clock, an automatic procedure that requires specific patterns from the FE-chips has been set up and verified. Moreover, the CIC stub data path has to collect data over 8 BXs. Therefore, the CIC must identify exactly the first BX out of 8 BXs. An 8 BX data stream is called a word. A digital block, namely the word alignment controller, has been developed to perform an automatic word alignment that needs specific training patterns from the front-end ASICs. Both the phase alignment and the word alignment procedures have been verified via the UVM simulation environment. More details about the CIC architecture are given in Chapter 4.

**Figure 3.16.** UVM framework with LpGBT and FPGA added to the DUV.

# 4 Development of the Concentrator Integrated Circuit ASIC

This chapter describes the design and implementation of the Concentrator Integrated Circuit, namely the CIC ASIC, which is used for data compression in two different modules of the CMS Outer Tracker for the Phase-II upgrade for the High-Luminosity LHC: the Pixel-Strip (PS) module and the Strip-Strip (2S) module. The CIC ASIC reduces, at module level, the overall number of output links and the output bandwidth.
It collects data packets coming from eight upstream front-end chips (MPAs in the case of the PS module and CBCs in the case of the 2S module), formats the data into packets containing the stub information from eight BXs and the raw data selected through the first level (L1) trigger, and transmits them to the LpGBT. The CIC inputs are six differential bitlines for each of the front-end ASICs (five lines for stub data and one line for L1 data) at 320 MHz. The outputs are seven differential lines (six for stub data and one for L1 data) at 320 MHz, or at 640 MHz for the PS modules in the innermost layers. All the bitlines between the front-end ASICs and the CIC, and between the CIC and the LpGBT, are differential (SLVS). Due to two different power distribution networks, the CIC digital core is powered at 1 V in the PS modules and at 1.2 V in the 2S modules, while the custom SLVS drivers and receivers [98], [99] are always powered at 1.2 V.

Two different data streams are generated by the CIC for the back-end: the stub data stream and the L1 data stream. The CIC applies a data reduction factor of 10 by aggregating data over time and space, and can select and forward to the back-end a maximum of 40 stubs among the 192 potential input stubs. Upon reception of a L1 trigger, the CIC aggregates the raw data coming from 8 front-end chips and creates a new packet containing all the front-end hit clusters. The frame size is flexible and depends on the number of hits in the module. Trigger rates of 1 MHz have been used to size the FIFOs at system level.

To limit the power consumption, different architectures have been evaluated and compared in order to meet the tight power budget of 250 mW for the CIC in the PS-module and 312.5 mW for the CIC in the 2S-module. A 65 nm technology has been selected as a compromise between power consumption and radiation tolerance.
The digital cell library has been characterized taking into consideration temperature and radiation effects in order to evaluate the critical timing corners. Radiation hardening design techniques have been employed to withstand an integrated ionizing dose of up to 200 Mrad. To mitigate the effect of radiation related Single Event Effects (SEE), the ASICs implement a Triple Modular Redundancy (TMR) technique. The control logic, including the clock and the configuration logic, has been fully triplicated, while the data path is left untriplicated due to the power budget constraints. For the design implementation, a hierarchical Digital-On-Top flip-chip methodology was employed. A first silicon prototype, not SEU resistant, incorporating all required functionalities for operation in PS modules and 2S modules was realized in 2018. A second and final version of the CIC ASIC, incorporating all required functionalities for operation in PS modules and 2S modules, was realized in 2019.

Publications related to this chapter are [7], [10] and [11].

# 4.1 Transmission of high-$p_T$ particles primitives

The CMS Outer Tracker readout electronics at the HL-LHC will have to cope with 300 pileup events per BX for the innermost modules, close to the collision point. For the first time, following the introduction of the "stub" concept, data are sent out synchronously by the front-end chips at every BX and are used for the L1 trigger decision. Due to the high amount of data produced at module level, the CIC ASIC has to perform averaging over space and time to fulfil the bandwidth requirements, as follows:

- on one PS module, 16 Macro Pixel ASICs (MPA) [3] collect strip centroid information from 16 Short Strip ASICs (SSA) [4] and provide stub packets every 2 BXs to two concentrator ASICs (2 CICs [7]).
- on one 2S module, 16 CBC ASICs (CBC) [100] provide stub packets every BX to two concentrator ASICs (2 CICs [7]).
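As an illustration of the TMR principle mentioned in the introduction to this chapter, the following Python sketch models a bitwise majority voter over three copies of a register and a single bit flip in one copy. It is a behavioural toy with hypothetical names and register contents, not the voter cell or the triplication scheme used in the CIC.

```python
def majority_vote(copy_a, copy_b, copy_c):
    """Bitwise 2-out-of-3 majority of three register copies."""
    return (copy_a & copy_b) | (copy_a & copy_c) | (copy_b & copy_c)


def flip_bit(value, bit):
    """Emulate a single event upset by inverting one bit of a register."""
    return value ^ (1 << bit)


golden = 0b1011_0010                        # hypothetical register content
upset_copy = flip_bit(golden, 3)            # SEU in one of the three copies
voted = majority_vote(golden, upset_copy, golden)
```

A single upset in any one copy is masked by the voter, whereas upsets hitting the same bit in two copies would not be, which is why the three copies have to be protected against simultaneous corruption.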
# 4.2 CIC general architecture

Figure 4.1 shows the full block diagram of the CIC ASIC. It is mainly a digital ASIC, where the only analog blocks are twelve phase aligners used to properly sample the 48 channels of the data input lines. The phase alignment working principle is described in Section 4.3. The CIC ASIC has two main independent data paths: one for stub data, described in Section 4.1, and another for L1 data, described in Section 4.4. Other input signals, such as the clock and the fast command, also operate at high frequency; however, they do not require phase alignment because a fixed phase difference is ensured at system level. The reset is asynchronous and the slow control signals operate at a maximum frequency of 1 MHz. More details about the control commands are presented in Section 4.5.

**Figure 4.1.** Block diagram of the entire CIC ASIC.

# 4.3 Input data phase alignment

The CIC ASIC receives data packets from 8 front-end chips via a total of 48 bitlines operating at 320 MHz. The incoming data have an a priori unknown phase that depends on the PS module and 2S module construction and on PVT variations. A full custom analog phase aligner has been chosen from the available blocks to resample the data with the CIC internal clock in the middle of the eye diagram and make the data ready for the processing logic. Figure 4.2 shows a block diagram of the phase aligner.

**Figure 4.2.** Block diagram of the analog phase aligner.

The main part of the phase aligner is the Master Delay-Locked Loop (DLL), composed of a delay line, a phase detector, a charge pump and a low pass filter. The delay line, made of 16 delay cells, receives a 160 MHz clock at its input and creates a delayed clock signal. The phase detector outputs a signal proportional to the phase difference between the input clock and the delayed feedback clock at the same frequency. The charge pump converts the phase difference information into a voltage. The low pass filter removes the high frequency components.
The Master DLL is locked when, through the feedback loop, the phase error reaches zero and the control voltage becomes stable. In these conditions each delay cell introduces a delay of 390 ps. The locked/unlocked status of the Master DLL is stored in a register readable via I<sup>2</sup>C. Four data delay lines, controlled by the same control voltage as the Master DLL, receive the data with unknown phase at their inputs and produce several delayed copies of the data at their outputs. These lines are then multiplexed in the digital domain, and the digital logic selects the optimum delayed data line to be sampled. The optimum phase can be found for each channel independently. Once the module is built and the PVT conditions are stable, the optimum selected phase will not change. However, due to the high number of input data lines (48 per CIC) and the high number of CICs in the CMS Outer Tracker (30000), it is necessary to develop an automatic procedure to find the optimum phase for each channel. Therefore, the CIC ASIC implements an automatic phase alignment that relies on data packets with a high switching rate coming from the front-end ASICs in order to lock the Master DLL and then select the optimum phase for each input channel, one by one and independently. This function has been implemented at digital level and provides resampled input data ready to be used by the CIC core. I<sup>2</sup>C read-only registers allow monitoring the lock of the Master DLL and of each channel. To guarantee full access to the CIC ASIC functionalities, a backup solution consisting of a fixed phase mode is also implemented, permitting the user to set the preferred phase via I<sup>2</sup>C registers for each channel independently. In the target application, each CIC ASIC will first use the automatic procedure to find the optimum phase alignment and will then be set to fixed phase mode in order to save power. All FSMs used during the automatic procedure are clock gated to save power.
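Two aspects of this section can be illustrated with a short Python sketch: the delay of one cell of the locked Master DLL, and one possible way of choosing a sampling phase. The delay formula assumes that the 16 cells span exactly one period of the 160 MHz clock when the loop is locked. The phase selection rule (take the centre of the longest run of taps that sample the data without errors) is a hypothetical policy introduced here for illustration; the text does not specify the algorithm used by the CIC digital logic, and wrap-around between the last and the first tap is not handled.

```python
def delay_cell_ps(clock_hz=160e6, n_cells=16):
    """Delay of one cell when the line spans exactly one clock period."""
    return 1e12 / (clock_hz * n_cells)


def pick_phase(tap_ok):
    """Index at the centre of the longest run of good taps, or None.

    `tap_ok[i]` is True when tap i samples the data without errors.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel that closes the last run.
    for index, ok in enumerate(list(tap_ok) + [False]):
        if ok and run_start is None:
            run_start = index
        elif not ok and run_start is not None:
            run_len = index - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


cell_delay = delay_cell_ps()  # about 390.6 ps, consistent with the 390 ps quoted
```

For example, with the hypothetical tap quality list `[False, True, True, True, False, True]`, `pick_phase` returns tap 2, the middle of the widest error-free window.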
#### 4.3.1 Input stub data format from MPA or CBC

The MPA and CBC ASICs have different requirements and, therefore, slightly different output formats. Figure 4.3 shows the stub data format for the CBC ASICs. In this case the stub information is accompanied by a synchronization bit at '1' and three bits (error, OR254 and overflow) for the CBC status flags. In one BX the CBC can transmit up to three stubs, each composed of its coordinate in eight bits and its bend information encoded in four bits.

| line 1 | line 2 | line 3 | line 4 | line 5 |
|----------|----------|----------|----------|----------|
| $C_1[0]$ | $C_2[0]$ | $C_3[0]$ | $B_1[0]$ | $B_3[0]$ |
| $C_1[1]$ | $C_2[1]$ | $C_3[1]$ | $B_1[1]$ | $B_3[1]$ |
| $C_1[2]$ | $C_2[2]$ | $C_3[2]$ | $B_1[2]$ | $B_3[2]$ |
| $C_1[3]$ | $C_2[3]$ | $C_3[3]$ | $B_1[3]$ | $B_3[3]$ |
| $C_1[4]$ | $C_2[4]$ | $C_3[4]$ | $B_2[0]$ | Overflow |
| $C_1[5]$ | $C_2[5]$ | $C_3[5]$ | $B_2[1]$ | OR254 |
| $C_1[6]$ | $C_2[6]$ | $C_3[6]$ | $B_2[2]$ | Error |
| $C_1[7]$ | $C_2[7]$ | $C_3[7]$ | $B_2[3]$ | Sync1 |

The eight rows correspond to the eight bits sent on each line within one BX. Header: **Sync1** is the synchronization bit at '1'. Stub: $C_x$ is the coordinate of stub $x$ on 8 bits and $B_x$ is the bend of stub $x$ on 4 bits.

**Figure 4.3.** CBC stub format.

On the other hand, Figure 4.4 shows the stub data format for the MPA ASICs. The MPA, as described in Section 2.7, has a higher stub rate than the CBC and therefore requires a stub packet that is 2 BXs long, to cope with the fact that up to 5 stubs can be present in one BX and none in the following BX. Moreover, the MPA stub packet has an additional coordinate along the Z axis.
Since the stub packet is 2 BXs long, the header has two synchronization bits, one at '1' for BX 1 and the other at '0' for BX 2, and three bits to express how many of the stubs in the 2 BXs packet refer to the first physical BX. In two BXs the MPA can transmit up to five stubs, each composed of a coordinate on eight bits, the bend information encoded on 3 bits and the Z information on 4 bits. In both cases, the CIC has to create a unique output packet containing stubs from eight BXs coming from eight MPAs or eight CBCs.

| BX | line 1 | line 2 | line 3 | line 4 | line 5 |
|------|----------|----------|----------|----------|----------|
| BX 2 | $B_5[0]$ | $Z_5[3]$ | $Z_5[2]$ | $Z_5[1]$ | $Z_5[0]$ |
| BX 2 | $C_5[2]$ | $C_5[1]$ | $C_5[0]$ | $B_5[2]$ | $B_5[1]$ |
| BX 2 | $C_5[7]$ | $C_5[6]$ | $C_5[5]$ | $C_5[4]$ | $C_5[3]$ |
| BX 2 | $B_4[0]$ | $Z_4[3]$ | $Z_4[2]$ | $Z_4[1]$ | $Z_4[0]$ |
| BX 2 | $C_4[2]$ | $C_4[1]$ | $C_4[0]$ | $B_4[2]$ | $B_4[1]$ |
| BX 2 | $C_4[7]$ | $C_4[6]$ | $C_4[5]$ | $C_4[4]$ | $C_4[3]$ |
| BX 2 | $B_3[0]$ | $Z_3[3]$ | $Z_3[2]$ | $Z_3[1]$ | $Z_3[0]$ |
| BX 2 | Sync0 | $C_3[1]$ | $C_3[0]$ | $B_3[2]$ | $B_3[1]$ |
| BX 1 | $C_3[6]$ | $C_3[5]$ | $C_3[4]$ | $C_3[3]$ | $C_3[2]$ |
| BX 1 | $Z_2[3]$ | $Z_2[2]$ | $Z_2[1]$ | $Z_2[0]$ | $C_3[7]$ |
| BX 1 | $C_2[1]$ | $C_2[0]$ | $B_2[2]$ | $B_2[1]$ | $B_2[0]$ |
| BX 1 | $C_2[6]$ | $C_2[5]$ | $C_2[4]$ | $C_2[3]$ | $C_2[2]$ |
| BX 1 | $Z_1[3]$ | $Z_1[2]$ | $Z_1[1]$ | $Z_1[0]$ | $C_2[7]$ |
| BX 1 | $C_1[1]$ | $C_1[0]$ | $B_1[2]$ | $B_1[1]$ | $B_1[0]$ |
| BX 1 | $C_1[6]$ | $C_1[5]$ | $C_1[4]$ | $C_1[3]$ | $C_1[2]$ |
| BX 1 | Sync1 | $N[2]$ | $N[1]$ | $N[0]$ | $C_1[7]$ |

Header: **Sync1**: synchronization bit at '1'; **Sync0**: synchronization bit at '0'; $N$: number of stubs in the first physical BX. Stub: $C_x$: coordinate of stub $x$ on 8 bits; $B_x$: bend of stub $x$ on 3 bits; $Z_x$: Z-info of stub $x$ on 4 bits.

Figure 4.4. MPA stub format.

#### 4.3.2 Deserialization and word alignment

The stub data processing logic, in the CIC core, receives 40 lines from eight different front-end readout chips, MPAs or CBCs. Once each input signal is sampled with the correct phase, as discussed in Section 4.3, the deserialization is performed separately for each group of five lines coming from a different front-end. It is fundamental to rebuild the five-bit word coming from each front-end ASIC according to the corresponding format described in Section 4.3.1. Figure 4.5 shows the full stub block diagram, from the deserialization of the input front-end channels, through the word alignment steps, to the stub selection logic and the stub output formatter. A state machine selects the correct setting of the word alignment cells for each line, so that the word is rebuilt at the level of the CIC core before any data processing. The essential requirement for an effective word alignment is that the front-end ASICs can send an easily recognizable and known pattern of eight bits. The CIC ASIC word alignment state machine compares the input word with an expected pattern stored in an internal register, which is configurable for flexibility.

Figure 4.5. MPA stub block diagram.

The range of the word alignment is three bits at $320\,\mathrm{MHz}$. It can completely compensate for small differences between one line and another, while the delay contribution coming from the hybrid itself is negligible. Once the word alignment has been performed, an I<sup>2</sup>C register can be read to confirm that the procedure was successful. At this point the CIC ASIC is ready to process input data packets.

#### 4.3.3 Bitonic Stub sorting algorithm

The CIC ASIC collects all the stubs coming from eight front-end ASICs and processes them according to their importance with a period of 8 BXs.
As discussed in Section 2.7, high-$p_T$ particles create stubs with low bending. Therefore, the CIC ASIC implements a sorting algorithm able to sort all the stubs received from eight front-end ASICs in eight BXs according to their bending value. The chosen architecture is based on the bitonic merge sorting algorithm [101]. Such a sorting network is composed of comparators and switches, and Figure 4.6 shows the basic cell performing the sorting. In general, when the sorting network is used on a given array of items, the time required to sort all the items is fixed and proportional to $\log_2 n$, where $n$ is the number of items to be sorted. However, as in our case, when the items are not all available at the same time, the time required is proportional to $n \log_2 n$. Thanks to this algorithm it is not necessary to wait until all the stubs from 8 BXs are received before starting to sort them: the sorting can be performed while they are being received. The CIC ASIC can sort a maximum of 192 stubs (three stubs per front-end ASIC per BX) and select the best 40 stubs to send out. As an example, Figure 4.7 shows that, using the presented approach, three steps are needed to sort eight items.

#### 4.3.4 Stub packet formatter

The CIC ASIC creates a synchronous stub packet, eight BXs long, containing stubs from eight front-end chips coming from eight BXs. In general, the CIC formats the stub data with a header and a payload. The header contains a synchronization bit, status bits, the BX counter that is essential to build the full picture at CMS experiment level, and the number of stubs present in the corresponding payload. The payload is composed of stubs tagged with the BX (out of eight) they refer to and with the number of the front-end chip that created them. The CIC output bandwidth and formatting are highly configurable, to adjust efficiency and power consumption according to the module location in the CMS Outer Tracker.
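Returning briefly to the sorting network of Section 4.3.3, the basic cell and the way the cells are combined can be sketched in Python. This is a software model of the textbook bitonic sorting network for a power-of-two number of items, not the CIC implementation; the function names and the bend values used as input are illustrative. The sketch also counts merge phases and comparator stages: for eight items it goes through three merge phases, made of six comparator stages in total. Reading the three steps quoted above for eight items as merge phases is an interpretation, not something the text states.

```python
def compare_exchange(a, b, ascending=True):
    """Basic sorting cell: one comparator steering two multiplexers."""
    swap = (a > b) if ascending else (a < b)
    return (b, a) if swap else (a, b)


def bitonic_sort(items):
    """Sort a power-of-two number of items with a bitonic network.

    Returns (sorted_items, merge_phases, comparator_stages).
    """
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("number of items must be a power of two")
    data = list(items)
    phases = stages = 0
    k = 2
    while k <= n:              # one merge phase per run length k
        phases += 1
        j = k // 2
        while j >= 1:          # one comparator stage per distance j
            stages += 1
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Direction alternates between blocks of size k.
                    ascending = (i & k) == 0
                    data[i], data[partner] = compare_exchange(
                        data[i], data[partner], ascending
                    )
            j //= 2
        k *= 2
    return data, phases, stages


# Eight hypothetical bend values; the lowest bend (highest pT) comes first.
bends = [3, 7, 4, 0, 6, 2, 1, 5]
ordered, phases, stages = bitonic_sort(bends)
print(ordered, phases, stages)
```

All comparisons within one stage are independent of each other, which is what makes the network attractive for hardware: each stage maps to one layer of cells like the one in Figure 4.6.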
**Figure 4.6.** Basic sorting cell composed of a comparator and 2 multiplexers.

**Figure 4.7.** Bitonic sorting example.

Two output formats, one for the PS-module and the other for the 2S-module, have been developed. Figure 4.8 shows the CIC output format in the case of the 2S-module. Since for 2S-modules the expected stub occupancy is not so high, the CIC ASIC can save some power by being configured to send out data at 320 MHz and only on 5 lines.

| line 1 | line 2 | line 3 | line 4 | line 5 |
|--------------|--------------|-------------|-------------|--------------|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| $B_{19}[3]$ | $B_{19}[2]$ | $B_{19}[1]$ | $B_{19}[0]$ | 0 |
| $C_{19}[4]$ | $C_{19}[3]$ | $C_{19}[2]$ | $C_{19}[1]$ | $C_{19}[0]$ |
| $FE_{19}[1]$ | $FE_{19}[0]$ | $C_{19}[7]$ | $C_{19}[6]$ | $C_{19}[5]$ |
| … | $O_{19}[2]$ | $O_{19}[1]$ | $O_{19}[0]$ | $FE_{19}[2]$ |
| … | … | … | … | … |
| $B_1[0]$ | … | … | … | … |
| $C_1[1]$ | $C_1[0]$ | $B_1[3]$ | $B_1[2]$ | $B_1[1]$ |
| $C_1[6]$ | $C_1[5]$ | $C_1[4]$ | $C_1[3]$ | $C_1[2]$ |
| $O_1[0]$ | $FE_1[2]$ | $FE_1[1]$ | $FE_1[0]$ | $C_1[7]$ |
| $N[2]$ | $N[1]$ | $N[0]$ | $O_1[2]$ | $O_1[1]$ |
| BXID[1] | BXID[0] | $N[5]$ | $N[4]$ | $N[3]$ |
| BXID[6] | BXID[5] | BXID[4] | BXID[3] | BXID[2] |
| BXID[11] | BXID[10] | BXID[9] | BXID[8] | BXID[7] |
| Status[4] | Status[3] | Status[2] | Status[1] | Status[0] |
| Sync | Status[8] | Status[7] | Status[6] | Status[5] |

Header: **Sync**: synchronization bit at '0'; **Status**: CIC status and FE status; **BXID**: BX ID for CMS experiment; $N$: number of stubs in the 8 BXs packet. Stub: $O_x$: BX offset of stub $x$ on 3 bits; $FE_x$: FE ID of stub $x$ on 3 bits; $C_x$: coordinate of stub $x$ on 8 bits; $B_x$: bend of stub $x$ on 4 bits. Rows run from BX 8 (top) to BX 1 (bottom).

Figure 4.8. CIC stub format for 2S-module.

For high occupancy areas of the CMS Outer Tracker it is possible to increase the output frequency from $320\,\mathrm{MHz}$ to $640\,\mathrm{MHz}$ and/or to disable the bend information in output, permitting a higher number of stubs to be transmitted. Figure 4.9 shows the CIC output format in the case of the PS-module, where using the 640 MHz output frequency and all 6 lines is fundamental for the innermost regions of the CMS Outer Tracker.

| line 1 | line 2 | line 3 | line 4 | line 5 | line 6 |
|-------------|-------------|-------------|--------------|--------------|--------------|
| $Z_{35}[2]$ | $Z_{35}[1]$ | $Z_{35}[0]$ | 0 | 0 | 0 |
| $C_{35}[1]$ | $C_{35}[0]$ | $B_{35}[2]$ | $B_{35}[1]$ | $B_{35}[0]$ | $Z_{35}[3]$ |
| $C_{35}[7]$ | $C_{35}[6]$ | $C_{35}[5]$ | $C_{35}[4]$ | $C_{35}[3]$ | $C_{35}[2]$ |
| $O_{35}[2]$ | $O_{35}[1]$ | $O_{35}[0]$ | $FE_{35}[2]$ | $FE_{35}[1]$ | $FE_{35}[0]$ |
| … | … | … | … | … | … |
| $Z_2[3]$ | $Z_2[2]$ | $Z_2[1]$ | $Z_2[0]$ | … | … |
| $C_2[2]$ | $C_2[1]$ | $C_2[0]$ | $B_2[2]$ | $B_2[1]$ | $B_2[0]$ |
| $FE_2[0]$ | $C_2[7]$ | $C_2[6]$ | $C_2[5]$ | $C_2[4]$ | $C_2[3]$ |
| $Z_1[0]$ | $O_2[2]$ | $O_2[1]$ | $O_2[0]$ | $FE_2[2]$ | $FE_2[1]$ |
| $B_1[2]$ | $B_1[1]$ | $B_1[0]$ | $Z_1[3]$ | $Z_1[2]$ | $Z_1[1]$ |
| $C_1[5]$ | $C_1[4]$ | $C_1[3]$ | $C_1[2]$ | $C_1[1]$ | $C_1[0]$ |
| $O_1[0]$ | $FE_1[2]$ | $FE_1[1]$ | $FE_1[0]$ | $C_1[7]$ | $C_1[6]$ |
| $N[3]$ | $N[2]$ | $N[1]$ | $N[0]$ | $O_1[2]$ | $O_1[1]$ |
| BXID[3] | BXID[2] | BXID[1] | BXID[0] | $N[5]$ | $N[4]$ |
| BXID[9] | BXID[8] | BXID[7] | BXID[6] | BXID[5] | BXID[4] |
| Status[3] | Status[2] | Status[1] | Status[0] | BXID[11] | BXID[10] |
| Sync | Status[8] | Status[7] | Status[6] | Status[5] | Status[4] |

Header: **Sync**: synchronization bit at '1'; **Status**: CIC status and FE status; **BXID**: BX ID for CMS experiment; $N$: number of stubs in the 8 BXs packet. Stub: $O_x$: BX offset of stub $x$ on 3 bits; $FE_x$: FE ID of stub $x$ on 3 bits; $C_x$: coordinate of stub $x$ on 8 bits; $B_x$: bend of stub $x$ on 3 bits; $Z_x$: Z-info of stub $x$ on 4 bits. Rows run from BX 8 (top) to BX 1 (bottom).

**Figure 4.9.** CIC stub format for PS-module.

Figure 4.10 shows the maximum number of stubs the CIC ASIC can send out for all the possible configurations of the CIC stub data output: type of front-end, output frequency, transmission or not of the bend information, and transmission on five or six bitlines.

| Front-end ASIC | Output frequency (MHz) | Number of output bitlines | Maximum number of stubs with bend | Maximum number of stubs without bend |
|----------------|------------------------|---------------------------|-----------------------------------|--------------------------------------|
| CBC | 320 | 5 | 16 | 20 |
| CBC | 320 | 6 | 19 | 25 |
| MPA | 320 | 5 | 13 | 16 |
| MPA | 320 | 6 | 16 | 19 |
| MPA | 640 | 5 | 29 | 34 |
| MPA | 640 | 6 | 35 | 40 |

Figure 4.10. CIC output configuration modes.
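The entries of Figure 4.10 can be cross-checked with a short calculation based on the packet formats of Figures 4.8 and 4.9. The Python sketch below assumes a header of 28 bits (1 Sync, 9 Status, 12 BXID and 6 bits for $N$, as read from the figures), a packet of 8 BXs at 40 MHz, and stub fields of 3 bits (BX offset), 3 bits (FE ID), 8 bits (coordinate), 4 or 3 bits of bend for CBC or MPA, and 4 bits of Z for the MPA. The cap at 40 stubs is taken from the statement in Section 4.3.3 that the best 40 stubs are selected; applying it here is an interpretation. The function name and parameters are illustrative.

```python
HEADER_BITS = 1 + 9 + 12 + 6   # Sync + Status + BXID + N
PACKET_BX = 8                  # one output packet spans eight BXs
MAX_SELECTED_STUBS = 40        # size of the sorted selection (Section 4.3.3)


def max_stubs(front_end, freq_mhz, lines, with_bend):
    """Maximum number of stubs fitting in one CIC output packet."""
    bits_per_line = PACKET_BX * freq_mhz // 40   # bits per line per packet
    payload_bits = bits_per_line * lines - HEADER_BITS

    stub_bits = 3 + 3 + 8                        # BX offset, FE ID, coordinate
    if front_end == "MPA":
        stub_bits += 4                           # Z information
    if with_bend:
        stub_bits += 4 if front_end == "CBC" else 3

    return min(payload_bits // stub_bits, MAX_SELECTED_STUBS)


# Reproduce the configurations of Figure 4.10.
for fe, freq in (("CBC", 320), ("MPA", 320), ("MPA", 640)):
    for lines in (5, 6):
        print(fe, freq, lines,
              max_stubs(fe, freq, lines, True),
              max_stubs(fe, freq, lines, False))
```

With these assumptions only the last configuration (MPA, 640 MHz, 6 lines, no bend) is limited by the 40-stub selection rather than by the output bandwidth: the payload alone would have room for 41 stubs.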
The stub data path latency was an additional parameter to take into account during the architecture development. The data should be quickly processed by the readout chain and sent to the DAQ back-end in order to make the L1 trigger request. In the final architecture the stub data path latency is the same for both front-end ASICs and just below $400\,\mathrm{ns}$.

# 4.4 Transmission of triggered raw data

For the future CMS Outer Tracker at the HL-LHC, an L1 trigger rate of $750\,\text{kHz}$ and a latency of $12.6\,\mu\text{s}$ are required for efficient full event selection [2]. The system needs to work with inefficiencies below $10^{-6}$ at high pileup, up to 300, and at an L1 trigger rate of 1 MHz, which leaves some margin with respect to the nominal value of $750\,\text{kHz}$, as described in the studies in Section 3.3.2. Upon the reception of an L1 trigger the CIC ASIC receives the full sensor raw image from the front-end ASICs:

- on one PS-module, 16 Macro Pixel ASICs (MPA) [3] combine and collect data from 16 Short Strip ASICs (SSA) [4] and provide zero-suppressed full event data to two concentrator ASICs (2 CICs [7]);
- on one 2S-module, 16 CBC ASICs (CBC) [100] provide unsparsified full event data to two concentrator ASICs (2 CICs [7]).

A single CIC ASIC collects raw data packets, via one line per front-end operating at 320 MHz, from a total of eight front-end ASICs. Thanks to the single line transmission, no word alignment is needed for the L1 raw data path. The objective of the CIC is to provide the DAQ system with a tagged and zero-suppressed raw data packet, with BX counter and Event counter, easily recognizable via a unique header.

#### 4.4.1 Input raw data format from CBC or MPA

The MPA ASICs and CBC ASICs have different output formats. In both cases only one bitline at 320 MHz is used to transmit the raw data packet. Figure 4.11 shows the raw data format for a CBC ASIC, consisting of a header and a payload.
The header is composed of only two bits at '1' as start sequence, error flags and pipeline address for CBC debug purposes, and the L1 ID to tag the event. The payload represents the full sensor image without any compression technique. Thus, a strip hit is represented by one bit at '1' in the payload data.

| Section | Field | Width | Value |
|---------|------------------|----------|-------|
| Header | Start sequence | 2 bits | 11 |
| Header | Error flags | 2 bits | |
| Header | Pipeline address | 9 bits | |
| Header | L1ID | 9 bits | |
| Payload | Channel data | 254 bits | |
| Payload | Padding bits | 28 bits | all at '0' |

Figure 4.11. CBC raw format.

Figure 4.12 shows the raw data format for MPA ASICs. The raw data packet from the MPA arrives at the CIC input already compressed via zero-suppression and consists of a header field plus a payload field. The header is protected against SEUs at MPA level and consists of a unique start sequence, flag bits and the number of strip and pixel clusters present in the raw data packet itself. On the other hand, the payload field is not protected against SEUs, because the power budget constraints do not allow all the logic to be fully triplicated, and contains the coordinates and information of all the strip and pixel clusters.
| Section | Field | Width | Value |
|---------|-------|-------|-------|
| Header | Start sequence | 20 bits | 19 bits at '1' followed by '0' |
| Header | Error flags | 2 bits | |
| Header | L1ID | 9 bits | |
| Header | Number of strip clusters | 5 bits | |
| Header | (fixed bit) | 1 bit | 0 |
| Header | Number of pixel clusters | 5 bits | |
| Header | (fixed bit) | 1 bit | 0 |
| Payload, list of strip clusters | Strip cluster coordinate | 7 bits | |
| Payload, list of strip clusters | Width | 3 bits | |
| Payload, list of strip clusters | MIP | 1 bit | |
| Payload, list of pixel clusters | Pixel cluster coordinate | 7 bits | |
| Payload, list of pixel clusters | Width | 3 bits | |
| Payload, list of pixel clusters | Z info | 4 bits | |

Figure 4.12. MPA raw format.

In both cases, the CIC has to create a unique zero-suppressed packet with all the clusters coming from 8 MPAs or 8 CBCs. The CIC L1 path processes the entire pixel or strip frame sent by the front-end ASICs. The readout frame is triggered with a nominal rate of 750 kHz and can contain the raw sensor image, as in the case of the CBC, or a packet of zero-suppressed clusters, as in the case of the MPA. Therefore, the CIC should be able to recognize a unique header and receive packets of variable size.

#### 4.4.2 CIC L1 input block

The first block, in the CIC core, receives a total of eight lines from eight different front-end chips, MPAs or CBCs. Once each bitline at 320 MHz carrying the L1 data packet information is sampled with the optimum phase, as described in Section 4.3, eight identical L1 input blocks deserialize the corresponding data packets at 320 MHz. The objective of this block is not just to deserialize the incoming packets but also to compress data. While MPA data only need deserialization at CIC level, CBC data require some manipulation.
In particular, zero-suppression, by encoding cluster position and width, has been chosen to save bandwidth and create shorter packets already at the CIC input stage, in the L1 FE Block, at 320 MHz. This approach allows performing the operation without waiting for the full packet. This should be the only part of the CIC operating at 320 MHz.

#### 4.4.3 FIFO controller

Once the input data are deserialized at 320 MHz and organized in clusters using zero-suppression, they should be transmitted out. However, due to the limited output bandwidth of 320 Mbps (via one bitline operating at 320 MHz) or 640 Mbps (via one bitline operating at 640 MHz), it is necessary to store the data temporarily in a FIFO at 40 MHz. Due to the power budget constraints it is not possible to make the whole data path robust to SEUs. Hence, the FIFO is split into two latch-based FIFOs: one triplicated for the header, which is relevant for correct synchronization at system level, and one not triplicated containing the cluster information. Figure 4.13 shows the block diagram of one pair of header-payload FIFOs controlled by a FIFO controller. It is fundamental to protect the header FIFO because the FIFO controller compares the L1 ID master counter, a triplicated counter of the number of L1 trigger signals, with the L1 ID stored in the header FIFO. Several conditions can occur at this level. In the target application SEUs are the first threat to the correct operation of the system. If the L1 ID in the header FIFO is smaller than the L1 ID master counter, an SEU has corrupted the data: the FIFO controller discards the current data and checks the L1 ID of the following packet. At the same time, the limited FIFO depth in combination with the Poissonian arrival of the L1 trigger requests leads to the possibility of a full FIFO condition and, inevitably, to the loss of event data.

Figure 4.13. Block diagram for FIFO controller.
Losing a fraction of events does not compromise the overall efficiency of the detector, as long as the lost event(s) do not compromise the synchronization of the Data Acquisition (DAQ) readout chain and the inefficiency stays below $10^{-6}$, as calculated in Section 3.3.2. To facilitate the exception handling of a full FIFO condition, a 7-bit counter tracks the number of events to be read out even if the data FIFOs are full. Missing events are transmitted to the readout chain as empty events composed of a header without data payload. If the L1 ID master counter also saturates, the CIC transmits a special header with a flag set to inform the DAQ of the overflow condition.

At CIC level, eight pairs of header-payload FIFOs have been implemented, one per front-end. The FIFO controller collects data only from the FIFOs of front-ends that are sending L1 data packets and are correctly synchronized with the CIC internal L1 ID master counter. In case of synchronization issues with a specific front-end, the FIFO controller raises a status flag for that front-end in the output L1 packet. The FIFO controller uses write and read pointers to access the FIFOs and implements a specific algorithm to be more robust against SEUs: it can discard packets with a wrong L1 ID, sending no clusters for the corresponding front-end and raising a flag. If the error was due to an SEU, the following packet will be correct, the flag will be lowered and synchronization is preserved. In case of a hard overflow, the synchronization would be lost and the flag would remain raised until the user sends a ReSync fast command or a reset. Moreover, the FIFO controller communicates directly with the L1 output formatter, receiving a pull request when the previous packet has been fully transmitted. The width of the data payload FIFO is 838 bits and its depth is limited to 16 entries to save silicon area and power.
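The L1 ID check performed by the FIFO controller can be illustrated with a small behavioural model in Python. This is a simplified sketch under stated assumptions, not the implemented algorithm: it handles a single front-end, ignores the wrap-around of the L1 ID counter and the full FIFO handling, and assumes that after discarding a stale packet the controller may use the following packet for the same event if its L1 ID matches. The function name `pop_event` and the data layout (one `deque` for headers, one for payloads) are hypothetical.

```python
from collections import deque


def pop_event(header_fifo, payload_fifo, master_l1id):
    """Fetch the clusters of one front-end for the event `master_l1id`.

    Returns (clusters, flag). `flag` is True when something went wrong for
    this front-end: a packet was discarded or no matching packet was found.
    """
    discarded = False
    # A stored L1 ID smaller than the master counter is treated as
    # corrupted (for example by an SEU): drop it and check the next packet.
    while header_fifo and header_fifo[0] < master_l1id:
        header_fifo.popleft()
        payload_fifo.popleft()
        discarded = True

    if header_fifo and header_fifo[0] == master_l1id:
        header_fifo.popleft()
        return payload_fifo.popleft(), discarded

    # No usable packet for this event: send no clusters and raise the flag.
    return [], True


# Normal case: the packet at the head of the FIFO matches the master counter.
headers, payloads = deque([5, 6]), deque([["c0"], ["c1", "c2"]])
print(pop_event(headers, payloads, 5))

# A stale packet (L1 ID 3) is discarded before the matching one is read.
headers, payloads = deque([3, 7]), deque([["bad"], ["good"]])
print(pop_event(headers, payloads, 7))
```

Keeping the L1 IDs in a separate, triplicated header FIFO is what makes such a comparison trustworthy, even though the payload itself is left unprotected.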
#### 4.4.4 Priority encoder for clusters sorting

Header FIFOs and payload FIFOs hold the raw data packet until the raw data packet formatter shown in Figure 4.14 is free to process a new packet. The raw data formatter block is implemented with a priority encoder that, operating at 40 MHz, reads up to eight clusters per cycle, tags them with the front-end chip number and saves them in a temporary register. At the following clock cycle the encoder outputs are passed to an output register, ready to be serialized at 320 MHz or 640 MHz.

Figure 4.14. Block diagram for the CIC raw data output formatter.

The proposed solution creates the output data packet in a limited number of clock cycles while keeping the power consumption low. The FIFO can contain up to 31 clusters per front-end ASIC. At each 40 MHz clock cycle, one cluster per front-end, represented by a blue square in Figure 4.14, passes through the priority encoder and is saved in the output register if the corresponding valid bit flag is raised, indicating that the packet actually contains a cluster. The implemented architecture handles very well clusters that are evenly distributed among the 8 front-ends. On the other hand, when one particular front-end has many more clusters than the others, this system is less efficient because it can read at most one cluster per front-end per 40 MHz clock cycle. Hence, the maximum number of clock cycles needed to read the clusters, tag them and form the output frame is 31 clock cycles at 40 MHz. The presented approach has been preferred to reading and tagging a single cluster at 320 MHz, which would prepare the output packet faster than necessary, given that the limitation of the chip is the output bandwidth. When no L1 data is being transmitted, the latency of the first output packet is dominated by the output formatter operations needed to form an output frame.
On the other hand, when data is already being transmitted, the main limitation is the output bandwidth, which is limited to 320 Mbps or 640 Mbps (1 line at 320 MHz or 640 MHz). Since the L1 path consumes 60 % of the total power, it has been decided to run it, where possible, at 40 MHz instead of 320 MHz. The new FIFO architecture presented in Section 4.4.3, together with the priority encoder and the output packet formatter that will be described in Section 4.4.5, consumes eight times less power than the previous architecture operating at 320 MHz. For the L1 front-end blocks, on the other hand, it was only possible to halve the power consumption, because some logic needs to remain at 320 MHz to perform CBC sparsification efficiently. Timing closure also benefits from the choice of using 40 MHz in most of the blocks.

#### 4.4.5 Output packet formatter

Figures 4.15 and 4.16 show the CIC output L1 data packet when working in the PS-module and in the 2S-module, respectively. Both packets have a start sequence made of 27 bits at '1'. It has been demonstrated that this sequence is unique in both cases. The rest of the header is very similar: bits to flag synchronization errors related to the CIC itself or to a specific front-end chip, the unique L1ID coming from the L1 ID master counter internal to the CIC, and finally the number of strip clusters and pixel clusters. The packet has a variable length depending on the number of clusters. For the PS-module, up to 127 strip clusters and 127 pixel clusters can be transmitted. However, at most 31 clusters can come from the same front-end chip. For the 2S-module the output limit is 127 strip clusters, with a maximum of 31 strip clusters per CBC.

| Section | Field | Width | Value |
|---------|-------|-------|-------|
| Header | Start sequence | 28 bits | 27 bits at '1' followed by '0' |
| Header | Error flags | 9 bits | MPA 8, MPA 7, …, MPA 1, CIC |
| Header | L1ID | 9 bits | |
| Header | Number of strip clusters | 7 bits | |
| Header | (fixed bit) | 1 bit | 0 |
| Header | Number of pixel clusters | 7 bits | |
| Payload, list of strip clusters | Chip ID | 3 bits | |
| Payload, list of strip clusters | Strip cluster coordinate | 7 bits | |
| Payload, list of strip clusters | Width | 3 bits | |
| Payload, list of strip clusters | MIP | 1 bit | |
| Payload, list of pixel clusters | Chip ID | 3 bits | |
| Payload, list of pixel clusters | Pixel cluster coordinate | 7 bits | |
| Payload, list of pixel clusters | Width | 3 bits | |
| Payload, list of pixel clusters | Z info | 4 bits | |

Figure 4.15. CIC raw data output format for PS-module.
| Section | Field | Width | Value |
|---------|-------|-------|-------|
| Header | Start sequence | 28 bits | 27 bits at '1' followed by '0' |
| Header | Error flags | 9 bits | CBC 8, CBC 7, …, CBC 1, CIC |
| Header | L1ID | 9 bits | |
| Header | Number of strip clusters | 7 bits | |
| Header | (fixed bit) | 1 bit | 0 |
| Payload, list of strip clusters | Chip ID | 3 bits | |
| Payload, list of strip clusters | Strip cluster coordinate | 8 bits | |
| Payload, list of strip clusters | Width | 3 bits | |

**Figure 4.16.** CIC raw data output format for 2S-module in zero-suppression mode.

For the 2S-module, the CIC can be configured to transmit an alternative output format. In this mode, mainly used for debug and analog front-end calibration purposes, the CIC ASIC simply sends out the L1 raw data received from the 8 CBCs without applying any data manipulation.

#### 4.4.6 Output serializer

The output serializer sends out the formatted raw data packet at a configurable frequency, 320 MHz or 640 MHz. This operation relies on a single-ended handshake signal that is generated at every period of the clock at the input of the CIC ASIC.
This part of the circuit has been demonstrated to be sensitive to SEUs. The serializer cannot be triplicated due to timing issues. An SEU in this part of the logic can flip only one bit in the output packet, and the handshaking signal allows a fast recovery for the following bits of the same packet. The cross-section of the output serializers can be considered negligible.

| Section | Field | Width | Value |
|---------|-------|-------|-------|
| Header | Start sequence | 28 bits | 27 bits at '1' followed by '0' |
| Header | Error/status bit | 9 bits | CBC 8, CBC 7, …, CBC 1, CIC |
| Header | L1ID | 9 bits | |

| Payload group | Fields sent for each CBC (CBC7, then CBC6 to CBC1, then CBC0) | Width per CBC |
|---------------|------------------------------------------------------------------|---------------|
| 1 | Error, Pipeline address | 2 bits + 9 bits |
| 2 | L1ID, Channel data | 9 bits + 2 bits |
| 3 | Channel data | 11 bits |
| 4 | Channel data, Padding bit at '0' | 10 bits + 1 bit |

Figure 4.17. CIC raw data output format for 2S-module in debug mode.

#### 4.5 CIC slow control and fast commands

The CIC ASIC has dedicated slow control registers accessible via the I<sup>2</sup>C protocol [102] and organized in four separate blocks of 32 triplicated registers each.
All the blocks have the following elements:

- one mask that allows writing only the specified bits of a register;
- one synchronous SEU counter that counts the number of SEUs synchronously with the 40 MHz clock used for the configuration;
- one asynchronous SEU counter that counts the number of SEUs asynchronously, driven by the error flag raised by the combinatorial voters that continuously check for any discrepancy among the triplicated registers;
- triplicated state machines to write and read the configuration registers;
- registers that are all triplicated, voted and clock gated to reduce power consumption.

In order to avoid multiple bit upsets, the registers are placed, during implementation, at a distance of $15\,\mu\mathrm{m}$. If a discrepancy among the values of the triplicated registers is detected, due to an SEU, an enable signal activates the clock for one clock cycle at $40\,\mathrm{MHz}$ to correct the wrong value. Some of the configuration registers are used to select the output operating frequency (320 MHz or 640 MHz), the input front-end chips (MPA or CBC), the number of output lines for trigger data (five or six lines), the presence of stub bending information in the output packet, the phase alignment and word alignment procedures, and the control and debug features for the phase aligners. A total of $300 \times 8$-bit registers can be written and read via the Wishbone protocol with a customized SEU-tolerant I<sup>2</sup>C slave block implemented at top level in the CIC ASIC [103]. Moreover, $200 \times 8$-bit additional registers are read-only. They include the SEU counters for each of the four register blocks, the phase and word alignment values for each input line, the status of the Master DLL lock and of the single channel lock procedures for the phase aligners, the status of the word alignment procedure and, finally, CIC flags that could be helpful during operation to identify possible errors.
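The behaviour of a triplicated, voted register with automatic correction can be sketched with a small Python model. It uses the standard bitwise majority function for the voter; the class `TmrRegister`, its method names and the way upsets are injected are illustrative assumptions and do not describe the actual RTL. In this model the clock only has an effect when the three copies disagree, mimicking the clock gating described above.

```python
def vote(a, b, c):
    """Bitwise majority of three register copies."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """Behavioural model of one triplicated 8-bit configuration register."""

    def __init__(self, value=0, width=8):
        self.mask = (1 << width) - 1
        self.copies = [value & self.mask] * 3
        self.seu_count = 0  # plays the role of the synchronous SEU counter

    def read(self):
        return vote(*self.copies)

    @property
    def mismatch(self):
        """Error flag: True when the three copies are not identical."""
        return len(set(self.copies)) > 1

    def upset(self, copy, bit):
        """Inject a single event upset in one bit of one copy."""
        self.copies[copy] ^= 1 << bit

    def clock(self):
        """One 40 MHz cycle: the gated clock acts only on a mismatch."""
        if self.mismatch:
            self.seu_count += 1
            self.copies = [self.read()] * 3


reg = TmrRegister(0xA5)
reg.upset(copy=1, bit=3)          # flip one bit in the second copy
print(hex(reg.read()), reg.mismatch)
reg.clock()                       # the voted value is written back
print(reg.copies, reg.seu_count)
```

A single upset never changes the voted output, and writing the voted value back prevents upsets from accumulating in different copies over time.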
The communication between the I<sup>2</sup>C slave block in the CIC ASIC and the I<sup>2</sup>C master block in the LpGBT [74] has been proved to work as expected via RTL and back-annotated simulations using the UVM simulation environment, as described in Section 3.4.

The CIC also receives a fast command, namely the T1 command, operating at 320 MHz, generated by the DAQ system and distributed to the whole CMS experiment. At the level of the CIC ASIC, it is not necessary to perform any phase alignment on this signal because it is distributed at module level together with a 320 MHz clock. The CIC ASIC should deserialize the T1 command line operating at 320 MHz and decode the eight-bit word at 40 MHz. The eight-bit word is a unique pattern consisting of three bits of sync code '110', four bits of command code and one tail bit at '1'. Table 4.1 reports the fast commands implemented in the CIC ASIC to work synchronously with the rest of the module and the whole experiment.

| Fast command | Description |
|--------------|-------------|
| ReSync | Soft reset of the ASIC: resets all state machines, the BX counter and the L1 ID counter, and truncates all data being processed. It does not reset the static configuration. |
| Orbit Reset | Resets only the BX counter and the L1 ID counter. It does not reset state machines and static configuration. |
| L1 Trigger | Requests the transmission of the full raw data event, incrementing the L1 ID counter. |

Table 4.1. List of fast commands used in CIC.

# 4.6 Design and ASIC implementation

#### 4.6.1 Technology

The advantage of a scaled technology is reduced power consumption and, consequently, a reduced material budget to cool down the full system. Moreover, a full characterization of the 65 nm technology with TID effects has been performed, and radiation models at different TID levels can be used to check the final implementation of the design against TID radiation effects [104].
# **4.6.2** Timing corners and TID effects

Due to the two different power distribution schemes in the PS-module and the 2S-module, the CIC digital core is powered at $1.0\,\mathrm{V}$ and at $1.2\,\mathrm{V}$, respectively, while the custom sLVS drivers and receivers [98] are powered at $1.2\,\mathrm{V}$. Therefore, considering a $\pm 10\%$ tolerance, the full range for the voltage supply extends from $0.9\,\mathrm{V}$ to $1.32\,\mathrm{V}$, the latter being the upper voltage supply limit for this $65\,\mathrm{nm}$ technology. In terms of temperature, the target application requires $-30\,^\circ\mathrm{C}$ for pixel sensor efficiency [59]. Therefore, taking into account some margin, the lowest temperature for a corner is $-40\,^\circ\mathrm{C}$, while the highest is given by room temperature, $25\,^\circ\mathrm{C}$, plus some margin, thus $50\,^\circ\mathrm{C}$. The third element of a corner is the fabrication process, which depends on the foundry. Beyond the typical process (TT), the two extreme ones are the fast-fast (FF) and slow-slow (SS) processes.

Process-Voltage-Temperature (PVT) variations determine which corners need to be considered when closing timing. The set of corners for this ASIC is composed of four cases: two typical cases, one best case and one worst case. The choice of the typical cases is straightforward: TT process, 1.0 V or 1.2 V voltage supply and 25°C. The choice of the Best Case (BC) and Worst Case (WC) corners requires some studies on the digital library with and without radiation models. Figure 4.18 shows the characterization performed for different corners with respect to the chosen typical case at $1.0\,\mathrm{V}$, not including radiation models. The propagation delay of a few standard cells is shown as a percentage of the delay of the same standard cell in the typical case at $1.0\,\mathrm{V}$. Standard cells show a delay reduction of about 60% in the FF process at $1.32\,\mathrm{V}$ with respect to the chosen reference.
On the other hand, the largest increase is about 100% in the SS process at $0.9\,\mathrm{V}$ and $-30\,^{\circ}\mathrm{C}$ with respect to the chosen reference.

**Figure 4.18.** Delay propagation for some standard cells in different corners.

Figure 4.19 shows a summary of the selected corners used for CIC implementation.

| Corner | Process | Voltage (V) | Temperature (°C) |
|--------|---------|-------------|------------------|
| BC     | FF      | 1.32        | 0                |
| TYP    | TT      | 1.2         | 25               |
| TYP    | TT      | 1.0         | 25               |
| WC     | SS      | 0.9         | -30              |

**Figure 4.19.** Table with corners used for CIC implementation.

In addition to PVT variations, the CIC ASIC in the target application will need to work for 10 years without being replaced and, therefore, should be able to sustain a TID up to 57 Mrad, as described in Section 2.2.2. TID effects in 65 nm technology can cause substantial changes in CMOS parameters, affecting ASIC performance. For this reason, additional sign-off timing analysis has to be performed with corners including radiation effects. Transistors of different sizes, with both standard and enclosed gate layouts, have been irradiated up to a TID of 500 Mrad and characterized [104].

Nowadays, the latest industry-standard CMOS model is BSIM6, which represents a good trade-off between accuracy in modeling real CMOS effects and computational efficiency [105]. The standard CMOS PDK based on BSIM6 does not include degradation effects due to TID. Therefore, an extension of the PDK has been developed by a collaboration between CERN and the University of Crete to include TID effects up to $500\,\mathrm{Mrad}$ [106]. The developed models are based on measurements in the worst case bias scenario (maximum bias voltage equal to VDD).
The new models have been validated by simulating older prototyped designs, already tested and irradiated (SSA, described in [107]), and are in good agreement with measurement results from the Digital Radiation (DRAD) test chip irradiated up to $500\,\mathrm{Mrad}$, as described in detail in [108]. The models developed for a TID of $100\,\mathrm{Mrad}$ in worst case bias can be used during CIC ASIC implementation and final sign-off timing analysis. In this case a safety factor of $\sim 2$ is applied to have some margin and obtain a more robust design.

TID effects degrade the performance of standard cells. For instance, the propagation delay of standard cells increases drastically according to the developed models. Figure 4.20 shows the degradation due to TID effects on SS corners at two different temperatures. These corners have been used for the CIC sign-off timing analysis on the final design, but not during the implementation flow, because they have been considered too pessimistic [107].

**Figure 4.20.** Delay propagation for SS corners at two different temperatures with and without radiation models.

#### 4.6.3 SEU hardening techniques and TID tolerance

The CIC ASIC will be used in a harsh radiation environment. Ionizing particles will interact with the silicon lattice of the ASIC, causing Single Event Effects (SEE), as described in Section 2.8.2. The most important functionality of the CIC ASIC, to be guaranteed at all times, is the synchronization with the other chips on the same module and the proper working of all control state machines, avoiding any malfunction due to unknown states or undefined conditions that can lead to a loss of synchronization of the ASIC with the rest of the experiment. In the target application, a loss of synchronization requires a reset to restart collecting data correctly, causing the loss of data from the whole detector for a certain time window.
The main objective for the design and implementation of the CIC ASIC is to study and identify all the critical registers and nets. This approach allows implementing a circuit, robust to SEUs, that never requires an external reset to recover from a loss of synchronization. On the other hand, SEUs affecting the data packet payload can flip some bits, causing expected inefficiencies in the reconstruction of the specific event.

SEE hardening techniques have been developed, implemented and simulated to overcome issues that could arise in the target application. The most widely used technique is Triple Modular Redundancy (TMR). With this approach, memory elements in the control path are instantiated three times, namely triplicated, and their outputs are voted by voters that are themselves triplicated before propagating to the rest of the logic. The voting cell propagates the majority value among the three memory elements. This approach is effective as long as errors do not accumulate at the input of a voting cell. The condition for a perfectly working architecture is that the voting step occurs with a frequency that is the same as or higher than the expected SEU injection frequency, namely 40 MHz, which corresponds to the particle collision rate in the target application.

**Figure 4.21.** Example of the Triple Modular Redundancy implementation in the CIC ASIC: control path and header path fully triplicated, data path not triplicated due to limited power budget.

Figure 4.21 shows the triplication and voting approach used in the CIC ASIC design. There are three identical state machines and therefore three clocks distributed through the chip, as will be discussed in Section 4.6.7. The three state machines should normally go through the same states and produce exactly the same output. If the flip-flop information is corrupted by an SEU, the error is corrected right away by the voting cells.
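
The triplication and voting scheme described above can be sketched behaviourally as follows. This is an illustrative Python model under stated assumptions, not the CIC RTL: the names `majority` and `tmr_step`, the 8-bit counter used as a stand-in state machine, and the single-cycle model are all hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote of three register copies."""
    return (a & b) | (b & c) | (a & c)


def tmr_step(state, next_state):
    """Advance a triplicated register by one clock cycle.

    `state` holds the three copies (A, B, C). Each copy is fed by its own
    voter, so the voters are modelled as triplicated too.
    """
    voted = tuple(majority(*state) for _ in range(3))  # three independent voters
    return tuple(next_state(v) for v in voted)


# Stand-in state machine: an 8-bit counter (assumption for illustration).
counter = lambda v: (v + 1) & 0xFF

state = (5, 5, 5)
state = (state[0], state[1] ^ 0b100, state[2])  # an SEU flips bit 2 of copy B
state = tmr_step(state, counter)
print(state)  # prints (6, 6, 6): the upset is outvoted within one cycle
```

The sketch also shows why errors must not accumulate: if two copies are corrupted in the same bit before a vote takes place, the majority is wrong and the error propagates.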
Since a single voter could be critical from an SET point of view, the voters also have to be triplicated. The voted outputs are then available to propagate to the next stage, to be used as feedback in the state machine itself or to be used directly in the data path logic. As shown at the top of the figure, the data path is not triplicated, makes use of the clock A distribution and uses voted outputs coming from the triplicated control logic. The described approach has been implemented throughout the design thanks to the TMRG tool [109], a triplication tool developed at CERN and implemented as Python scripts. The tool is able to parse the original Verilog code, recognizing specific directives and keywords, and to generate a triplicated version of the design at RTL level. Another solution would be to triplicate a block completely and vote only the outputs. However, this approach would accumulate SEUs more quickly, since most of the logic works at 40 MHz, which corresponds to the SEU rate.

As described in Section 2.2.2, the power budget is very limited. Therefore, some parts of the logic are clock gated when not used. This is the case for the static configuration registers presented in Section 4.5. When the clock is gated, the voting procedure would not work as expected. Therefore, together with the voting cell, an error detection logic that flags whether the three inputs of a voter differ has to be implemented. The error flag then re-enables the gated clock in order to correct the corrupted configuration register value. The error detection logic allows activating the logic only for a limited amount of time, saving dynamic power consumption at the cost of a small leakage power. In order to characterize the ASIC, SEU counters have been implemented to count the number of SEUs that hit the configuration registers. They allow extracting the cross section results at different LET, as discussed in Section 5.3.6.
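
The error detection and gated-clock correction described above can be sketched as a small behavioural model. It is only a sketch under assumptions: the class name `GatedTmrRegister`, its methods and the unbounded counter are hypothetical, and effects such as counter width or upsets in the counter itself are not modelled.

```python
class GatedTmrRegister:
    """Triplicated configuration register whose clock is normally gated."""

    def __init__(self, value: int, width: int = 8):
        self.mask = (1 << width) - 1
        self.copies = [value & self.mask] * 3   # copies A, B and C
        self.seu_count = 0                      # hypothetical SEU counter
        self.clock_enabled = False

    def voted(self) -> int:
        """Bitwise 2-out-of-3 majority of the three copies."""
        a, b, c = self.copies
        return (a & b) | (b & c) | (a & c)

    def error(self) -> bool:
        """Error flag: raised when the three voter inputs differ."""
        a, b, c = self.copies
        return not (a == b == c)

    def inject_seu(self, copy: int, bit: int) -> None:
        """Flip one bit of one copy to emulate a single event upset."""
        self.copies[copy] ^= (1 << bit)

    def tick(self) -> None:
        """One 40 MHz clock period: the clock runs only if an error is flagged."""
        self.clock_enabled = self.error()
        if self.clock_enabled:
            self.seu_count += 1
            self.copies = [self.voted()] * 3    # rewrite all copies with the voted value


reg = GatedTmrRegister(0xA5)
reg.inject_seu(copy=1, bit=3)   # upset in copy B
reg.tick()                      # error flag enables the clock for one cycle
print(hex(reg.voted()), reg.seu_count)
reg.tick()                      # no discrepancy left: the clock stays gated
print(reg.clock_enabled)
```

In this model the register consumes dynamic power only in the cycle that follows an upset, which is the trade-off between dynamic and leakage power mentioned in the text.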
#### 4.6.3.1 Triplicated memory cells placement

A triplicated architecture at RTL level does not by itself guarantee an SEU-robust design. Additional steps are needed to guarantee SEU robustness. The first critical step is at synthesis level, where the synthesis tool could simplify away some voters, error detection logic cells or registers. Specific constraints to preserve this logic need to be defined. The second critical step is the placement of the triplicated cells. Additional constraints must be set to place triplicated registers far enough apart that they cannot be upset by the charge deposited by a single particle. A test chip in 65 nm has been developed by the RD53 collaboration [110], which determined that the minimum distance among triplicated registers to avoid multiple bit upsets is $15\,\mu\mathrm{m}$. A custom TCL script has been developed to categorize all the registers in three groups A, B and C and to place triplicated registers at a distance larger than $15\,\mu\mathrm{m}$. This solution makes the clock distribution and signal routing more complex, but it is necessary for proper protection of the ASIC against SEUs.

# 4.6.4 Floorplan and power distribution

The CIC ASIC floorplan is constrained by the available area on the hybrid and at reticle level, since it will be produced on the same wafer as the MPA and SSA in 65 nm technology. The final CIC ASIC is bumped with the same technology already used for the MPA and SSA ASICs: Controlled Collapse Chip Connection (C4) bumping. Due to the tight material budget permitted in the tracker volume, the CIC ASIC will not have any packaging and the bare die will be directly bumped on the hybrid flex PCBs. The final die dimensions including the seal-ring are $2.80\,\mathrm{mm}\times6.5\,\mathrm{mm}$ with a bump pitch of $270\,\mu\mathrm{m}$. A passivation opening of $70\,\mu\mathrm{m}$ has been chosen to match the requirements of the bare-die assembly on the front-end module flex.
In order to ease the testability and the characterization of the prototypes, wire-bond pads with a $90\,\mu\mathrm{m}$ pitch have been implemented along the four sides. As a matter of fact, the CIC ASIC characterization across corners and under radiation (both TID and SEU) is easier with a wire-bonded ASIC. Moreover, the final version of the ASIC must be tested by wafer probing before being mounted on the flexible PCB. Figure 4.22 shows both the C4 bump implementation and the wire-bond metallization.

**Figure 4.22.** Implementation of the C4 Bump (left) and the wire-bond (right) metallizations for CIC ASIC.

Figure 4.23 shows the CIC ASIC Redistribution Layer (RDL), that is, the top-metal layer implemented to connect the IO pads with the bump openings.

**Figure 4.23.** View of the final CIC ASIC top-metal layer (RDL), showing the bump connectivity and the ASIC dimensions.

Figure 4.24 shows the CIC ASIC final layout, including twelve phase aligners, four configuration blocks, the IO interface with pads on all four sides of the die and the central part covered completely by the CIC core logic. A long serpentine structure is noticeable on the right, corresponding to the implementation of a large data FIFO.

**Figure 4.24.** Final layout of the CIC ASIC and highlight of the hierarchical blocks.

Metal 6 (thick metal) and metal 7 (ultra-thick metal) are used uniformly for vertical and horizontal power distribution, respectively, as shown in Figure 4.25.

**Figure 4.25.** View of the final CIC ASIC with metal 6 distributed vertically and metal 7 horizontally for uniform power distribution.

Power and ground bumps are not directly connected to the standard cells: because of the presence of wire-bond pads, a Peripheral IO (PIO) architecture is preferred over an Array IO (AIO) architecture. In the AIO, the IO cells are directly placed in an array fashion throughout the whole chip under each C4 bump.
Instead, the PIO is used in this design, connecting the C4 bumps via metal 6 and metal 7 to the wire-bond pads that contain ESD protections, consisting of power and ground clamps, which provide a low-resistivity path for the excess current [111]. Such ESD protections provide robustness up to $\pm 2$ kV according to the Human Body Model (HBM) test. The same applies to IO signals. For low-speed signals, e.g. the reset signal, custom CMOS radiation-tolerant bi-directional IO pads [112] have been developed. For high-speed links operating up to 640 MHz, instead, it is not possible to use the LVDS [113] or CML differential standards; therefore, custom radiation-hard scalable Low-Voltage Signaling (sLVS) [114] transmitters and receivers [98] have been developed. The connection from the C4 bumps to the IO pads is implemented using the Redistribution Layer (RDL).

The ASIC implements two independent power and ground domains for the IO pads and the digital core. The substrate isolation among ground domains is implemented with a double guard-ring with deep n-well and a $50\,\mu\mathrm{m}$ high-resistivity trench isolation. The guard ring encloses the digital domain and, independently, each of the analog blocks.

The twelve phase aligners are placed along the top and bottom sides of the die, in close proximity to the corresponding bumps and wire-bond pads. This choice has been made to shorten the path from the inputs to the phase aligners, so as not to increase the signal delay and, consequently, reduce the phase alignment range. The deserializers and the fast command decoder logic, operating at 320 MHz, plus four hierarchical blocks of configuration registers, operating at 40 MHz, are placed on the left of the die. On the other hand, all the outputs of the ASIC are placed on the right, together with the serializers operating at 320 MHz or 640 MHz with sLVS transmitter pads.
# 4.6.5 Power consumption

The development of the CIC ASIC architecture was constrained by a limited power budget of 250 mW for the PS module and 312 mW for the 2S module. Figure 4.26 reports the power consumption for typical and worst case corners for the two different modules where the CIC ASIC will be employed.

| Module | Corner | Word alignment: Analog (mW) | Word alignment: Digital (mW) | Word alignment: Total (mW) | Data sending: Analog (mW) | Data sending: Digital (mW) | Data sending: Total (mW) |
|-----------|--------------------|----|-----|-----|----|-----|-----|
| 2S-module | TYP at 1.2V @ 25°C | 37 | 198 | 235 | 16 | 151 | 167 |
| 2S-module | MIN at 1.32V @ 0°C | 63 | 252 | 315 | 56 | 193 | 249 |
| PS-module | TYP at 1.0V @ 25°C | 22 | 185 | 207 | 12 | 137 | 149 |
| PS-module | MIN at 1.1V @ 0°C  | 27 | 237 | 264 | 16 | 176 | 192 |

**Figure 4.26.** Table with power consumption during word alignment and data sending for the two different modules in typical case and worst case.

The total power consumption is composed of an analog contribution, given by the twelve phase aligners, and of a digital contribution, given by all the digital logic. The analog power consumption value comes directly from analog corner simulations. Two different modes are reported in the table: the word alignment mode and the data sending mode. The word alignment mode will be used in the target application only once and will be active for a very short time. However, it is worth checking that the CIC ASIC does not exceed the allocated power budget. The power consumption during data sending is the more relevant figure, and there the CIC ASIC is well below the limit, with 192 mW for the PS module and 249 mW for the 2S module.

# **4.6.6** IR drop

The IR drop is defined as the voltage drop in the metal wires constituting the power grid before it reaches the VDD and VSS pins of the standard cells.
The IR drop has two contributions: static and dynamic. The static IR drop simply depends on the power grid connecting the power supply and ground to the standard cells. On the other hand, the dynamic IR drop depends on the switching activity of the standard cells themselves. For the 65 nm technology, the foundry recommends keeping the IR drop below 3% of the power supply voltage. The dynamic IR drop analysis performed before submitting the CIC ASIC did not show any problem. Figure 4.27 shows that the maximum IR drop in the design is 23 mV which, summed with the 5.7 mV voltage increase on the ground, gives 28.7 mV and is still below 30 mV (3% of the 1 V power supply).

**Figure 4.27.** CIC IR drop simulation with normal activity during data sending.

# 4.6.7 Triplicated clock distribution

Due to the large variation of the propagation delay between corners, the CIC ASIC timing closure was challenging. In particular, the ASIC needs to work efficiently both at 1.0 V - 10% and at 1.2 V + 10% in order to be used in two different modules, as discussed in Section 4.6.2. The temperature inversion effect at $-40\,^{\circ}\mathrm{C}$, together with degradation effects due to radiation, significantly affects the propagation delay of standard cells, with an overall spread of 120% between the fastest and slowest corners.

As discussed in Section 4.6.3, each state machine in the control path is triplicated, and the output flip-flops are triplicated and voted before being used to select the next state or simply used in combinational logic. This architecture, used all over the ASIC, requires three independent clock trees that need to be constrained during Clock Tree Synthesis (CTS) in order to minimize the skew among the three clock distributions. A clock uncertainty of 200 ps between clocks and a transition time of 200 ps provide a satisfactory result in modeling the clock.
Additionally, the clock-trunk is implemented with metals 5, 4 and 3 and Non-Default Rules (NDR), namely double width of $0.2\,\mu\mathrm{m}$, double spacing of $0.2\,\mu\mathrm{m}$ and clock net shielding, while the clock-leaf has, as NDR, only double width of $0.2\,\mu\mathrm{m}$. This implementation of the clock yields a clock-trunk with wide, shielded and sufficiently spaced branches, and a wider clock-leaf, reducing the coupled noise associated with victim-aggressor nets and the incremental delay.

At the same time, on-chip variations due to the fabrication process also play a role in timing closure in advanced nodes. As a matter of fact, the slowest launch path and the fastest capture path are analysed for the setup timing check, while the fastest launch path and the slowest capture path are considered for the hold timing check. Hence, the timing derate factors suggested by the foundry for launch and capture clocks are used during the final sign-off analysis.

The final netlist, with the corresponding SDF file containing all back-annotated delays for each net and standard cell of the design, allows performing a conclusive simulation to check whether the constraints used during the digital flow were correct. In case of issues in simulation, the constraints should be revised to obtain a functional design.

The final design has to pass additional sign-off checks:

- the Design Rule Checking (DRC) check determines whether the chip layout satisfies a number of rules defined by the semiconductor manufacturer;
- the Layout Versus Schematic (LVS) check determines whether the ASIC layout corresponds to the original schematic or circuit diagram of the design.

Both checks are performed with Calibre [115], utilizing a modified version of the rules in order to correctly extract Enclosed Layout Transistors (ELT) and to include stricter latch-up checks due to the radiation environment.
The prototype of the final CIC ASIC, integrating all required functionalities for system level operation, was submitted for prototyping in a Multi-Project Wafer (MPW) run in 65 nm technology. A split-process was required due to the different metal stacks adopted in the designs. The additional options selected are std-$V_T$, low-$V_T$ and high-$V_T$ devices, triple-well isolation, no polyimide and MOM capacitors.

# 5 CIC Prototype characterization results

This chapter describes the experimental results of the characterization of the prototypes. Two different prototypes of the CIC ASIC have been developed. The first prototype, called CIC1, implemented all the functionalities but did not include the 640 MHz option as output frequency, was not SEE radiation hard and was not within the allocated power budget. Therefore, this prototype has been mostly used for tests with other ASICs in the readout chain. A second prototype, called CIC2, respects all the specifications, including the full set of functionalities, and is also radiation hard. The ASIC performance has been characterized under different working conditions (temperature, power supply), utilizing a custom-made test bench.

Section 5.1 presents the standalone testbench with the developed firmware and test routines for the ASIC characterization. Section 5.2 describes the results of the CIC1 ASIC with front-end ASICs in 2S-modules. Alignment procedures at CIC level proved to work within specifications, and data treatment and compression did not show any unexpected problem either.

Sections 5.2.2 and 5.3.1 show the power consumption for the two prototypes. While the CIC1 ASIC was not optimized in terms of power consumption, the CIC2 ASIC shows a lower power consumption, well within the power budgets for the PS-module and the 2S-module.

Radiation-tolerant techniques have been adopted against SEE and TID effects in the CIC2 ASIC design for operation in the target application.
Therefore, Section 5.3.4 presents TID test results, obtained by irradiating the CIC2 ASIC with X-rays up to a total dose of 200 Mrad, and Section 5.3.5 presents SEE test results, obtained by irradiating the CIC2 ASIC with heavy ions up to an effective LET of $\sim 70\,\text{MeV}\cdot\text{mg}^{-1}\text{cm}^{2}$.

Publications related to this chapter: [11], [12] and [13].

# 5.1 Standalone testbench for CIC prototype

The CIC is a $2.8\,\mathrm{mm} \times 6.5\,\mathrm{mm}$ bump-bonded flip-chip in 65 nm technology. However, in order to ease the testing, the first prototypes have been fabricated with a wire-bond process, as shown in Figure 5.1.

**Figure 5.1.** CIC ASIC with wire bonds sitting on the carrier board.

All the tests have been performed on CIC ASICs wire-bonded on passive carrier boards in order to facilitate the characterization and the tests. Figure 5.2 shows the block diagram of the standalone custom test setup based on three boards.

**Figure 5.2.** Block diagram of CIC standalone testbench.

The passive carrier board is plugged on a custom interface board for voltage level translation, power regulation, addition of delays on data input lines and monitoring purposes (Figure 5.3).

**Figure 5.3.** CIC Carrier board plugged on a custom interface board.

Firmware and software routines based on the standalone verification testbench used to model the ASIC design have been developed to run on a Xilinx KCU105 development card (Figure 5.4), connected to the custom interface board.

**Figure 5.4.** CIC FPGA board.

The test routine to perform functional tests consists of generating synthetic data with CMS simulations to emulate the outputs of the eight front-end ASICs. A software simulation of the ideal model uses such data to emulate the expected CIC outputs. The front-end data are stored in the KCU105 RAM and sent to the CIC: the RAM capacity makes it possible to test the CIC for tens of millions of consecutive BXs.
The CIC output packets are then recorded in the RAM and compared in software with the expected values, previously obtained by the software simulator.

All the CIC functionalities, starting with the startup sequence based on the hard reset deassertion and the I<sup>2</sup>C configuration, can be tested following the previously described routine. Since the CIC input data are produced by eight different front-end ASICs, each one clocked by a common external 320 MHz clock, a bitline phase alignment feature is required in order to re-synchronize the phase of the 48 CIC input lines, using the internal system clock. This feature allows the CIC to automatically delay the phase of each input line, compensating for any PVT variation, or to apply a fixed delay configuration to the input lines when the input phases are already known. Moreover, the word alignment feature, which allows the CIC to correctly reconstruct the input data words with respect to the 40 MHz internal clock, needs to be validated, along with the ability to correctly detect the first event frame after the issuing of a ReSync command (BX0 detection). The test procedure validation includes the verification of all data output transmission modes for both module flavours.

# 5.1.1 CIC phases and modes

The standalone testbench allows testing the two CIC working modes: setup and data taking.

The setup mode is used only a few times, when the CIC needs to be configured for the first time or when it needs to be reconfigured. For this mode the CIC ASIC requires very specific inputs from the front-end ASICs that cause the worst case scenario for power consumption.
In reality this mode will last only a few $\mu$s and consists of three main steps:

- Fast control locking, discussed in Section 4.5, represents the minimal power budget of a CIC ready for data;
- Phase alignment, discussed in Section 4.3, represents the worst case scenario because all the input channels receive highly switching signals (ideally alternating 0 and 1) from the front-end ASICs;
- Word alignment, discussed in Section 4.3.2, represents a scenario closer to the data taking one.

Ultimately, the most used mode will be the data taking mode, which consists of receiving input frames from the front-end ASICs, compressing and formatting them as discussed in Chapter 4, and sending them out. Moreover, the data taking mode power consumption depends on the cluster and stub occupancy, which is a function of the pileup rate. In order to measure the worst case scenario for the PS-module and the 2S-module, the region with the largest occupancy was chosen to produce the CIC input frames. The nominal pileup is 200, while the nominal L1 trigger frequency is 750 kHz.

# 5.2 First silicon prototype: CIC1

The first silicon prototype, called CIC1, has been developed with limited functionality. As a matter of fact, the output frequency is limited to 320 MHz, no power optimization has been performed and radiation hardening techniques have not been implemented. For these reasons CIC1 cannot be used in the target application. All the implemented functionalities have been tested with the standalone testbench. However, this version of the ASIC has been particularly helpful to test the system functionality at the level of the module. In particular, it has been tested on a hybrid while working with eight front-end ASICs, specifically eight CBC ASICs.

# 5.2.1 Tests with 2S hybrid prototype

The CIC1 has been mounted on a prototype of the 2S hybrid containing eight front-end chips, namely CBC ASICs, and a CIC1 [12]. This was the first time a CIC chip was used to validate the complete readout chain.
The OT-$\mu$DTC (micro Data, Trigger and Control) test system is based on the $\mu$TCA FC7 data acquisition and control card, built around a Kintex7 FPGA. Figure 5.5 shows the 2S-module electrical characterization board used for the translation and buffering of the CIC1 output lines, which accommodates the 2S prototype hybrid with eight CBC3.1 ASICs and a CIC1 mezzanine. This characterization board is connected to an FC7 [116] where a dedicated firmware for complete hybrid readout and control is implemented.

The tests performed with the OT-$\mu$DTC system have verified the communication between the different components of the set-up and the consistency of the CIC1 output with respect to what is expected from the CBC under different conditions. Such tests also validated the ability of the CBC to generate the phase and word alignment patterns required for the CIC1 initialization, and the correct operation of the CIC1 automatic phase-locking feature. The effectiveness of the automated BX0 identification procedure of the CIC1 was also confirmed [11].

**Figure 5.5.** Prototype of half of a 2S-module hybrid including eight CBC3.1 ASICs and one CIC1.

## 5.2.2 Power consumption

The CIC1 prototype has not been optimized in terms of power consumption. However, some tests have been performed with the standalone testbench to crosscheck simulation predictions by measuring the power on the final component. In particular, different modes have been exercised. Figures 5.6, 5.7 and 5.8 show, respectively, the power measurements during the different steps of the setup phase: fast control locking procedure, phase alignment procedure and word alignment procedure. All the results are reported only for the 320 MHz mode, for different power supplies, because the 640 MHz mode was not available for CIC1. Keeping a fixed voltage of 1.2 V for the I/O ring, only the core voltage has been tuned during the tests.
**Figure 5.6.** CIC1 power consumption during fast control locking mode for different power supply voltages.

**Figure 5.7.** CIC1 power consumption during phase alignment procedure for different power supply voltages.

**Figure 5.8.** CIC1 power consumption during word alignment procedure for different power supply voltages.

It is clear that the phase alignment is the worst case scenario for power consumption. Therefore, Figure 5.9 shows the summary of power consumption for different voltages during phase alignment.

| Module | $V_{core}$ (V) | $P_{tot}$ (mW) |
|-------------------|-----|-----|
| PS-module @320MHz | 0.9 | 198 |
| PS-module @320MHz | 1.0 | 241 |
| PS-module @320MHz | 1.1 | 286 |
| 2S-module @320MHz | 1.1 | 286 |
| 2S-module @320MHz | 1.2 | 339 |
| 2S-module @320MHz | 1.3 | 411 |

**Figure 5.9.** CIC1 power consumption worst case scenario.

All the measurements include a fixed 30 mW power consumption contribution due to the I/O ring. The maximum allowed power budgets are 250 mW for the CIC in the PS-module and 312 mW for the CIC in the 2S-module. The CIC1 ASIC is not within the allowed power budget during setup mode.

On the other hand, the data taking mode power consumption highly depends on pileup conditions. However, for CIC1 the input frames used for testing were defined as follows:

- for the stub path, depending on the module flavour, a flat random number of stubs per front-end per event was used: between zero and three stubs per BX/front-end for the CBC, and between zero and five per 2BX/front-end for the MPA;
- for the L1 path, a flat random number of pixel and strip clusters per front-end per BX was chosen, and only 10 L1 triggers have been sent.

Figures 5.10 and 5.11 show peaks in power consumption upon the reception of an L1 trigger, while the power consumption at the end, after all L1 packets have been read, corresponds to the stub data path consumption only.

**Figure 5.10.** CIC1 power consumption in data taking mode with 10 L1 triggers in PS-module.
**Figure 5.11.** CIC1 power consumption in data taking mode with 10 L1 triggers in 2S-module.

The results confirm that the power consumption is higher than the allowed power budget in data taking mode as well.

# 5.3 Second silicon prototype: CIC2

The second silicon prototype, called CIC2, incorporates all the functionalities required for operation in PS modules and 2S modules. Moreover, the power consumption has been decreased and radiation hardening techniques have been implemented, as discussed in Chapter 4. The standalone testbench already used for CIC1 and described in Section 5.1 has also been used to characterize the CIC2 ASIC.

# 5.3.1 Power consumption

The CIC2 ASIC has been developed for use in the target application. Radiation hardening techniques, including triplication, have therefore been applied, which increases the number of registers and the amount of logic in the design. To compensate, an effort has been made to move some functionalities to lower clock frequencies. The architecture presented in Chapter 4 is the result of several such optimizations. Simulations of the submitted CIC2 ASIC were encouraging, with an estimated power consumption even lower than that of the CIC1 design.

The CIC2 ASIC prototype has been characterized with the same procedure already described for the CIC1 ASIC in Section 5.2.2. The power consumption values during the setup phase (fast control locking, phase alignment and word alignment) can be directly compared to those of the CIC1 ASIC prototype. During the fast control phase, the CIC2 ASIC shows a substantially lower power consumption than the CIC1 ASIC thanks to clock gating: 120 mW for the PS-module and 160 mW for the 2S-module. The phase alignment procedure is the most critical for power consumption: the CIC2 ASIC consumes 225 mW for the PS-module and 306 mW for the 2S-module at nominal voltage. These values depend strongly on the input toggle rate. The input lines could be locked separately to reduce the overall power consumption.
Finally, the word alignment step, as in the CIC1 case, shows a lower power consumption than the phase alignment step: 170 mW for the PS-module and 225 mW for the 2S-module. The data-taking phase will be the most frequently used in the target application. More realistic input data frames, corresponding to an occupancy at 200 pileup and an L1 trigger rate of 750 kHz, were used to characterize the CIC2 ASIC. The power consumption for different power supply voltages is reported in Figure 5.12. In all cases, the values are well within the allocated budget, which is 250 mW for the PS-module and 312 mW for the 2S-module.

| Module | $V_{core}$ (V) | $P_{tot}$ (mW) |
|---------------------|-----|-----|
| PS-module @ 640 MHz | 0.9 | 123 |
| PS-module @ 640 MHz | 1.0 | 144 |
| PS-module @ 640 MHz | 1.1 | 169 |
| 2S-module @ 320 MHz | 1.1 | 156 |
| 2S-module @ 320 MHz | 1.2 | 181 |
| 2S-module @ 320 MHz | 1.3 | 210 |

**Figure 5.12.** CIC2 power consumption during data taking at different power supply voltages.

# 5.3.2 Temperature characterization

The CIC2 ASIC has been tested and characterized in the temperature range between $-30^{\circ}\text{C}$ and $45^{\circ}\text{C}$. The test has been performed with a CIC2 ASIC working in PS-module mode at 640 MHz output frequency, with a 1 V power supply, a high pileup of 500 and a low L1 trigger rate of 250 kHz. The test consisted of repeating, at different temperatures, the measurements of the core power variation and of the output delay variation of each output line. The first outcome was that no errors were observed during the temperature tests. Figure 5.13 shows a core power variation of around 3% over the full temperature range during the data sending phase. Figure 5.14 shows the output delay variation of each CIC2 ASIC output line during the data sending phase. The maximum output delay variation over all lines is 1 ns over the full temperature range, and all the lines experience a similar delay at a given temperature.

**Figure 5.13.** Core power variation at different temperatures in the range between $-30^{\circ}$C and $45^{\circ}$C.
**Figure 5.14.** Output delay variation for each CIC2 ASIC output line at different temperatures in the range between $-30^{\circ}$C and $45^{\circ}$C.

# 5.3.3 IR-drop issue

During the tests some issues due to IR drop appeared. The simulations presented in Section 4.6.6 were conducted for a generic case of CIC2 ASIC activity. However, tests on the CIC2 ASIC prototype revealed bit flips at low power supply voltage (0.95 V) on two output stub data lines: line three and line four. It has been demonstrated that the CIC ASIC experiences an increase in current upon the reception of an L1 trigger. The same effect has been reproduced in simulation, where the current increases by 600 mA over 500 ps with respect to the base consumption of 300 mA. The dynamic IR drop analysis performed on the CIC ASIC in PS-module configuration around the current peak due to an L1 trigger is reported in Figure 5.15.

**Figure 5.15.** CIC IR drop simulation around current peak due to L1 trigger with high input activity.

It clearly shows that some standard cells are affected by a voltage drop of 30.8 mV which, added to the 14 mV rise on the ground net, gives 44.8 mV and exceeds the foundry-recommended value of 30 mV (3% of the 1 V power supply). The CIC ASIC measurements showed that two output lines of the stub data path in particular experience bit flips. Figure 5.16 shows that long lines propagate from the stub output formatter to the output pads on the right side of the ASIC, crossing the L1 path logic.

**Figure 5.16.** CIC IR drop simulation around current peak due to L1 trigger with high input activity.

The standard cells that connect the trigger path to the output lines 'Trigger\_3' and 'Trigger\_4' do not exceed the IR drop specification, but they are not far from the hot spots visible in Figure 5.15.
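The IR-drop budget check described in this section amounts to simple arithmetic, sketched below in Python. The 30.8 mV supply drop, the 14 mV ground rise and the 3% limit on a 1 V supply are the values quoted above; the function name `ir_drop_check` and its interface are hypothetical and serve only as an illustration.

```python
def ir_drop_check(vdd_drop_mv, gnd_rise_mv, supply_v=1.0, limit_fraction=0.03):
    """Compare the total voltage loss seen by a cell with the allowed limit.

    The total loss is the drop on the supply net plus the rise on the
    ground net. Returns (total_drop_mv, limit_mv, exceeds_limit).
    """
    total_drop_mv = vdd_drop_mv + gnd_rise_mv
    limit_mv = supply_v * 1000.0 * limit_fraction
    return total_drop_mv, limit_mv, total_drop_mv > limit_mv


if __name__ == "__main__":
    # Values quoted in the text for the hot spots of Figure 5.15.
    total, limit, exceeds = ir_drop_check(30.8, 14.0)
    print(f"total drop {total:.1f} mV, limit {limit:.1f} mV, exceeds: {exceeds}")
```

With the quoted values the total loss is about 44.8 mV against a 30 mV limit, so the check reports a violation.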
#### **5.3.4** TID tests

The Total Ionizing Dose (TID) sensitivity of the CIC2 ASIC has to be tested because of the extreme radiation environment expected at the HL-LHC. TID effects were taken into account during the design through worst-case radiation models, as described in Section 4.6.2. The main effects in the 65 nm technology are an increase in leakage current and in cell propagation delay. The CIC2 ASIC has been irradiated up to 200 Mrad in an X-ray beam with an average dose rate of 7.6 Mrad/h for two different core voltages, while monitoring the delay and the eye width of the ASIC output lines. Figures 5.17 and 5.18 show the variation of the output line delays, expressed as a percentage of the period of a 320 MHz clock.

**Figure 5.17.** CIC2 output lines delay variation over TID irradiation at 1.0V power supply.

**Figure 5.18.** CIC2 output lines delay variation over TID irradiation at 1.2V power supply.

In both graphs the expected doses for the PS-module and the 2S-module are reported. The measurements show that up to 60 Mrad the delay variation is below 10% for each output line, reaching a maximum degradation of 55% for a power supply of 1.0 V and of 40% for a power supply of 1.2 V. In the real application, annealing will occur during the HL-LHC shutdown periods and will recover some of the performance, as shown by the post-irradiation points. Moreover, the low temperature of the target application will partly compensate the TID effects.

#### **5.3.5 SEU** tests

The CIC2 ASIC has been designed to be SEU tolerant. Due to the limited power budget, only the control path has been triplicated. Moreover, the configuration registers have been implemented using a specific clock gating technique, as discussed in Section 4.6.3. When a charged particle interacts with matter, it transfers energy to the traversed material.
In the case of an ASIC, this energy is sometimes sufficient to flip a value in the logic, causing a Single Event Transient (SET) or a Single Event Upset (SEU). The amount of energy that an ionizing particle transfers to the traversed material per unit distance is defined as the Linear Energy Transfer (LET). By definition, the LET is a positive quantity and depends on the nature of the radiation as well as on the material traversed [117]. In the CMS Outer Tracker environment the LET is the average of the energies of the several ions composing the beam. For the SEU tests, the Heavy Ion Facility (HIF) at UCL-CRC (Université Catholique de Louvain – Cyclotron Resource Centre) accelerates a beam of a selected heavy ion towards a target device located in a vacuum chamber. Figure 5.19 shows the CIC2 ASIC together with its carrier board and interface board, mounted on a cooling plate with two pipes through which water removes heat from the ASIC and the interface board in vacuum.

**Figure 5.19.** CIC2 ready for SEU tests on a cooling plate.

A heavy ion beam having a uniform distribution of 2 cm and a maximum flux of $1.5 \cdot 10^4$ particles·cm$^{-2}$·s$^{-1}$ is accelerated towards the CIC2 ASIC. The angle of incidence of the ion, $\theta$, is defined as the angle between the ion path and the normal to the surface at the point of incidence. Hence, the effective LET is calculated as follows:

$$\mathrm{LET}(\theta) = \frac{\mathrm{LET}(0^{\circ})}{\cos(\theta)}$$

By default the beam is normal to the surface of the ASIC ($\theta = 0^{\circ}$) and the effective LET therefore corresponds to $\mathrm{LET}(0^{\circ})$. By increasing the angle $\theta$ to $30^{\circ}$, $45^{\circ}$ or $60^{\circ}$, the effective LET increases while the effective flux decreases. Since each heavy ion has a characteristic LET, the use of several ions allows the device to be characterized over a range of LET values. Table 5.1 summarizes the heavy ions available in the Heavy Ion Facility at UCL-CRC together with their properties.
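The effect of tilting the device can be illustrated with a short Python sketch of the effective LET formula above. The function names are hypothetical. The scaling of the effective flux with $\cos(\theta)$ is an assumption of this sketch (the usual geometric projection of the beam onto a tilted surface); the text only states that the effective flux decreases.

```python
import math


def effective_let(let_normal, theta_deg):
    """Effective LET for an ion arriving at theta_deg from the surface normal."""
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("theta_deg must be in the range [0, 90)")
    return let_normal / math.cos(math.radians(theta_deg))


def effective_flux(flux_normal, theta_deg):
    """Flux through the tilted surface, assuming a cos(theta) projection."""
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("theta_deg must be in the range [0, 90)")
    return flux_normal * math.cos(math.radians(theta_deg))


if __name__ == "__main__":
    # LET at normal incidence of the xenon ion listed in Table 5.1.
    xe_let = 62.5
    for theta in (0, 30, 45, 60):
        print(f"theta = {theta:2d} deg: effective LET = {effective_let(xe_let, theta):.1f}")
```

At $60^{\circ}$ the effective LET is twice the value at normal incidence, which is how a single ion species can cover several points of the cross-section curve.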
| Ion | Energy [MeV] | LET [MeV·mg$^{-1}$·cm$^{2}$] | Range in silicon [µm] |
|-------------------------|-----|------|-------|
| $^{13}\text{C}^{4+}$ | 131 | 1.3 | 269.3 |
| $^{22}\text{Ne}^{7+}$ | 238 | 3.3 | 202.0 |
| $^{27}\text{Al}^{8+}$ | 250 | 5.7 | 131.2 |
| $^{36}\text{Ar}^{11+}$ | 353 | 9.9 | 114.0 |
| $^{53}\text{Cr}^{16+}$ | 505 | 16.1 | 105.5 |
| $^{58}\text{Ni}^{18+}$ | 582 | 20.4 | 100.5 |
| $^{84}\text{Kr}^{25+}$ | 769 | 32.4 | 94.2 |
| $^{103}\text{Rh}^{31+}$ | 957 | 46.1 | 87.3 |
| $^{124}\text{Xe}^{35+}$ | 995 | 62.5 | 73.1 |

Table 5.1 – Properties of heavy ions available at UCL-CRC-HIF, used for CIC2 test.

The CIC2 setup has been adapted for SEU tests. The main objective of the SEU tests on the CIC2 ASIC is to check that the control path is robust enough. Every data packet must contain valid information, such as the L1ID, the BX counter and the correct number of header bits. Moreover, at every run the full configuration is monitored to make sure that it does not differ from the original one.

# 5.3.6 SEU results analysis and cross-section measurements

Besides the previous checks, some statistics can be extracted from the SEU measurements. Due to the limited power budget, only the CIC2 ASIC control path is triplicated, while the data path is not. Therefore, a small number of errors on the data path, which do not compromise the ASIC functionality, is expected. The SEU cross-section $\sigma$ of the ASIC is defined as the ratio of the number of upsets $N_{SEU}$ to the particle fluence, where the fluence is the product of the average flux $\phi$ and the test time window $T$. The basic formula to calculate the SEU cross-section is:

$$\sigma = \frac{N_{SEU}}{\phi \cdot T}.$$

During the tests the SEU cross-section can be determined experimentally as a function of the particle energy (LET).
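As a minimal sketch of the cross-section formula above, the following Python function computes $\sigma$ from a number of observed upsets, an average flux and a run duration. The function name and the example numbers (30 upsets in a 1000 s run) are purely hypothetical and are not measurements from the test campaign; only the flux value corresponds to the maximum flux quoted in the text.

```python
def seu_cross_section(n_seu, avg_flux, duration_s):
    """SEU cross-section in cm^2.

    n_seu      -- number of observed upsets
    avg_flux   -- average flux in particles / (cm^2 s)
    duration_s -- length of the test time window in seconds
    """
    fluence = avg_flux * duration_s  # particles / cm^2
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return n_seu / fluence


if __name__ == "__main__":
    # Hypothetical run: 30 upsets in 1000 s at 1.5e4 particles / (cm^2 s).
    sigma = seu_cross_section(30, 1.5e4, 1000.0)
    print(f"sigma = {sigma:.2e} cm^2")
```

Repeating this calculation for each ion and tilt angle gives one point of the cross-section curve per effective LET value.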
SEU cross-section measurements as a function of the deposited energy can be fitted with an integral Weibull distribution:

$$\sigma = \sigma_0 \left( 1 - e^{-\left(\frac{E_{dep} - E_0}{W}\right)^s} \right),$$

where:

- $\sigma_0$ is the saturation value of the SEU cross-section;
- $E_{dep}$ is the deposited ionization energy;
- $E_0$ is the SEU threshold energy;
- $W$ is a parameter directly dependent on the depth of the sensitive volume;
- $s$ is a parameter directly dependent on the shape of the sensitive volume.

Assuming that the target device is thin with respect to the particle range, the sensitive volume can be approximated as the area of the cross-section multiplied by the thickness $d$ of the device. The deposited ionization energy $E_{dep}$ is then simply related to the LET by:

$$E_{dep} = \mathrm{LET} \cdot \rho_{Si} \cdot d,$$

where the product $\rho_{Si} \cdot d$ is expressed in mg·cm$^{-2}$ and:

- $\rho_{Si}$ is the density of silicon, equal to 2.33 g·cm$^{-3}$.

The SEU cross-section curve shows two important points: the threshold and the knee. For a LET below the threshold the ASIC does not upset. Above the threshold the upset rate increases rapidly and then slowly saturates up to the knee of the curve. Above the knee, in the saturation region, the ion LET is so high that all the sensitive regions of the ASIC upset when they are hit by the ion beam [118]. The cross-sectional area can be calculated as the ratio between the Weibull fit saturation value $\sigma_0$ and the total number of sensitive nodes in the circuit.

Figures 5.20 and 5.21 show, separately, the measured SEU cross-section as a function of the heavy ion LET for the two independent data paths of the CIC2 ASIC: the stub data path and the L1 data path. Two flavours of the CIC are analyzed: the CIC in a PS-module working with 8 MPAs at an L1 trigger rate of 500 kHz, and the CIC in a 2S-module working with 8 CBCs at an L1 trigger rate of 750 kHz.
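The integral Weibull parametrisation and the relation between LET and deposited energy introduced in this section can be sketched in Python as follows. The function names are hypothetical, the fit parameters used in the example are arbitrary placeholders rather than fitted values from the measurements, and the 1 µm thickness is only an illustration. Returning zero below the threshold $E_0$ is an assumption consistent with the statement that the ASIC does not upset below threshold.

```python
import math

# Density of silicon, 2.33 g/cm^3, expressed in mg/cm^3.
RHO_SI_MG_PER_CM3 = 2330.0


def deposited_energy(let, thickness_cm):
    """Deposited energy in MeV for a LET in MeV cm^2 / mg and a thickness in cm."""
    return let * RHO_SI_MG_PER_CM3 * thickness_cm


def weibull_cross_section(e_dep, sigma0, e0, w, s):
    """Integral Weibull cross-section; zero at or below the threshold e0."""
    if e_dep <= e0:
        return 0.0
    return sigma0 * (1.0 - math.exp(-(((e_dep - e0) / w) ** s)))


if __name__ == "__main__":
    # Placeholder parameters, chosen only to show the shape of the curve.
    sigma0, e0, w, s = 1e-6, 0.5, 5.0, 1.5
    for let in (1.3, 9.9, 20.4, 62.5):  # LET values taken from Table 5.1
        e_dep = deposited_energy(let, 1e-4)  # illustrative thickness of 1 um
        sigma = weibull_cross_section(e_dep, sigma0, e0, w, s)
        print(f"LET {let:5.1f}: E_dep = {e_dep:6.3f} MeV, sigma = {sigma:.3e} cm^2")
```

The curve is zero below the threshold, rises steeply above it and approaches $\sigma_0$ in the saturation region, matching the threshold and knee behaviour described above.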
**Figure 5.20.** Stub SEU errors for CIC2 ASIC at different values of LET for two different configurations.

**Figure 5.21.** L1 SEU errors for CIC2 ASIC at different values of LET for two different configurations.

The actual LET in the target application will be around 20 MeV. The number of soft errors shown by the measurements is in good agreement with the simulations and the studies. No hard reset occurred during the full SEU campaign; however, some multiple bit upsets can occur due to the output serializer. They are 20 times rarer than a soft error.

# 6 Summary and conclusions

The future High Luminosity-Large Hadron Collider (HL-LHC) at CERN presents very challenging requirements for the front-end readout electronics of the Compact Muon Solenoid (CMS) Outer Tracker detector: the capability to identify particles with high transverse momentum and to transmit information at high data rates under a tight power consumption constraint. In this context two different modules have been developed. The one closest to the collision point requires higher resolution, is made of a silicon pixel layer and a silicon strip layer closely spaced, and is called the Pixel-Strip (PS) module; the other is made of two silicon strip layers and is called the Strip-Strip (2S) module. A novel technique has been proposed for both modules that rejects locally the signals from low transverse momentum particles, which are not interesting for the Level-1 trigger system reconstruction. This approach reduces the output data bandwidth from 1.3 Tbps to 30 Gbps. To reduce the amount of transmitted data even further (by a factor of 10), a data concentrator ASIC, called Concentrator IC (CIC), has been developed. The CIC ASIC needs to be employed in both modules and, therefore, must be able to communicate with different front-end chips having different data formats.
In order to develop the CIC and to prove that it can work in both modules, a system-level testbench framework using the Universal Verification Methodology (UVM) and SystemVerilog has been developed. It allows not only the verification of the final chip, but also supports the development of the entire chip. Moreover, different architecture solutions could be compared and the best trade-off between power consumption and bandwidth could be found using this powerful framework. Depending on the location of the module in the CMS Outer Tracker, the readout chain needs to transmit all the relevant data. For this reason a hit generator based on Monte Carlo physics events has been developed to stress the readout chain as it would be stressed in the real experiment. The developed CIC ASIC is a digital chip in 65 nm CMOS technology that has been produced and tested in two versions. The CIC is a data concentrator ASIC capable of averaging data over space (eight front-end chips) and time (eight BXs). A first version, called CIC1, has limited functionality and was not developed to be radiation hard. However, it could be used to test the basic functionality of the standalone ASIC and of the 2S-module readout chain. A second version, called CIC2, including all the functionalities, has been produced. The CIC2 ASIC features radiation hardness for a Total Ionizing Dose up to $100\,\mathrm{Mrad}$ and Single Event Effects hardness thanks to the use of radiation hardening techniques such as triple modular redundancy. At the same time, the very strict power budget of $312\,\mathrm{mW}$ for the 2S-module and $250\,\mathrm{mW}$ for the PS-module had to be respected. The temperature in the target application will be around $-40\,^{\circ}\mathrm{C}$, which required additional studies on the characterization of the standard cell library. The CIC2 ASIC has been characterized up to a Total Ionizing Dose of 200 Mrad in an X-ray test campaign.
Moreover, SEU tests performed with heavy ions proved SEU tolerance up to an effective LET of $\sim 70\,\text{MeV}\cdot\text{mg}^{-1}\text{cm}^{2}$. The CIC2 will enter production in March 2021, ready for hybrid circuit finalization, module assembly, testing and installation in the CMS experiment in 2024-2025 according to the current plan.

# **Bibliography**

# References to the author's publications

- [1] D. Ceresa, A. Caratelli, J. Kaplon, K. Kloukinas, and S. Scarfi, "Readout architecture for the Pixel-Strip module of the CMS Outer Tracker Phase-2 upgrade", *PoS*, vol. Vertex2016, p. 066, 2017. DOI: 10.22323/1.287.0066. - [2] CMS tracker collaboration *et al.*, "The Phase-2 Upgrade of the CMS Tracker", CERN, Tech. Rep. CERN-LHCC-2017-009. CMS-TDR-014, Jun. 2017. [Online]. Available: https://cds.cern.ch/record/2272264. - [3] D. Ceresa, A. Caratelli, J. Kaplon, K. Kloukinas, J. Murdzek, and S. Scarfi, "Design and simulation of a 65 nm Macro-Pixel Readout ASIC (MPA) for the Pixel-Strip (PS) module of the CMS Outer Tracker detector at the HL-LHC", *PoS*, vol. TWEPP-17, 032. 5 p, 2017. DOI: 10.22323/1.313.0032. - [4] A. Caratelli, D. Ceresa, J. Kaplon, K. Kloukinas, Y. Leblebici, J. Murdzek, and S. Scarfi, "Short-Strip ASIC (SSA): A 65nm silicon-strip readout ASIC for the Pixel-Strip (PS) module of the CMS Outer Tracker detector upgrade at HL-LHC", *PoS*, vol. TWEPP-17, 031. 5 p, 2018. DOI: 10.22323/1.313.0031. - [5] S. Scarfi, A. Caratelli, D. Ceresa, K. Kloukinas, and Y. Leblebici, "System level simulation framework for the asics development of a novel particle physics detector", pp. 49–52, 2018. DOI: 10.1109/PRIME.2018.8430367. - [6] S. Scarfi, A. Caratelli, L. Caponetto, D. Ceresa, G. C. Galbit, K. Kloukinas, Y. Leblebici, B. Nodari, and S. Viret, "A System-Verilog Verification Environment for the CIC Data Concentrator ASIC of the CMS Outer Tracker Phase-2 Upgrades", *PoS*, vol. TWEPP-18, 2019. DOI: 10.22323/1.343.0097. [Online].
Available: http://cds.cern.ch/record/2644445. - [7] S. Scarfi, B. Nodari, L. Caponetto, G. C. Galbit, and S. Viret, "A 65 nm Data Concentration ASIC for the CMS Outer Tracker Detector Upgrade at HL-LHC", no. CMS-CR-2018-278, Oct. 2018. DOI: 1747420. [Online]. Available: http://cds.cern.ch/record/2650712. - [8] D. Ceresa, A. Caratelli, J. T. De Clercq, D. Giovinazzo, M. Haranko, J. Kaplon, K. Kloukinas, J. Murdzek, and S. Scarfi, "Characterization of the MPA prototype, a 65 nm pixel readout ASIC with on-chip quick transverse momentum discrimination capabilities", *PoS*, vol. TWEPP2018, p. 166, 2019. DOI: 10.22323/1.343.0166. [Online]. Available: http://cds.cern.ch/record/2658203. - [9] A. Caratelli, S. Scarfi, D. Ceresa, J. T. De Clercq, M. Haranko, J. Kaplon, K. Kloukinas, and Y. Leblebici, "Characterization of the first prototype of the Silicon-Strip readout ASIC (SSA) for the CMS Outer-Tracker phase-2 upgrade", PoS, vol. TWEPP-18, 2019. DOI: 10.22323/1.343.0159. [Online]. Available: http://cds.cern.ch/record/2650963. - [10] S. Scarfi, G. Bergamin, A. Caratelli, L. Caponetto, D. Ceresa, G. Galbit, S. Jain, K. Kloukinas, Y. Leblebici, B. Nodari, and S. Viret, "Study of a Triggered, Full Event Zero-Suppressed Front-End Readout Chain operating up to 1 MHz Trigger Rate and Pileup of 300 for CMS Outer Tracker upgrade at HL-LHC", PoS, vol. TWEPP2019, p. 012, 2020. DOI: 10.22323/1.370.0012. - [11] S. Scarfi, B. Nodari, G. Bergamin, L. Caponetto, A. Caratelli, D. Ceresa, J. De Clercq, G. Galbit, S. Jain, K. Kloukinas, *et al.*, "First results from the CIC data aggregation ASIC for the Phase 2 CMS Outer Tracker", *PoS*, vol. TWEPP2019, p. 102, 2020. DOI: 10.22323/1.370.0102. - [12] S. Scarfi, A. Caratelli, D. Ceresa, J. De Clercq, G. Galbit, M. Haranko, S. Jain, C. Luigi, I. Makarenko, S. Mersi, B. Nodari, *et al.*, "OT- $\mu$ DTC, a test bench for testing CMS Outer Tracker Phase-2 module prototypes", 2019. - [13] B. Nodari, L. Caponetto, D. Ceresa, G. Galbit, S. 
Jain, *et al.*, "CIC2 radiation hardness tests", *CMS Internal Note*, 2020. - [14] S. Scarfi, A. Caratelli, D. Ceresa, J. T. De Clercq, M. Haranko, J. Kaplon, K. Kloukinas, and Y. Leblebici, "Low-power SEE hardening techniques and error rate evaluation in 65nm readout ASICs", *PoS*, vol. TWEPP2019, p. 015, 2020. DOI: 10.22323/1.370.0015. - [15] G. Bergamin, A. Caratelli, S. Scarfi, D. Ceresa, J. Kaplon, K. Kloukinas, and Y. Leblebici, "MPA-SSA, design and test of a 65nm ASIC-based system for particle tracking at HL-LHC featuring on-chip particle discrimination", pp. 1–3, 2020. DOI: 10.1109/NSS/MIC42101.2019.9059989. # References - [16] W. W. Moses, "Scintillator requirements for medical imaging", 1999. [Online]. Available: https://escholarship.org/uc/item/5pc245ds. - [17] N. D. Gupta and S. Ghosh, "A report on the wilson cloud chamber and its applications in physics", *Reviews of Modern Physics*, vol. 18, no. 2, p. 225, 1946. DOI: 10.1103/revmodphys.18.225. - [18] C. D. Anderson, "The Positive Electron", *Physical Review*, vol. 43, pp. 491–494, 6 1933. DOI: 10.1103/PhysRev.43.491. - [19] R. H. Dalitz, "Kaon physics—The first 50+ years", *Kaon Physics*, p. 5, 2001. DOI: 10.1007/978-3-642-84741-7\_7. - [20] D. A. Glaser, "Some effects of ionizing radiation on the formation of bubbles in liquids", *Physical Review*, vol. 87, no. 4, p. 665, 1952. DOI: 10.1103/physrev.87. 665. - [21] CERN, *Bubble chamber: Omega production and decay*, 1973. [Online]. Available: http://cds.cern.ch/record/39472. - [22] F. Hasert, S. Kabe, W. Krenz, J. Von Krogh, D. Lanske, J. Morfin, K. Schultze, H. Weerts, G. Bertrand-Coremans, J. Sacton, *et al.*, "Observation of neutrino-like interactions without muon or electron in the Gargamelle neutrino experiment", *Nuclear Physics B*, vol. 73, no. 1, pp. 1–22, 1974. DOI: 10.1016/0550-3213(74)90038-8. - [23] H. Wenninger, "In the tracks of the bubble chamber", *CERN Cour.*, vol. 44, pp. 26–29, 2004. [Online]. 
Available: https://cerncourier.com/a/in-the-tracks-of-the-bubble-chamber/. - [24] Conference on the Bubble Chamber and its Contributions to Particle Physics, 1994. [Online]. Available: https://cds.cern.ch/record/232652. - [25] F. Sauli, "Principles of operation of multiwire proportional and drift chambers", CERN, Tech. Rep., 1977. DOI: 10.1142/9789814355988\_0002. - [26] E. Belau, J. Kemmer, R. Klanner, U. Kötz, G. Lutz, W. Männer, E. Neugebauer, H. Seebrunner, and A. Wylie, "Silicon detectors with 5 μm spatial resolution for high energy particles", *Nuclear Instruments and Methods in Physics Research*, vol. 217, pp. 224–228, 1983. DOI: 10.1016/0167-5087(83)90138-2. - [27] M. Turala, "Silicon tracking detectors historical overview", *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 541, no. 1-2, pp. 1–14, 2005. DOI: 10.1016/j.nima.2005.01.032. - [28] CERN Collaboration, *The CERN member states*, 2020. [Online]. Available: https://home.cern/about/who-we-are/our-governance/member-states (visited on 2020). - [29] O. Bruning, H. Burkhardt, and S. Myers, "The Large Hadron Collider", *Prog. Part. Nucl. Phys.*, vol. 67, pp. 705–734, 2012. DOI: 10.1016/j.ppnp.2012.03.001. - [30] L. Evans and P. Bryant, "LHC Machine", *JINST*, vol. 3, S08001, 2008. DOI: 10. 1088/1748-0221/3/08/S08001. - [31] G. Aad, J. Butterworth, J. Thion, U. Bratzler, P. Ratoff, R. Nickerson, J. Seixas, I. Grabowska-Bold, F. Meisel, S. Lokwitz, *et al.*, "The ATLAS experiment at the CERN large hadron collider", *JINST*, vol. 3, S08003, 2008. DOI: 10.1088/1748-0221/3/08/S08003. - [32] CMS collaboration and others, "The CMS experiment at the CERN LHC", *JINST*, vol. 3, S08004, 2008. DOI: 10.1007/978-88-7642-482-3\_2. - [33] A. A. Alves Jr, L. Andrade Filho, A. Barbosa, I. Bediaga, G. Cernicchiaro, G. Guerrer, H. Lima Jr, A. Machado, J. Magnin, F. Marujo, *et al.*, "The LHCb detector at the LHC", *JINST*, vol. 3, no. 
08, S08005, 2008. DOI: 10.1088/1748-0221/3/08/S08005. - [34] K. Aamodt, A. A. Quintana, R. Achenbach, S. Acounis, D. Adamová, C. Adler, M. Aggarwal, F. Agnese, G. A. Rinella, Z. Ahammed, *et al.*, "The ALICE experiment at the CERN LHC", *JINST*, vol. 3, no. 08, S08002, 2008. DOI: 10.1016/b978-0-444-51343-4.50019-3. - [35] J. Haffner, "The CERN accelerator complex. Complexe des accélérateurs du CERN", Oct. 2013, General Photo. [Online]. Available: https://cds.cern.ch/record/1621894. - [36] S. Ulmer, "BASE Annual Report 2018", Tech. Rep., 2019. [Online]. Available: http://cds.cern.ch/record/2654098. - [37] The LEP Working Group, ALEPH Collaboration, DELPHI Collaboration, L3 Collaboration, OPAL Collaboration for Higgs, "Search for the standard model Higgs boson at LEP", *Physics Letters B*, vol. 565, pp. 61–75, 2003. DOI: 10.1063/1.43422. - [38] G. Aad, T. Abajyan, B. Abbott, J. Abdallah, S. A. Khalek, A. A. Abdelalim, O. Abdinov, R. Aben, B. Abi, M. Abolins, *et al.*, "Observation of a new particle in the search for the standard model higgs boson with the atlas detector at the lhc", *Physics Letters B*, vol. 716, no. 1, pp. 1–29, 2012. DOI: 10.1016/j.physletb. 2012.08.020. - [39] P. W. Higgs, "Broken symmetries and the masses of gauge bosons", *Physical Review Letters*, vol. 13, no. 16, p. 508, 1964. DOI: 10.1103/PhysRevLett.13.508. - [40] F. Englert and R. Brout, "Broken symmetry and the mass of gauge vector mesons", *Physical Review Letters*, vol. 13, no. 9, p. 321, 1964. DOI: 10.1103/PhysRevLett.13.321. - [41] S. P. Summers, "Application of FPGAs to Triggering in High Energy Physics", PhD thesis, Imperial Coll., London, 2018. [Online]. Available: http://cds.cern.ch/record/2647951. - [42] G. Apollinari, I. Béjar Alonso, O. Brüning, M. Lamont, and L. Rossi, *High-Luminosity Large Hadron Collider (HL-LHC): Preliminary Design Report*, ser. CERN Yellow Reports: Monographs. Geneva: CERN, 2015. [Online]. Available: http://cds.cern.ch/record/2116337. - [43] M. 
Krammer, "The update of the European strategy for particle physics", *Physica Scripta*, vol. 2013, no. T158, p. 014 019, 2013. DOI: 10.1088/0031-8949/2013/t158/014019. - [44] G. Apollinari, I. Béjar Alonso, O. Brüning, P. Fessia, M. Lamont, L. Rossi, and L. Tavian, "High-Luminosity Large Hadron Collider (HL-LHC)", *CERN Yellow Rep. Monogr.*, vol. 4, pp. 1–516, 2017. DOI: 10.23731/CYRM-2017-004. - [45] D. Contardo, M. Klute, J. Mans, L. Silvestris, and J. Butler, "Technical Proposal for the Phase-II Upgrade of the CMS Detector", Tech. Rep. CERN-LHCC-2015-010. LHCC-P-008. CMS-TDR-15-02, Jun. 2015. [Online]. Available: https://cds. cern.ch/record/2020886. - [46] G. Lutz *et al.*, *Semiconductor radiation detectors*. Springer, 1999, vol. 40. DOI: 10.1007/978-3-540-71679-2. - [47] F. Hartmann, "Silicon tracking Detectors in High-Energy Physics", *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 666, pp. 25–46, 2012, Advanced Instrumentation, ISSN: 0168-9002. DOI: 10.1016/j.nima.2011.11.005. - [48] F. Hartmann, *Evolution of silicon sensor technology in particle physics*. Springer, 2009, vol. 43. DOI: 10.1007/978-3-319-64436-3. - [49] L. Rossi, P. Fischer, T. Rohe, and N. Wermes, *Pixel Detectors*. 2006. DOI: 10.1007/3-540-28333-1. - [50] M. Tanabashi, K. Hagiwara, K. Hikasa, K. Nakamura, Y. Sumino, F. Takahashi, J. Tanaka, K. Agashe, G. Aielli, C. Amsler, M. Antonelli, D. Asner, H. Baer, et al., "Review of Particle Physics", *Physical Review D*, vol. 98, 2018, ISSN: 2470-0010. DOI: 10.1103/PhysRevD.98.030001. - [51] T. Sakuma and T. McCauley, "Detector and Event Visualization with SketchUp at the CMS Experiment", *Journal of Physics: Conference Series*, vol. 513, no. 2, p. 022 032, 2014. DOI: 10.1088/1742-6596/513/2/022032. - [52] G. Perez, "Unitarization Models For Vector Boson Scattering at the LHC", PhD thesis, 2018. DOI: 10.5445/IR/1000082199. 
# Simone SCARFÌ

Digital IC Designer and Verification Engineer

Email: simon.scarfi@gmail.com
LinkedIn: simonscarfi

# EDUCATION

- 2017 - 2021: Ph.D. candidate in Microsystems and Microelectronics. EPFL - École polytechnique fédérale de Lausanne, Microelectronic Systems Laboratory (LSM), Switzerland
- 2014 - 2016: M.Sc. in Micro and Nanotechnologies for Integrated Systems, final grade: 109/110. EPFL - École polytechnique fédérale de Lausanne, Switzerland; Institut Polytechnique de Grenoble, France; Politecnico di Torino, Italy
- 2011 - 2014: Bachelor's degree in Electronic Engineering, final grade: 110 cum laude. Sapienza Università di Roma, Italy

# TECHNICAL SKILLS

#### Advanced Experience

- HDL languages: Verilog, VHDL, SystemVerilog
- Verification methodology: UVM
- Cadence EDA tools for IC design: Genus, Innovus, Voltus, Tempus, Incisive, vManager, Virtuoso, Calibre
- Source code and design management: Git, GitHub, GitLab, Cliosoft

#### Good Experience

- Programming languages: C, C++
- Scripting languages: Python, Bash, Tcl
- Operating systems: Windows, Linux

#### Basic Knowledge

- PCB design: Altium Designer
- CAD: COMSOL, Silvaco
- Data analysis: MATLAB

# LANGUAGES

English, Italian, French, Spanish

# EXPERIENCE

#### Digital IC Design | 2017 - 2021

CERN - European Organization for Nuclear Research, Geneva, Switzerland

- Design of a Data Concentrator ASIC (CIC) for the CMS Outer Tracker Upgrade: RTL description, architecture studies, design synthesis and implementation in 65 nm technology until sign-off.
- Focus on low-power and radiation hardening design techniques to mitigate Total Ionizing Dose (TID) and Single Event Effects (SEE).
- Simulation, study and optimization of the electronic readout system of an innovative silicon particle detector for the CMS experiment, capable of discriminating particles with high transverse momentum.
- Power distribution studies at module level to isolate the analog front-end power supply from digital noise.
- Design of analog IP blocks (LDO, DAC, etc.) to be employed in the analog front-end of the silicon sensor readout ASIC.

#### Digital IC Verification | 2017 - 2021

CERN - European Organization for Nuclear Research, Geneva, Switzerland

- Development of a SystemVerilog and UVM system-level simulation framework for multi-chip development and verification for two modules for the CMS Outer Tracker Upgrade: hit generation based on Monte Carlo physics events, UVC components and interfaces, readout chain reference model at TLM level, scoreboard and test case library.
- Metric-driven verification to improve the predictability and quality of the ASIC verification effort: test planning, development of UVM environment features, test execution, and measurement and analysis of results, iterating over these steps until the target functional and code coverage is achieved.
- Development of randomized SEU/SET fault injection in the UVM framework to evaluate ASIC robustness against SEE and identify possible architecture improvements.
- Sign-off verification on the final ASIC netlist with back-annotated timing delays for all corners.

# CONFERENCES

- Topical Workshop on Electronics for Particle Physics (TWEPP 2018), Antwerp, Belgium
- Topical Workshop on Electronics for Particle Physics (TWEPP 2019), Santiago de Compostela, Spain