# Signaling in 3-D integrated circuits, benefits and challenges

THESIS No. 6509 (2015)

PRESENTED ON 1 MAY 2015 TO THE SCHOOL OF COMPUTER AND COMMUNICATION SCIENCES, INTEGRATED SYSTEMS LABORATORY (IC/STI), DOCTORAL PROGRAM IN ELECTRICAL ENGINEERING

# ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

FOR THE DEGREE OF DOCTOR OF SCIENCE

BY

# Somayyeh RAHIMIAN OMAM

accepted on the recommendation of the jury:

Prof. D. Atienza Alonso, jury president
Prof. G. De Micheli, thesis director
Dr A. Jerraya, examiner
Dr I. Bacivarov, examiner
Prof. Y. Leblebici, examiner



To my parents

Soghra and Hassan

for their endless love, encouragement and support

# Acknowledgements

Without the help and support of a large number of people, this work could not have been completed. Working with such great colleagues and friends was the most rewarding part of my PhD.

I am honored and fortunate to have been advised by Prof. Giovanni De Micheli. I would like to express my special thanks to him for all of his support, encouragement, and immense knowledge. His patience, good advice, and insightful comments are greatly appreciated. My thesis could not have been accomplished without the guidance of Dr. Vasilis Pavlidis and Dr. Pierre-Emmanuel Gaillardon. I would like to thank them for their tremendous guidance and support.

I would especially like to thank my examiners, Prof. David Atienza Alonso, Prof. Yusuf Leblebici, Dr. Ahmed Jerraya, and Dr. Iuliana Bacivarov, who provided encouraging and constructive feedback. I am grateful for their thoughtful and detailed comments.

I would like to express my sincere appreciation to my former supervisors, Prof. Omid Shoaei and Prof. Mahdi Fakhraei, who encouraged me on this path, and to thank them for all of their support. I learnt a lot from them, and it was an honor to be their student and to work with them.

My colleagues at EPFL have played a significant role in my project. I wish to thank Xifan Tang and Giulia Beanato. It has been a great pleasure working with them.

Tremendous thanks to all the members of the LSI lab for the priceless moments I had there. The very friendly atmosphere of the lab helped me a great deal to have a fulfilling life in Switzerland. I especially want to thank Christina Govoni for all her kind help and support during my PhD, and also my dear friends Seyedeh Sara Ghoreishizadeh and Hassan Ghasemzadeh Mohammadi for their priceless friendship. I am so proud to be an LSI member.

I am also grateful to Julien Ghaye and Banafsheh Abasahl for helping me during my writing.

I am indebted to all my friends in Lausanne who have supported me over the last few years: Mahdi, Samira, Banafsheh, Amin, Farhang, Maryam and Ali, Sareh, Zahra, and all of my other lovely friends. I wish you all the best in life.

I would like to express my warmest thanks to my dear family: my brother Mohsen and my sister Sekineh, for all their encouragement and love. My whole-hearted gratitude goes to my mother, Soghra, and my father, Hassan, who are my greatest source of love, friendship, and enthusiasm. They have been the greatest inspiration throughout my life. Words are not sufficient to express how blessed I feel to have them as my parents. I just want to say I love you with all my heart.

Somayyeh Rahimian

Lausanne, January 2015

# Abstract

Three-dimensional (3-D) or vertical integration is a design and packaging paradigm that can mitigate many of the increasing challenges related to the design of modern integrated systems. 3-D circuits have recently been in the spotlight, since they provide a potent approach to enhancing performance and integrating diverse functions within a multi-plane stack.

Clock networks consume a large portion of the power dissipated in a circuit. Therefore, designing a low-power clock network for synchronous circuits is an important task. This requirement is stricter for 3-D circuits due to their increased power densities. Synchronization can also be more challenging for 3-D circuits, since a clock path can spread across several planes with different physical and electrical characteristics. Consequently, designing low-power clock networks for 3-D circuits is an important issue. Resonant clock networks are considered efficient low-power alternatives to conventional clock distribution schemes. These networks utilize additional inductive circuits to reduce power while delivering a full-swing clock signal to the sink nodes. In this research, a design method to apply resonant clocking to synthesized clock trees is proposed.

Another considerable challenge for 3-D integration is manufacturing and the related yield implications. Manufacturing processes for 3-D circuits include some additional steps as compared to standard CMOS processes, such as wafer thinning and TSV fabrication. This manufacturing complexity makes 3-D circuits more susceptible to manufacturing defects, which can lower the overall yield of the bonded 3-D stack. Testing is another complicated task for 3-D ICs, where pre-bond test is a prerequisite. Contactless testing methods have been considered as an alternative to conventional test methods. Pre-bond testability, in turn, presents new challenges to 3-D clock network design, primarily because the clock distribution networks are incomplete prior to the bonding of the planes. A design methodology for resonant 3-D clock networks that support wireless pre-bond testing is introduced. To efficiently address this issue, inductive links are exploited to wirelessly transmit the clock signal to the disjoint resonant clock networks. The inductors comprising the *LC* tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test.

Through Silicon Vias (TSVs) are the enablers for achieving high-bandwidth paths in inter-plane communications. TSVs also provide higher vertical link density and facilitate the heat flow in 3-D circuits as compared to other potential schemes such as inductive links. However, reliability issues and crosstalk among adjacent TSVs decrease the yield and performance of TSV-based circuits. Moreover, the area footprint of TSVs and the related keep-out areas is significant. Reducing the number of TSVs employed for inter-plane signal transfer can alleviate these problems.

Serialization can be considered as a solution to alleviate the challenges related to TSV bundles for transferring data among the planes. Converting parallel data into higher-rate serial data can reduce the number of TSVs and, consequently, the area and crosstalk effects. However, serializer/deserializer circuits can add complexity to the system design, specifically with respect to power consumption and when bandwidth is limited. A study of serial *vs.* parallel data communication for TSV-based 3-D circuits is presented in this research.

Recent FPGAs are quite complex circuits that provide reconfigurability at the cost of lower performance and higher power consumption as compared to ASICs. Because they rely on a large number of programmable switches, routing structures are mainly responsible for the performance degradation in FPGAs. Employing 3-D technology can provide more efficient switches, which drastically improve the performance and reduce the power consumption of the FPGA. RRAM switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. Along with the configurable switches, buffers are the other important element of the FPGA routing structure. The different characteristics of RRAM switches, as compared to CMOS transmission gates, change the properties of signal paths in RRAM-based FPGAs. The on-resistance of RRAM switches is considerably lower than that of CMOS pass-gate switches, which results in a lower *RC* delay for RRAM-based routing paths. This change in the critical path and signal delay in turn affects the need for intermediate buffers. Thus, the buffer allocation should be reconsidered. In the last part of this research, the effect of intermediate buffers on signal propagation delay is studied and a modified buffer allocation scheme for RRAM-based FPGA routing paths is proposed.

Key words: 3-D integration, Through Silicon Via (TSV), Resonant clocking, Pre-bond test, Inductive link, Serialization, Resistive RAM (RRAM), FPGA, Routing path

# Résumé

Vertical or three-dimensional (3-D) integration is a paradigm for designing and building electronic circuits that can address many of the challenges commonly encountered in integrated circuit design. 3-D circuits have recently come to the fore because of their ability to improve performance and to integrate diverse functions within several transistor levels.

Clock distribution networks account for a large share of the power consumed by a digital circuit. Designing low-power clock networks for synchronous circuits is therefore of prime importance. This need is all the stronger for 3-D circuits because of their high integration density. Synchronization problems are also harder in 3-D circuits, since a single clock signal may be distributed across several levels, each with its own electrical and physical characteristics. Consequently, developing low-power clock distribution networks is vital. Resonant distribution networks are considered efficient alternatives to conventional clock distribution solutions. These networks use inductive circuits to limit the power consumed while delivering a full-swing clock signal to the various nodes. This thesis proposes an automated design method for the synthesis of resonant clock networks.

Another major challenge for 3-D integration is raising the manufacturing yield. The manufacturing process used for 3-D circuits requires additional steps compared with standard CMOS fabrication, such as thinning the silicon layers and forming the electrical contacts between levels, known as Through Silicon Vias (TSVs). This increase in manufacturing complexity makes 3-D circuits more susceptible to manufacturing defects, which reduces the overall yield of 3-D circuits once they are stacked and assembled. Testing 3-D circuits is a non-trivial step in which intermediate tests before assembly, called pre-bond tests, become necessary. Contactless test methods have been considered as alternatives to traditional methods. However, this creates new challenges for testing the clock distribution networks of 3-D circuits before assembly, because the clock network is complete only after all levels have been assembled. To address this problem efficiently, inductive links are used to transmit the clock signals to the disjoint resonant networks. The LC circuits are thus used to receive the signals, eliminating the need for physical connections or additional circuits for pre-bond tests.

TSVs are the elements that enable high-speed data communication between the levels of 3-D circuits. Compared with inductive links, TSVs provide a higher density of vertical links and facilitate heat extraction. However, reliability problems and crosstalk between adjacent TSVs limit the manufacturing yield and the final performance of the circuits that use them. Moreover, the size of a TSV is not negligible. Reducing the number of TSVs used in a circuit is therefore necessary to limit these problems.

Serializing the data transfers between levels can be considered as a solution. Converting parallel data transfers to serial ones while raising the transmission frequency reduces the number of TSVs and limits crosstalk effects. On the other hand, this technique requires additional circuits that complicate system design, particularly with respect to power consumption and bandwidth. A study of the serialization of communications between the levels of 3-D circuits is also presented in this thesis.

Recent FPGAs are complex circuits that offer a high level of reconfigurability compared with ASICs, at the cost of lower performance and higher power consumption. The use of a large number of programmable connections and routing matrices is the main limitation of modern FPGAs. 3-D technologies make more efficient programmable connections possible, drastically improving the performance and reducing the power consumption of FPGAs. RRAM memories are promising candidates for improving routing matrices because they offer a low on-state resistance and are non-volatile. In addition to programmable connections, repeaters, or buffers, play an essential role in FPGAs. The characteristics of RRAM memories change the properties of the data paths in the FPGAs that use them. The on-state resistance of an RRAM memory is considerably lower than that of conventional CMOS connections, which results in reduced RC delays in the data paths. This inherently changes the constraints on the architecture and affects the number of repeaters needed. In the last part of this thesis, the effects of repeaters on signal propagation times are studied and a new repeater allocation method for RRAM-based FPGAs is proposed.

Key words: 3-D integration, Through Silicon Via (TSV), Resonant clock network, Pre-bond test, Inductive link, Serialization, Resistive RAM (RRAM), FPGA, Routing path

# Contents

| Section | Title |
|---------|-------|
|         | Acknowledgements |
|         | Abstract (English/Français) |
|         | List of Figures |
|         | List of Tables |
| 1       | Introduction |
| 1.1     | Benefits of 3-D integration |
| 1.1.1   | Heterogeneous integration |
| 1.1.2   | Higher density |
| 1.1.3   | Shorter interconnect |
| 1.2     | Challenges of 3-D integration |
| 1.2.1   | Thermal problems |
| 1.2.2   | Design kit |
| 1.2.3   | Test |
| 1.2.4   | Mechanical stability |
| 1.3     | Classification of vertical integration |
| 1.3.1   | Coarse-Grain ICs |
| 1.3.2   | Fine-Grain ICs |
| 1.4     | 3-D vs. 2.5-D |
| 1.5     | Thesis Outline |
| 2       | TSV based 3D Circuits |
| 2.1     | Manufacturing process for TSVs and inductive links |
| 2.2     | Different applications of TSVs in 3-D ICs |
| 2.2.1   | Inter-plane signaling |
| 2.2.2   | Power and ground distribution |
| 2.2.3   | Thermal TSVs |
| 2.3     | Noise Coupling |
| 2.4     | Synchronization in 3-D circuits |
| 2.4.1   | Resonant clocking |
| 2.5     | Test |
| 2.6     | Design Challenges for TSVs |
| 2.7     | Summary |
| 3       | Monolithic Integration |
| 3.1     | RRAM |
| 3.1.1   | RRAM in FPGA structure |
| 3.2     | Summary |
| 4       | Resonant Clocking |
| 4.1     | Resonant clocking for symmetric networks |
| 4.1.1   | Resonant clocking for 3D H-trees |
| 4.1.2   | Simulation results |
| 4.2     | Resonant clocking for synthesized networks |
| 4.2.1   | Simulation results |
| 4.3     | Transceiver circuit for the inductive link |
| 4.3.1   | Simulation results |
| 4.4     | Summary |
| 5       | Serialization |
| 5.1     | Cross Talk |
| 5.2     | Serialization |
| 5.3     | Simulation Results |
| 5.4     | Summary |
| 6       | Buffer Allocation for RRAM-based FPGA Structure |
| 6.1     | Buffer distribution in FPGA |
| 6.1.1   | Delay Calculation in Critical Path |
| 6.1.2   | Validation by circuit level simulations |
| 6.2     | Architectural Simulations |
| 6.2.1   | Methodology |
| 6.2.2   | Simulation Results |
| 6.3     | Summary |
| 7       | Conclusions |
| 7.1     | Contributions |
| 7.2     | Future Research |
|         | Bibliography |
|         | Curriculum Vitae |

| 1.1  | Moore's Law from 1970 - 2005                                                                                                                                                                                          | 1  |
|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 1.2  | The intrinsic delay of NMOS transistor for different feature sizes                                                                                                                                                    | 2  |
| 1.3  | interconnect delay <i>vs</i> . gate delay                                                                                                                                                                             | 3  |
| 1.4  | heterogeneous system                                                                                                                                                                                                  | 4  |
| 1.5  | improving the density and performance of DRAMs by using 3-D integration                                                                                                                                               | 5  |
| 1.6  | Different methods of implementing vertical interconnects for SIPs                                                                                                                                                     | 8  |
| 1.7  | Monolithic integration [1].                                                                                                                                                                                           | 8  |
| 1.8  | different inter-plane communication methods for fine grain 3-D circuits                                                                                                                                               | 9  |
| 1.9  | Basic operation of an inductor link, where (a) is the inductively coupled transceiver and (b) is the equivalent circuit for the spiral inductor [2,3].                                                                | 10 |
| 1.10 | 2.5-D vs. 3-D IC [4].                                                                                                                                                                                                 | 11 |
| 1.11 | combining 2.5-D and 3-D integration                                                                                                                                                                                   | 12 |
| 1.12 | Xilinx Virtex-7, an example of 2.5-D integrated devices.                                                                                                                                                              | 12 |
| 2.1  | Basic steps of TSV manufacturing for "a via-middle" process, (a) wafer prepara-<br>tion and FEOL, (b) TSV etching and metal filling, (c) wafer thinning and BEOL,<br>(d) wafer bonding, and (e) handle wafer removal. | 17 |
| 2.2  | fabrication steps for different types of TSVs [5]                                                                                                                                                                     | 18 |
| 2.3  | an overview to challenges in different types of TSVs [5]                                                                                                                                                              | 20 |
| 2.4  | Face-to-Face and Face-to-Back bonding                                                                                                                                                                                 | 21 |
| 2.5  | An <i>RLC</i> model of a TSV [6]                                                                                                                                                                                      | 22 |
|      |                                                                                                                                                                                                                       |    |

| 2.6  | Average wirelength <i>vs</i> . number of TSVs [7]                                                                                                                                                                                                                                       | 23 |
|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 2.7  | Different TSV allocations for 3-D circuits where in (a) no active area for TSVs is reserved during placement and in (b) some active area for TSVs is reserved during placement [8].                                                                                                     | 24 |
| 2.8  | Current paths within a 3-D circuit, (a) where the TSV is connected only to the topmost metal layer and (b) where the TSV is connected to the power lines on both the uppermost (MT) and the first (M1) metal layers of a circuit plane. TSVs are formed with a "via-middl" process [9]. | 26 |
| 2.9  | The thermal profile of a 3-D circuit before and after thermal via insertion [10].                                                                                                                                                                                                       | 27 |
| 2.10 | Thermal model of a TTSV in a three-plane circuit where (a) is the compact model and (b) is the distributed model [11].                                                                                                                                                                  | 28 |
| 2.11 | Cross-talk model between adjacent TSVs                                                                                                                                                                                                                                                  | 29 |
| 2.12 | structure of Coaxial TSV                                                                                                                                                                                                                                                                | 30 |
| 2.13 | different symmetric clock topologies expanded to 3-D [12]                                                                                                                                                                                                                               | 31 |
| 2.14 | Resonant clock network with four resonant circuits [13].                                                                                                                                                                                                                                | 32 |
| 2.15 | Potentional test flow for 3-D circuits [14].                                                                                                                                                                                                                                            | 34 |
| 2.16 | DfT architecture for 3D-ICs [14]                                                                                                                                                                                                                                                        | 35 |
| 2.17 | Different methods of testing a 3-D circuit with two planes shown in (a) where<br>the clock signal for pre-bond test is provided by (b) the use of redundant wiring<br>and (c) the use of a DLL for each local network.                                                                  | 36 |
| 2.18 | 3-D test structure using inductive links.                                                                                                                                                                                                                                               | 36 |
| 2.19 | Transition delay test methods where (a) is LOC and (b) is LOS method $\ [15].\ .\ .$                                                                                                                                                                                                    | 37 |
| 3.1  | process of monolithic fabrication                                                                                                                                                                                                                                                       | 40 |
| 3.2  | transistor level and gate level monolithic structure [1]                                                                                                                                                                                                                                | 41 |
| 3.3  | memristor structure [16]                                                                                                                                                                                                                                                                | 41 |
| 3.4  | hystersis behaviour in I-V curve of memristors [16]                                                                                                                                                                                                                                     | 42 |
| 3.5  | Switching behavior of a memristor where the resistance changes by changing the applied voltage                                                                                                                                                                                          | 42 |
| 3.6  | typical materials used for memristor structure                                                                                                                                                                                                                                          | 43 |
|      |                                                                                                                                                                                                                                                                                         |    |

| 3.7  | The four circuit properties (voltage, current, magnetic flux, and charge) and their relations [16] where three relations represent the well-known passive circuit elements, resistors ( $R = dV/dI$ ), inductors ( $l = dQ/dI$ ), and capacitors ( $c = dQ/dV$ ). The forth relation is described as $M = d\phi/dQ$ and it is well fitted |    |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
|      | to the new element called memory resistance or memristance                                                                                                                                                                                                                                                                                | 43 |
| 3.8  | 1T1R structure                                                                                                                                                                                                                                                                                                                            | 44 |
| 3.9  | RRAM array structure                                                                                                                                                                                                                                                                                                                      | 44 |
| 3.10 | Replacing SRAM cells with RRAM [17] | 46 |
| 3.11 | NAND and NOR type memory cell [18] | 46 |
| 3.12 | Using RRAM as the programmable switch [19] | 47 |
| 3.13 | The layout of 1T1R switch (a) and RRAM-based FPGA (b) [17] | 47 |
| 4.1  | Resonant clock network with four resonant circuits [13]                                                                                                                                                                                                                                                                                   | 50 |
| 4.2  | Clock distribution in a 3-D circuit where in (a) the clock network is connected in both planes and (b) includes some disconnected parts. | 50 |
| 4.3  | <i>RLC</i> model of a 16-sink clock network where (a) is the distributed <i>RLC</i> model and (b) is the simplified <i>RLC</i> model of resonant network [20]                                                                                                                                                                             | 52 |
| 4.4  | $H_{out}$ for different numbers of resonant circuits | 53 |
| 4.5  | $H_{out}$ for different numbers of <i>LC</i> tanks and resonant inductance using the model in [21] (dotted vertical lines) and the proposed approach (solid vertical lines). | 54 |
| 4.6  | Different topologies for 3-D resonant networks where (a) is the asymmetric and (b) is the symmetric topology.                                                                                                                                                                                                                             | 55 |
| 4.7  | Different topologies for a two-plane 3-D resonant clock network where (a) is a single TSV structure with one <i>LC</i> tank per plane, (b) is a single TSV structure with four <i>LC</i> tanks per plane, (c) is a four TSV structure with four <i>LC</i> tanks per plane, and (d) is a four TSV structure with eight <i>LC</i> tanks per plane. | 56 |
| 4.8  | <i>RLC</i> model for a two-plane 3-D circuit with four <i>LC</i> tanks where (a) is the model for the single-TSV and (b) is the model for four-TSV structures                                                                                                                                                                             | 57 |
| 4.9  | Driver resistance and power <i>vs.</i> resonant inductance | 57 |
| 4.10 | Driver resistance <i>vs.</i> wire size | 58 |
| 4.11 | Simple clock tree with unbalanced branches.                                                                                                                                                                                                                                                                                               | 62 |

| 4.12 | Signal swing and power consumption <i>vs.</i> resonant inductance                                                                                                                                                    | 63 |
|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 4.13 | Lumped model for a clock distribution network                                                                                                                                                                        | 64 |
| 4.14 | Two parallel branches of a clock tree.                                                                                                                                                                               | 65 |
| 4.15 | Different levels of intermediate nodes for a tree with <i>N</i> levels                                                                                                                                               | 67 |
| 4.16 | Sub-blocks of the circuit in the DTT approach [22]. | 67 |
| 4.17 | An example of an unbalanced clock tree.                                                                                                                                                                              | 69 |
| 4.18 | Comparison of a synthesized tree with 1016 sinks among different design methods where (a) is the transfer function for a sink node and (b) is the power consumption. | 70 |
| 4.19 | Basic operation of an inductor link, where (a) is the inductively coupled transceiver and (b) is the equivalent circuit for the spiral inductor [2,3].                                                               | 72 |
| 4.20 | Sinusoidal global clock signal in resonant clock networks                                                                                                                                                            | 73 |
| 4.21 | Simplified transceiver circuit for an inductive link.                                                                                                                                                                | 73 |
| 4.22 | The transmitter source current versus the transmitter inductance and capacitance.                                                                                                                                    | 75 |
| 4.23 | Schematic of the transceiver circuit used for wireless pre-bond test                                                                                                                                                 | 75 |
| 4.24 | Lumped model of a spiral inductor [20]                                                                                                                                                                               | 76 |
| 4.25 | Preferred structure for resonant clock network where (a) is the post-bond network, (b) is the network in pre-bond testing mode using redundant wiring and (c) is the network using wireless pre-bond testing | 78 |
| 4.26 | Clock signal at the receiver side of the inductive link where (a) is the clock signal received by the inductor and (b) is the clock signal after the clock buffers at the sink nodes.                                | 79 |
| 5.1  | <i>RLC</i> model for TSV                                                                                                                                                                                             | 82 |
| 5.2  | Different topologies for studying crosstalk where (a) shows the grounded TSV, (b) is the shielded topology and (c) is the bunch of TSV without shielding                                                             | 82 |
| 5.3  | Eye diagram for different schemes for 16 bit where (a) is for grounded TSVs, (b) is for shielded TSVs and (c) is for coupled TSVs.                                                                                   | 83 |
| 5.4  | Parallel and serial method for inter-plane data communication where (a) shows the parallel approach and (b) is the serial one.                                                                                       | 84 |

| 5.5  | Serializer circuit and signaling.                                                                                                         | 85  |
|------|-------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 5.6  | Encoder structure where (a) shows the ETI decoder and (b) is the TIC decoder.                                                             | 86  |
| 5.7  | Yield <i>vs.</i> number of TSVs. | 88  |
| 5.8  | Power consumption for 8-bit data transmission <i>vs.</i> TSV size (a) and signal frequency (b)                                            | 89  |
| 6.1  | Conventional FPGA structure.                                                                                                              | 92  |
| 6.2  | The critical path in conventional FPGAs.                                                                                                  | 94  |
| 6.3  | <i>RC</i> model for the critical path                                                                                                     | 94  |
| 6.4  | The critical path in conventional FPGAs (a) and the structure of complementary regenerative feedback repeater [23] (b)                    | 95  |
| 6.5  | (a) <i>RC</i> model for the critical path with regenerative buffer and (b) simplified <i>RC</i> model.                                    | 96  |
| 6.6  | SB_2 switch box structure and its equivalent circuit for buffered and un-buffered switches [24].                                          | 98  |
| 6.7  | RRAM circuit model [25]                                                                                                                   | 98  |
| 6.8  | The conventional and the modified structure for <i>n</i> =3                                                                               | 99  |
| 6.9  | Critical path delay for SRAM and RRAM based FPGAs for $N=10$ and $R_{on}=1$ k $\Omega$ and 2 k $\Omega$ .                                 | 99  |
| 6.10 | Critical path delay for SRAM and RRAM based FPGAs for $N=10$ and $N=20$                                                                   | 100 |
| 6.11 | Critical path delay for regenerative and conventional buffers where (a) shows the delay for SRAM-based FPGAs and (b) for RRAM-based FPGAs | 100 |
| 6.12 | Different FPGA architectures where (a) is the conventional architecture and (b), (c) show the modified architectures $B_2$ and $B_3$. | 102 |
| 6.13 | Architectural simulation results for MCNC benchmarks where (a) is critical path delay, (b) is power consumption and (c) is the area.      | 103 |

# List of Tables

| 2.1 | Physical and electrical characteristics of different TSV manufacturing approaches. | 19 |
|-----|------------------------------------------------------------------------------------|----|
| 2.2 | TSV characteristics for different physical parameters                              | 23 |
| 4.1 | Interconnect parameters used in the investigated clock network                     | 59 |
| 4.2 | Design parameters and power consumption for different topologies.                  | 61 |
| 4.3 | Design parameters for different design methodologies                               | 71 |
| 4.4 | Power consumption and inductor area for different design methodologies             | 71 |
| 4.5 | Design parameters and power consumption for different topologies.                  | 77 |
| 4.6 | Power and area overhead for different methods of pre-bond testing                  | 78 |
| 5.1 | Different encoding schemes | 86 |
| 5.2 | TSV parameters                                                                     | 87 |
| 5.3 | Jitter for a bunch of 16 TSVs                                                      | 87 |
| 5.4 | Operating frequency, power, and area for different serialization rates             | 88 |

# **1** Introduction

*Three-dimensional* (3-D) integration is an emerging candidate for implementing high-performance multifunctional systems-on-chip. This improvement in performance originates from the drastic decrease in on-chip interconnect length. Heterogeneous 3-D integration also provides the opportunity to combine different technologies in different parts of the system, such as sensor, analog and RF, digital, and memory planes [26].



Figure 1.1: Moore's Law from 1970 - 2005.

During the last several decades, remarkable growth has been achieved in the electronics world. As a sign of this improvement, Moore's law predicted that the transistor density in integrated circuits doubles every 1.5-2 years. Scaling reduces the size of transistors and improves the performance of individual devices while reducing the power they consume. For a period of time, Moore's law successfully motivated the electronics world to scale down the transistor size and increase the transistor density in integrated circuits, as shown in Figure 1.1 [27].

However, this trend cannot continue forever. Recently, problems have arisen that slow down the pace of scaling. Limitations in manufacturing technologies and materials make scaling the feature size extremely difficult and expensive for deep sub-micron devices [28]. Certain device parameters, such as the gate oxide thickness, cannot be reduced any further, which results in high leakage and parasitics for extremely scaled devices [27, 29]. Because of these persistent problems, increasing the operating frequency of the devices leads to high power consumption and uncontrollable thermal problems. As shown in Figure 1.2, the intrinsic delay of an NMOS transistor increases when scaling beyond 45 nm [29].



Figure 1.2: The intrinsic delay of NMOS transistor for different feature sizes.

In addition, as the chip size continues to increase to provide more functionality, the total interconnect length and the corresponding RC delay also increase. In recent integrated circuits, especially in the deep sub-micron regime, the interconnect delay has become dominant compared to the gate delay, as shown in Figure 1.3 [29, 30]. Long interconnects force designers to insert repeaters to reduce the RC delay. The power consumed by the interconnects and the repeaters has become a considerable fraction of the total power consumption of the chip [30].
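The quadratic dependence of wire delay on length, and the reason repeaters help, can be illustrated with a first-order Elmore estimate. The following is a minimal sketch, not a model used in this work: the function names, the per-millimeter resistance and capacitance, and the repeater delay are illustrative assumptions, and each repeater is idealized as a fixed delay added to its segment.

```python
def wire_delay(r_per_mm, c_per_mm, length_mm):
    """Elmore delay (seconds) of an unrepeated distributed RC wire.

    For a uniform line the Elmore estimate is 0.5 * R_total * C_total,
    which grows with the square of the wire length.
    """
    return 0.5 * r_per_mm * c_per_mm * length_mm ** 2


def repeated_wire_delay(r_per_mm, c_per_mm, length_mm, n_segments, t_repeater):
    """Delay of the same wire split into equal segments by ideal repeaters.

    Each segment contributes its own Elmore delay plus a fixed repeater
    delay (an idealization; real repeaters also load the wire).
    """
    segment = wire_delay(r_per_mm, c_per_mm, length_mm / n_segments)
    return n_segments * (segment + t_repeater)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the trend.
    r, c = 100.0, 0.2e-12          # ohm/mm and F/mm
    for length in (1.0, 5.0, 10.0):
        plain = wire_delay(r, c, length)
        buffered = repeated_wire_delay(r, c, length, n_segments=5, t_repeater=20e-12)
        print(f"{length:5.1f} mm: unrepeated {plain:.3e} s, 5 segments {buffered:.3e} s")
```

In this simplified picture the repeaters make the delay grow roughly linearly with length, at the price of the repeater power mentioned above.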

Therefore, improving the density and speed of integrated circuits cannot be achieved simply by transistor scaling. New solutions should be considered to provide higher density, functionality, and performance for integrated circuits while reducing the power consumption.



Figure 1.3: Interconnect delay vs. gate delay.

3-D integration is a potential solution to improve the performance of integrated circuits and maintain the progress defined by Moore's law. Extending the design of integrated circuits to the vertical direction brings significant benefits and serious challenges. The main benefits of 3-D integration are higher density, shorter interconnects, and easier heterogeneous integration [28]. 3-D integration can also reduce the power and even the cost of ICs [30].

Three-dimensional IC technology is in its early stages and requires much study and standardization to become established and exploited for high-volume consumer electronics. The potential advantages of 3-D integration are so large that a big part of the electronics community is working on different issues of this technology. The following sections discuss the advantages and challenges of 3-D integration.

## 1.1 Benefits of 3-D integration

#### 1.1.1 Heterogeneous integration

Nowadays, consumer devices include diverse functionalities such as sensors, memory, processors, and RF transceivers, which have different and even incompatible process flows, as shown in Figure 1.4 [30, 31]. Therefore, accommodating these different parts in a single layer is extremely difficult, if not impossible.



Figure 1.4: Heterogeneous system.

Vertical integration offers a great opportunity for heterogeneous systems where every layer can be manufactured in a different process technology and stacked together to form a 3-D circuit.

Moreover, 3-D integration can also improve the performance of heterogeneous systems by mitigating substrate noise. In 2-D circuits, where all parts are fabricated on the same silicon substrate, the noise injected into the substrate by aggressive parts of the circuit (*e.g.* digital circuits) can degrade the performance of sensitive parts (*e.g.* analog circuits). In 3-D circuits, these parts can be fabricated in different layers, where the separate silicon bulks alleviate substrate crosstalk problems [28].

#### 1.1.2 Higher density

In recent years, consumers have shown a strong preference for miniaturized electronic products [31]. Moreover, manufacturers tend to increase the functionality per chip to reduce the number of components on PCBs and the total cost. These trends make transistor density an important design issue for consumer electronic products. Exploiting 3-D integration can increase the number of transistors in the same area as compared to 2-D circuits and thus improve the transistor density. Figure 1.5 indicates that 3-D integration can improve the density and performance of DRAMs as compared to other stacking technologies.



Figure 1.5: Improving the density and performance of DRAMs by using 3-D integration.

#### 1.1.3 Shorter interconnect

By using multiple dies, 3-D ICs have smaller footprints than planar circuits for the same number of transistors. The total interconnect length decreases drastically due to the reduced footprint area. Shorter interconnects improve the circuit in several respects. One of the most important effects of shorter global interconnects is a higher interconnect bandwidth. As mentioned before, interconnect delay has become an important part of the circuit delay in recent circuits, and reducing it can enhance the performance of the circuit. Moreover, a considerable amount of power is dissipated in global interconnects, which can be reduced by using shorter wires [27, 28]. Shorter interconnects also alleviate routing congestion, which can subsequently reduce the number of metal layers used for routing each layer of a 3-D circuit. Decreasing the number of metal layers can reduce the manufacturing cost of the IC [32].
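A first-order estimate makes the footprint argument concrete. As a sketch, assume a square planar circuit of total area $A$ that is divided evenly among $N$ planes, and neglect the short vertical interconnect length; the symbols $A$, $N$, and $L_{max}$ are introduced here for illustration only. The longest corner-to-corner Manhattan wire then scales as

$$L_{max,2D} = 2\sqrt{A}, \qquad L_{max,3D} \approx 2\sqrt{A/N}, \qquad \frac{L_{max,3D}}{L_{max,2D}} \approx \frac{1}{\sqrt{N}}.$$

Under these assumptions, four planes halve the longest wire. Because the delay of an unrepeated RC wire grows with the square of its length, the delay of that wire would drop by roughly a factor of $N$.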

## 1.2 Challenges of 3-D integration

Vertical integration has significant potential benefits that cannot be ignored. However, 3-D integration has not matured yet and faces important challenges. Extending the popular and standard 2D integration to the third dimension requires new manufacturing and design considerations to alleviate the corresponding issues. The main challenges for 3-D integration are discussed below.

#### 1.2.1 Thermal problems

Thermal problems are a major challenge for 3-D integration. The temperature in a chip is determined by the power density and the cooling mechanisms inside the chip. The power density increases in 3-D integration because multiple thinned layers are stacked in a small area. In 3-D ICs, some layers are far from the heat sink and their heat must be dissipated through other layers. Using special materials for the chip and package that dissipate heat faster and employing new cooling techniques such as liquid-based cooling are potential solutions to mitigate thermal problems in 3-D ICs [33]. Co-designing the 2D layers to avoid vertically aligned hot spots is another approach to prevent local heat accumulation in 3-D circuits.

#### 1.2.2 Design kit

Like any new technology, 3-D integration poses new challenges to Electronic Design Automation (EDA). Extending existing 2D EDA methods to the third dimension requires massive effort. Power and clock delivery to all dies and the management of IR drops should be carefully considered. TSVs are sources of stress and reliability issues in 3-D circuits. Therefore, stress-aware and reliability-aware placement algorithms are required to allocate TSVs and the active parts of the circuits. Due to the high power density, fast and accurate thermal analysis and optimization algorithms should be employed.

#### 1.2.3 Test

Testing is an important part of the high-volume chip manufacturing process. Testing is even more important in 3-D ICs, since the additional manufacturing steps (beyond the standard CMOS process) make these circuits more vulnerable to fabrication defects and reduce the yield. Accessibility is one of the main challenges for testing 3-D ICs. The number of pins is restricted and some layers are not directly connected to the I/O pins. How to connect the test probes to all these layers is an important issue.

Pre-bond testing is an effective approach to increase the total yield of 3-D circuits. Testing each die before bonding it to other dies prevents stacking damaged or nonfunctional dies with known good dies (KGD). However, supporting pre-bond test imposes additional requirements, such as providing temporary test pins, and adds new complications to the test process.
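The benefit of stacking only known good dies can be sketched with a simple yield calculation. The snippet below is an illustration under stated assumptions, not a yield model from this work: defects are assumed independent between dies, the pre-bond test is assumed to catch every faulty die, the cost of discarded dies is not modeled, and the function names and yield values are hypothetical.

```python
from math import prod


def stack_yield_without_prebond_test(die_yields, bond_yield=1.0):
    """Stack yield when untested dies are bonded.

    The stack works only if every die and every bonding step is good,
    so the individual yields multiply.
    """
    n_bonds = max(len(die_yields) - 1, 0)
    return prod(die_yields) * bond_yield ** n_bonds


def stack_yield_with_kgd(n_dies, bond_yield=1.0):
    """Stack yield when only known good dies are bonded (ideal pre-bond test).

    Faulty dies are discarded before bonding, so only the bonding
    steps can still cause a stack to fail.
    """
    n_bonds = max(n_dies - 1, 0)
    return bond_yield ** n_bonds


if __name__ == "__main__":
    dies = [0.9, 0.9, 0.9]  # hypothetical per-die yields
    print(stack_yield_without_prebond_test(dies, bond_yield=0.99))
    print(stack_yield_with_kgd(len(dies), bond_yield=0.99))
```

The gap between the two numbers widens as more planes are stacked, which is why pre-bond testing becomes more attractive for taller stacks.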

#### 1.2.4 Mechanical stability

There are several concerns about mechanical stability in 3-D ICs. 3-D circuits are composed of several thinned layers, which can have diverse materials and different sizes. Different thermal expansion coefficients may cause mechanical stress in 3-D circuits as the operating temperature changes. Thinned wafers themselves are another source of mechanical problems in 3-D ICs. Thinned wafers can easily be damaged or even broken during the manufacturing and bonding processes. Even the use of carrier wafers may cause mechanical issues, especially during the bonding and de-bonding processes. TSVs are another important source of stress in 3-D circuits. TSVs induce thermo-mechanical stress in the silicon substrate due to the mismatch in the *coefficients of thermal expansion* (CTE) between the TSV conductor and the silicon substrate.
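The size of the CTE mismatch can be gauged with a one-line estimate. As a sketch, assume a copper TSV in silicon and take approximate textbook room-temperature values, $\alpha_{Cu} \approx 17 \times 10^{-6}\,\mathrm{K^{-1}}$ and $\alpha_{Si} \approx 2.6 \times 10^{-6}\,\mathrm{K^{-1}}$; these values and the temperature excursion $\Delta T$ are assumptions made for illustration and are not taken from this work. The unconstrained mismatch strain is then

$$\varepsilon = (\alpha_{Cu} - \alpha_{Si})\,\Delta T \approx 14.4 \times 10^{-6}\,\mathrm{K^{-1}} \times 100\,\mathrm{K} \approx 1.4 \times 10^{-3} \quad \text{for } \Delta T = 100\,\mathrm{K}.$$

The resulting stress depends on the geometry and on the elastic properties of the TSV, the liner, and the substrate, so this figure indicates only the order of magnitude of the strain that the structure must accommodate.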

## 1.3 Classification of vertical integration

3-D integration comes in many forms.

#### 1.3.1 Coarse-Grain ICs

In recent years, several methods such as System-in-Package (SiP) have been proposed to connect either bare or packaged dies along the vertical dimension. The dies within an SiP are integrated at the die or package level, where only coarse-grain interconnections can be achieved among the circuitry in different tiers. An SiP has higher packaging efficiency and a smaller footprint and weight than a conventional 2D system. However, due to the limited locations and low density of these vertical interconnects, the advantage of 3-D integration in supporting shorter interconnects cannot be fully utilized in 3-D SiPs. There are different methods to implement vertical interconnections in SiPs, such as wire bonding (see Figure 1.6 (a)), peripheral vertical interconnects (see Figure 1.6 (b)), area array vertical interconnects (see Figure 1.6 (c)), and metallization on the faces of an SiP (see Figure 1.6 (d)).

#### 1.3.2 Fine-Grain ICs

#### **Monolithic 3-D integration**

The monolithic approach enables extremely fine-grain vertical integration through sequential device processing rather than bonding fabricated dies together. In this approach, the upper layers are successively grown above the lower layers, as shown in Figure 1.7. The two device layers are connected by *monolithic inter-tier vias* (MIVs).

#### **Polylithic 3-D integration**

In the polylithic approach, each layer is processed separately using conventional fabrication methods, and the layers are assembled by different bonding technologies. The main difference between these ICs and SiPs is that in polylithic ICs the inter-plane vertical interconnections can be placed at any location not occupied by active devices; their location is not limited to the periphery or a fixed dedicated area [28].

Polylithic 3-D ICs require additional fabrication steps beyond the standard 2D process to implement vertical interconnections. However, the manufacturing process is much simpler than for monolithic 3-D ICs, since each layer can be fabricated by standard 2D approaches. There are different inter-plane communication schemes for 3-D integration, such as contactless communication (*i.e.* capacitive and inductive coupling) and through-silicon vias (TSVs), as shown in Figure 1.8.

Figure 1.6: Different methods of implementing vertical interconnects for SiPs.

Figure 1.7: Monolithic integration [1].

Figure 1.8: Different inter-plane communication methods for fine-grain 3-D circuits.

Among all these integration approaches, TSV-based 3-D integration has the potential to offer the greatest bandwidth for vertical interconnects, and is therefore the most promising of the vertical interconnect technologies. First, TSVs can be inserted at any available location where a vertical interconnection is required, fully exploiting the advantage of 3-D ICs in reducing interconnect length and delay. Second, the different tiers of TSV-based 3-D ICs are fabricated separately, which shortens the manufacturing time as compared to monolithic 3-D ICs and allows the integration of different technologies in different tiers.

One of the main alternatives to TSVs is contactless links. Contactless 3-D ICs are formed by bonding, typically, bare dies with a variety of methods [34]. In these circuits, communication among circuits in different planes is achieved through AC coupling. An AC Coupling Interface (ACCI) uses the transitions of a digital signal as the useful part of the information and discards the DC component. Specialized links transfer data among the planes either capacitively [35] or inductively [36]. Capacitively coupled links use a single metal layer in each plane to form a capacitor, whereas inductive links use spiral inductors, which can require several metal layers. To sustain inter-plane communication at typical on-chip power supply voltages, the plates of the capacitor must be in close proximity (<10 $\mu$m) [35]. This requirement poses stringent constraints on the thickness of the bonded wafers and limits the number of planes that can use this type of communication to two. Consequently, capacitive coupling is primarily used for face-to-face bonded 3-D circuits. Inductive links are a promising candidate for inter-plane wireless communication because they perform better than capacitive links over larger communication distances.

Contactless links transfer the AC signal among the planes. The power of the transferred signal can be adjusted through different circuit parameters, such as the number and the width of the inductor turns [36]. Contactless links are implemented with standard CMOS processes, which reduces the manufacturing complexity as compared to wired vertical links. Moreover, unlike TSV-based 3-D ICs, contactless 3-D ICs do not require ESD protection circuits [37], which reduces the power and area required for inter-plane communication. On the other hand, contactless ICs require additional circuitry, such as a transmitter and a receiver, to produce the proper signals for inter-plane communication. Thermal issues can also be more pronounced in contactless 3-D ICs, since TSVs, which are absent in these circuits, support faster heat conduction. In the absence of an effective solution to remove heat, local hot spots can develop, causing thermal gradients that can adversely affect the speed and functionality of the circuit. Furthermore, because contactless links are large compared to TSVs, horizontal thermal gradients also affect the performance of these links, whereas for TSVs the vertical gradients are dominant.
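The AC-coupling principle described above, in which only the transitions of the digital signal cross the link, can be illustrated with an idealized behavioral sketch. This is not a circuit model of any ACCI from the cited references: the pulse encoding (+1 for a rising edge, -1 for a falling edge, 0 otherwise), the latch-based receiver, and the function names are assumptions made for illustration, and noise and attenuation are ignored.

```python
def transmit(bits, initial=0):
    """Map a bit stream to transition pulses.

    A coupling element passes only changes of the signal, so the link
    sees +1 on a rising edge, -1 on a falling edge, and 0 when the bit
    repeats (the DC level itself is not transferred).
    """
    pulses = []
    previous = initial
    for bit in bits:
        pulses.append(bit - previous)
        previous = bit
    return pulses


def receive(pulses, initial=0):
    """Recover the bit stream with a latch that changes state only on a pulse."""
    bits = []
    state = initial
    for pulse in pulses:
        if pulse > 0:
            state = 1
        elif pulse < 0:
            state = 0
        bits.append(state)  # no pulse: the latch holds its previous value
    return bits


if __name__ == "__main__":
    data = [0, 1, 1, 0, 1, 0, 0]
    pulses = transmit(data)
    print(pulses)
    print(receive(pulses) == data)
```

The sketch also shows why the receiver needs a known initial state: a long run of identical bits produces no pulses, so the latch alone preserves the DC information that the link discards.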

The quality of an inductive link is a function of the electric and magnetic characteristics of the link. Misalignments between the horizontal locations of the inductors in adjacent planes can affect the coupling coefficient. Furthermore, thermal issues can change the electrical components of the link, such as the metal resistance.

An inductive link consists of two spiral inductors and a current-based transceiver. As shown in Figure 1.9, the transmitter converts the input voltage to current pulses, which are coupled through the spiral inductors and recovered as voltage pulses at the receiver.



Figure 1.9: Basic operation of an inductor link, where (a) is the inductively coupled transceiver and (b) is the equivalent circuit for the spiral inductor [2,3].

One of the main challenges in inductively coupled 3-D circuits is to transfer DC power via inductive links. Since these links use a magnetic field to transfer data, the only approach for transferring supply power is to convert the DC power to an AC signal, couple the AC signal through the inductive link, and rectify this signal to DC at the receiver side. Some transceiver circuits have been proposed for this purpose. In [38], a ring oscillator is used at the transmitter side to convert the DC power to an RF signal, whereas in [39] an H-bridge transmitter is employed. Rectifiers are used to convert the transmitted RF signal back to DC power. The efficiency of these links depends on the distance between the planes and is typically lower than that of galvanic connections due to the limited efficiency of the transceiver circuits. As shown in [39], to transfer 36 mW of DC power between two planes separated by 15 $\mu$m, a large inductor of 700 $\mu$m × 700 $\mu$m is employed, and the efficiency of the power delivery is only 10%.
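The cost of this low efficiency is easy to quantify. The following sketch only restates the figures quoted from [39] as arithmetic; the function names are hypothetical, and the efficiency is assumed to be the ratio of delivered DC power to the DC power drawn at the transmitter.

```python
def required_input_power(p_delivered_w, efficiency):
    """DC power (W) the transmitter must draw to deliver p_delivered_w."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return p_delivered_w / efficiency


def delivered_power_density(p_delivered_w, side_um):
    """Delivered power per inductor area (W/mm^2) for a square inductor."""
    area_mm2 = (side_um * 1e-3) ** 2
    return p_delivered_w / area_mm2


if __name__ == "__main__":
    p_out = 36e-3                      # 36 mW delivered, as quoted from [39]
    p_in = required_input_power(p_out, efficiency=0.10)
    print(f"input power: {p_in * 1e3:.0f} mW, lost: {(p_in - p_out) * 1e3:.0f} mW")
    print(f"density: {delivered_power_density(p_out, side_um=700) * 1e3:.1f} mW/mm^2")
```

For the quoted numbers, roughly 360 mW must be supplied at the transmitter to deliver 36 mW, so about 324 mW is dissipated in the link and the transceiver circuits.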

The quality of the magnetic coupling can also be affected by misalignment and interference. Although misalignments between the horizontal locations of the inductors in adjacent planes can affect the coupling coefficient, the effect is weaker as compared to TSVs, where misalignments can result in defective and/or failed links [3, 37]. This situation means that greater alignment margins can be supported by the inductive link, which facilitates manufacturing. On the other hand, design implications are raised for these links, since larger interference or weaker coupling among the inductors is possible. These implications should, therefore, be considered in the design of these links.

## 1.4 3-D vs. 2.5-D

In a 3-D architecture, all dies are placed on top of each other. These stacked architectures are efficient in terms of footprint; however, thermal issues can be problematic since many dies are far from the heat sink. Alternatively, 2.5-D structures employ an interposer as a large carrier to combine multiple 2-D or stacked chips [Applications Driving 3-D, Cost Comparison between 3-D and 2.5-D], as shown in Figure 1.10. The interposer can be made of glass, as in [40], or it can be polymer-based [41].



Figure 1.10: 2.5-D vs. 3-D IC [4].

In 2.5-D systems, functional dies are connected to the interposer using micro-bumps, which are around 10 $\mu$m in diameter. TSVs are not used in the dies but inside the interposer, to connect the metallization layers on its upper and lower surfaces [32, 42]. The interposer is attached to the main substrate using regular bumps, which are around 100 $\mu$m in diameter [42]. 2.5-D circuits are easier to design and fabricate as compared with fine-grain 3-D ICs. Using interposers reduces the coefficient of thermal expansion (CTE) mismatch between the active die and copper-filled TSVs and hence alleviates the stress issue in 2.5-D systems [43]. Interposers can provide better cooling for high-power systems. Wide I/O channels can be implemented on interposers, supporting high-bandwidth, low-power interconnection between the dies. However, all these advantages of 2.5-D integration come at the cost of an additional common substrate, which implies additional cost compared to the vertical stacking approach.

It is possible to benefit from both the 2.5-D and 3-D approaches by attaching stacked dies to the interposer, as shown in Figure 1.11.



Figure 1.11: Combining 2.5-D and 3-D integration.

One example of this technology is the Xilinx Virtex-7 2000T device, which has come to the market for high-volume applications. This device has 2 million logic cells and four FPGA dice attached to a silicon interposer, which supports 10,000 connections between adjacent dice.



Figure 1.12: Xilinx Virtex-7, an example of 2.5-D integrated devices.

## 1.5 Thesis Outline

The contribution of this thesis consists of three major parts. First, I design a resonant clock network for 3-D circuits that can support pre-bond testing. Next, I employ serialization to reduce the cross-talk among TSVs in inter-plane communication. Last, I propose new architectures for buffer allocation in RRAM-based FPGAs.

The remainder of this thesis is organized as follows. An overview of TSV-based circuits is presented in Chapter 2. The manufacturing process of TSVs is reviewed in Section 2.1, and different applications of TSVs in circuits are discussed in Section 2.2. Noise coupling in 3-D ICs is considered in Section 2.3, and testing in these circuits is investigated in Section 2.4. Synchronization issues for vertically stacked circuits are introduced in Section 2.5, and Section 2.6 presents an overview of design challenges for TSV-based circuits.

Chapter 3 is dedicated to monolithic circuits and RRAMs as an interesting example of CMOS compatible monolithic structure. Different applications of RRAMs are reviewed in Section 3.1.

In Chapter 4, resonant clocking for 3-D circuits is considered. Section 4.1 investigates resonant clocking for symmetric networks, and Section 4.2 presents a methodology for extending resonant clocking to synthesized clock networks. In Section 4.3, a transceiver circuit design is proposed to provide the clock signal for disconnected networks during pre-bond testing.

Chapter 5 investigates the effect of cross-talk on TSV based inter-plane communication performance. Serialization is considered as a potential approach to reduce the cross-talk between adjacent TSVs.

Chapter 6 discusses RRAM-based FPGAs and buffer allocation in these FPGAs. Section 6.1 presents an analytical description of the critical path to optimize the number of buffers, and circuit-level simulations are provided to validate the analytic expressions. A new approach for buffer allocation is proposed in Section 6.2, and architectural simulations are performed to evaluate the proposed structures.

# 2 TSV based 3D Circuits

*Three-dimensional* (3-D) integration is an emerging candidate for implementing high performance multifunctional systems-on-chip. This improvement in performance originates from the drastic decrease in the on-chip interconnect length. Heterogeneous 3-D integration also provides the opportunity to combine different technologies in different parts of the system, such as sensor, analog and RF, digital, and memory planes [26]. There are different inter-plane communication schemes for 3-D integration. Employing an efficient medium for data communication among different planes is a key factor in achieving a high performance 3-D system. *Through-silicon-vias* (TSVs) are employed by the majority of the 3-D fabrication processes. TSVs are tens of micrometers long and are fabricated with aspect ratios up to 1:10 [30]. TSVs produce the highest interconnect bandwidth within a 3-D system as compared to wire bonding, peripheral vertical interconnects, and solder ball arrays [26]. To increase the signal bandwidth in 3-D systems while saving or, at least, not increasing the power, TSVs should be fabricated to have low impedance characteristics.

Inter-plane signaling, however, is only one of the uses of TSVs in 3-D circuits. TSVs are also used to distribute power and ground throughout a 3-D stack. In addition, due to the high thermal conductivity of these interconnects, TSVs can alleviate the thermal problems in 3-D integrated circuits. These links reduce the temperature fluctuations caused by local hot spots within the volume of the 3-D stack. Different models have been presented to describe the thermal conductivity of TSVs. On the other hand, manufacturing issues are one of the primary challenges for TSV-based 3-D circuits. Vertical interconnects require additional manufacturing steps beyond the standard process. These additional steps increase the manufacturing cost and lead to lower fabrication yield for the entire system. As shown in [44], increasing the number of TSVs adversely affects the yield of a 3-D circuit.

In this chapter, the manufacturing methods of TSVs are reviewed in the following section, and the multifold role of the TSVs in 3-D ICs is discussed in Section 2.2. The noise coupling mechanism for 3-D circuits is investigated in Section 2.3. Test and synchronization issues for 3-D circuits are reviewed in Sections 2.4 and 2.5, respectively. The challenges of designing TSV-based 3-D circuits are listed in Section 2.6, and the contactless link is introduced in the last section as one of the main counterparts of TSVs for inter-plane communication.

### 2.1 Manufacturing process for TSVs and inductive links

Manufacturing issues are one of the primary challenges for TSV based 3-D circuits. Fabricating high performance, reliable, and cheap vertical interconnects, which do not affect the neighboring active devices is an important requirement in 3-D systems towards high volume production [26].

TSVs are usually manufactured as tapered wires (*e.g.* copper, tungsten or poly-silicon) surrounded by a thin dielectric layer (liner) to insulate the metal from the semiconducting substrate [30]. Tungsten (W) has a better *coefficient of thermal expansion* (CTE) match to the silicon substrate but provides lower conductivity as compared to copper (Cu) [45]. TSVs are fabricated with lengths from 10 $\mu$m to >100 $\mu$m. In the first 3-D circuits, large TSVs with diameters larger than 10 $\mu$m were used, since the fabrication processes were not capable of providing small TSVs. Nowadays, fabricating smaller TSVs with diameters around 2 $\mu$m is feasible [46].

There are four main approaches for manufacturing TSVs: the "via first" approach, where the TSVs are formed before the devices (*i.e. front end of the line* (FEOL)); the "via-middle" approach, where the TSVs are fabricated after the transistors but before the backend interconnect (*i.e. back end of the line* (BEOL)); the "via last" method, where the TSVs are fabricated after or in the middle of the back end interconnect process [30]; and the "via after" method, where the vias are fabricated when both FEOL and BEOL processing are completed and the IC is bonded to another layer [47].

In the "via first" technique, the TSVs connect the topmost metal layer of one plane with the first metal layer of another plane. The main problem of "via first" is that the TSVs are fabricated before the FEOL, which is processed at high temperature. Some of the materials used for the TSV body (*e.g.* copper) diffuse readily at high temperature and can easily diffuse into the silicon substrate during front end processing and ruin the TSVs. Due to this issue, copper cannot be used for "via first" TSVs, and these TSVs are mostly formed from poly-silicon or tungsten, which results in high resistivity as compared to other types of TSVs [5,47]. The TSVs presented in [48,49] are fabricated with the "via-first" method and have diameters between 12 $\mu$m and 18 $\mu$m and lengths of 30 $\mu$m to 50 $\mu$m, where the first one is a poly-silicon TSV with a resistance of 1.3 m$\Omega$ to 5 m$\Omega$ and the second is a metal TSV with a resistance of 230 m$\Omega$ (Table 2.1).

In the "via last" approach, the topmost metal layers of adjacent planes are connected together using TSVs. For this approach, the TSVs have to be etched through the substrate and the metal layers. Consequently, "via-last" typically requires longer TSVs. This situation can lead either to a higher aspect ratio, increasing the manufacturing complexity, or to a larger diameter, increasing the area overhead of the TSVs. As an example, consider the process in [50], which produces TSVs 150 $\mu$m long and 5 to 15 $\mu$m wide with a TSV resistance of 9.4 m$\Omega$ to 2.6 m$\Omega$. The "via-first" methods also impose several changes on the front end processing of the wafers. For these reasons, the "via-middle" approach appears (for the time being) to be the proper compromise between the TSV size and manufacturing complexity.

Figure 2.1: Basic steps of TSV manufacturing for a "via-middle" process, (a) wafer preparation and FEOL, (b) TSV etching and metal filling, (c) wafer thinning and BEOL, (d) wafer bonding, and (e) handle wafer removal.

The basic steps of a typical "via-middle" TSV fabrication process are shown in Figure 2.1. The first step is to process each plane separately with some area reserved for TSVs. Afterwards, a deep trench is opened through the *inter layer dielectric* (ILD) and device layers. The depth of this trench is mainly determined by the resulting aspect ratio of the TSV process and the wafer thinning capabilities. The trench sidewall is isolated from the conductive substrate, and the opened trench is filled with metal, such as tungsten or copper, as depicted in Figure 2.1(b). Then the metal layers are added to the top of the silicon substrate and the wafer is thinned, as shown in Figure 2.1(c). Before stacking the planes, the wafers are typically thinned to decrease the overall length of the TSVs. An example of via-middle TSVs is presented in [46], where the TSVs are 25 $\mu$m long and 5 $\mu$m wide with a resistance of 20 m$\Omega$. A reduced wafer thickness, however, cannot sustain the mechanical stresses incurred during the handling and bonding phases of a 3-D process. An auxiliary wafer called the "handle wafer" is attached to the original wafer, as shown in Figure 2.1(d), thereby providing the required mechanical durability for the bonding step. The alignment and bonding steps follow, as illustrated in Figure 2.1(e). Finally, the handle wafer is removed from the thinned wafer.

As shown in Figure 2.2, the basic fabrication steps for the other types of TSVs (including TSV etch, TSV fill, wafer thinning, and bonding) are analogous, and the main difference is the order in which the FEOL, TSV, and BEOL fabrication steps are sequenced [5].



Figure 2.2: Fabrication steps for different types of TSVs [5].

"Via after" TSVs are good candidates for heterogeneous 3-D devices. Different layers (*e.g.* analog, digital, RF, sensor, *etc.*) can be fabricated in different foundries and stacked together using these TSVs [47].

The resulting characteristics of different processes related to each of these approaches are listed in Table 2.1. As stated above, the "via-middle" process can produce the smallest TSVs, whereas the lowest resistance is reported for "via-last" TSVs.

An overview of the challenges in different types of TSVs is shown in Figure 2.3 [5]. As mentioned before, the main drawback of "via first" TSVs is the via filling material, which is mostly limited to poly-silicon. For "via-last" TSVs, alignment should be more stringent, which makes the lithography challenging.

| Approach   | Reference | Diameter ($\mu$m) | Length ($\mu$m) | Resistance (m$\Omega$) | Bandwidth (Gb/s) |
|------------|-----------|-------------------|-----------------|------------------------|------------------|
| Via-first  | [48]      | 18                | 50              | 1.3-5                  | 3                |
|            | [49]      | 12                | 30              | 230                    | N/A              |
| Via-middle | [46]      | 5                 | 25              | 20                     | N/A              |
| Via-last   | [50]      | 5-15              | 150             | 9.4-2.6                | N/A              |
|            | [51]      | ~40               | N/A             | N/A                    | 1.6              |

Table 2.1: Physical and electrical characteristics of different TSV manufacturing approaches.

There are three different approaches for stacking 3-D layers: *wafer-to-wafer* (W2W), *die-to-wafer* (D2W), and *die-to-die* (D2D). In the W2W approach, entire wafers are bonded together, whereas in D2W and D2D bonding, dies are diced and bonded to other dies or wafers [47, 52, 53]. W2W bonding affords high manufacturing throughput since many dies are bonded simultaneously. This approach also provides a simpler process, more accurate alignment, and thinner wafers than the other approaches [52, 54]. For a constant TSV aspect ratio, thinner wafers lead to smaller TSVs and greater TSV density for W2W bonded ICs [53]. On the other hand, W2W bonding is only possible for wafers of the same size, and it suffers from yield issues. In the other bonding approaches, dies are tested before bonding and only good dies are stacked. This pre-bond test improves the yield of D2W and D2D bonding compared to the W2W approach [53–55].
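The yield argument above can be illustrated with a deliberately simple model. The sketch below assumes independent defects, a single die yield shared by all layers, a fixed yield per bonding step, and a perfect pre-bond test; none of these assumptions or numbers come from the cited works, and the function names are hypothetical.

```python
def stack_yield_blind(die_yield: float, bond_yield: float, layers: int) -> float:
    """W2W-style stacking: dies are bonded untested, so every layer must be good."""
    return (die_yield ** layers) * (bond_yield ** (layers - 1))


def stack_yield_known_good(bond_yield: float, layers: int) -> float:
    """D2W/D2D-style stacking after a (perfect) pre-bond test: only bonding can fail."""
    return bond_yield ** (layers - 1)


# Hypothetical numbers chosen only for illustration.
DIE_YIELD = 0.90
BOND_YIELD = 0.99

for n in (2, 3, 4):
    blind = stack_yield_blind(DIE_YIELD, BOND_YIELD, n)
    tested = stack_yield_known_good(BOND_YIELD, n)
    print(f"{n} layers: blind stacking {blind:.3f}, known-good dies {tested:.3f}")
```

In this model the yield of blind stacking decays geometrically with the number of layers, which is the effect that pre-bond testing is meant to avoid.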

Another classification of 3-D bonding is based on the orientation of the layers during bonding. In *face-to-face* (F2F) bonding, the top metal layers of the two layers are connected to each other, whereas in *face-to-back* (F2B) bonding, the top metal layer of one layer is bonded to the substrate of the next layer, as depicted in Figure 2.4. The F2F bonding scheme is applicable only for stacking two layers, whereas the F2B approach is used for stacking multiple layers. Since there is no substrate between the two layers in the F2F approach, inter-plane connections can be fabricated as micro-bumps, and TSVs are used only for providing I/O connections [56].

TSVs induce thermo-mechanical stress in the IC due to the large difference in the coefficient of thermal expansion between the TSV conductor (*e.g.* copper) and the silicon substrate surrounding the TSV [57]. This stress may cause defects during the fabrication process or reduce the performance of the circuit during device operation as the operating temperature increases [58]. The TSVs can squeeze or stretch the adjacent transistors and cause mobility variation. The stress also causes a $V_{th}$ shift in neighboring active devices [59]. The magnitude of this proximity effect depends on the shape and location of the TSV in the circuit. To avoid this proximity effect, a *keep out zone* (KOZ) should be considered around the TSVs, in which no active device is placed. The keep out zone can be up to 10 $\mu$m for digital circuits and 20 $\mu$m for analog circuits [TSV Stress Management], which increases the area and reduces the density of TSV based circuits [60].
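The area cost of the keep out zone can be estimated with a small sketch. It assumes that the KOZ is measured from the TSV edge and that each TSV blocks a square cell; the 5 $\mu$m diameter and the function name `tsv_footprint_um2` are hypothetical, while the KOZ values are the upper bounds quoted above.

```python
def tsv_footprint_um2(diameter_um: float, koz_um: float) -> float:
    """Area of a square cell enclosing the TSV plus a keep-out margin on every side."""
    side = diameter_um + 2.0 * koz_um
    return side * side


DIAMETER_UM = 5.0  # hypothetical TSV diameter

bare = tsv_footprint_um2(DIAMETER_UM, 0.0)
for label, koz in (("digital", 10.0), ("analog", 20.0)):
    area = tsv_footprint_um2(DIAMETER_UM, koz)
    print(f"{label}: {area:.0f} um^2 blocked per TSV ({area / bare:.0f}x the bare TSV cell)")
```

Under these assumptions the blocked area is dominated by the keep out zone rather than by the TSV itself, which is why the KOZ limits the achievable TSV density.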



Figure 2.3: An overview of the challenges in different types of TSVs [5].

## 2.2 Different applications of TSVs in 3-D ICs

Through-silicon-vias are one of the main candidates to form the vertical links for 3-D integrated systems. These vertical interconnects provide a high performance, high density path for inter-plane signaling. TSVs also play a vital role in distributing power and ground to those planes that do not support I/O and behave as thermal conduits for all planes except the one attached to the heat sink [50]. These functions of the TSVs are discussed in this section.

Each of these TSV uses can require particular and potentially conflicting characteristics for these wires. For signal TSVs, the important parameters are speed and power, as discussed in the following subsection, whereas for TSVs used for power distribution the voltage loss along the TSV is emphasized, as described in Subsection 2.2.2. For thermal TSVs, the superior heat conductivity of the vertical interconnects is discussed and enhanced thermal TSV models are described in Subsection 2.2.3.

#### 2.2.1 Inter-plane signaling

TSVs exhibit quite different physical and electrical characteristics as compared to the horizontal interconnects (BEOL) in 2-D circuits. This situation carries substantial performance benefits, although the complexity of the interconnect analysis and design process increases due to the different characteristics of the vertical links and the constraints that this type of interconnect poses on the physical design process. The heterogeneity of 3-D circuits, the diverse fabrication technologies, and the variety of bonding styles make TSV modeling more challenging as compared to horizontal interconnects. To provide high performance vertical links, proper electrical models for signal TSVs are required. To improve the speed of the circuit, TSVs with low resistance and low capacitance should be used for signaling to decrease the RC delay of the vertical interconnects. TSV electrical characteristics depend on their geometrical parameters as well as on material properties such as the dielectric properties of the barrier and insulating layers and the dopant concentration in the substrate. For the purpose of extracting parasitics and subsequent analysis, a representative structure for a TSV is assumed to be a copper filled via with uniform circular cross-section and an annular dielectric barrier of *SiO*<sub>2</sub> or *Si*<sub>3</sub>*N*<sub>4</sub> surrounding the Cu cylinder with a thickness of 0.2 $\mu$m [61].

Figure 2.4: Face-to-Face and Face-to-Back bonding

TSVs can be modeled by the metal resistance $R_{TSV}$, the self-inductance $L_{TSV}$, and the parasitic capacitance $C_{TSV}$ between the TSV and the substrate. A variety of electrical models for TSVs have recently appeared in the literature, relating the physical characteristics of the TSVs, as determined by the different manufacturing methods, with the electrical properties of these interconnects [6, 62–64]. A comprehensive electrical model for TSVs is shown in Figure 2.5, where a $\pi$ *RLC* model is used to describe the high frequency behavior of TSVs.

Figure 2.5: An RLC model of a TSV [6].

The resistance of a TSV is described as [62]:

$$R_{TSV} = R_{dc} + R_{ac} \tag{2.1}$$

where $R_{dc}$ is the electrical resistance of the copper and is defined as:

$$R_{dc} = \frac{\rho l_{tsv}}{\pi r^2} \tag{2.2}$$

where $l_{tsv}$ and $r$ denote the length and radius of the TSV, respectively, and $\rho$ is the resistivity of the TSV core material (*e.g.* copper). However, for high-frequency signals, the resistance increases due to the skin effect. $R_{ac}$ is added to model the increased resistance of TSVs at high frequencies and is defined as:

$$R_{ac} = l_{tsv} \frac{\sqrt{\pi \mu f \sigma}}{r \sigma} \tag{2.3}$$

where the frequency is denoted by *f*, the magnetic permeability by  $\mu$ , and the electric conductivity of the TSV metal by  $\sigma$ .
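As a numerical illustration, the sketch below evaluates Equations (2.1) to (2.3) exactly as written above. It assumes a copper TSV with a bulk resistivity of 1.68e-8 $\Omega\cdot$m, the vacuum permeability for $\mu$, and the 25 $\mu$m long, 5 $\mu$m wide geometry of the via-middle example in [46]; the function names are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in H/m
RHO_CU = 1.68e-8       # assumed bulk copper resistivity in ohm*m


def r_dc(length_m: float, radius_m: float, resistivity: float = RHO_CU) -> float:
    """DC resistance, Equation (2.2)."""
    return resistivity * length_m / (math.pi * radius_m ** 2)


def r_ac(length_m: float, radius_m: float, freq_hz: float,
         resistivity: float = RHO_CU, mu: float = MU_0) -> float:
    """Skin-effect term, Equation (2.3) as written, with sigma = 1 / resistivity."""
    sigma = 1.0 / resistivity
    return length_m * math.sqrt(math.pi * mu * freq_hz * sigma) / (radius_m * sigma)


def r_tsv(length_m: float, radius_m: float, freq_hz: float,
          resistivity: float = RHO_CU) -> float:
    """Total resistance, Equation (2.1)."""
    return r_dc(length_m, radius_m, resistivity) + r_ac(length_m, radius_m, freq_hz, resistivity)


length, radius = 25e-6, 2.5e-6  # geometry of the via-middle example in [46]
print(f"R_dc = {r_dc(length, radius) * 1e3:.1f} mOhm")
for f in (1e8, 1e9, 1e10):
    print(f"f = {f:.0e} Hz: R_TSV = {r_tsv(length, radius, f) * 1e3:.1f} mOhm")
```

With these assumptions the DC term evaluates to roughly 21 m$\Omega$, the same order as the 20 m$\Omega$ reported for [46], and the frequency-dependent term grows with the square root of the frequency.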

The capacitance and inductance of the TSV can be expressed as:

$$L_{TSV} = l_{tsv} f(b) \tag{2.4}$$

$$f(b) = \frac{\mu_0}{4\pi} \left[ \ln \left( 2b^{-1} + \sqrt{(0.5b)^{-2} + 1} + 2b^{-1} - \sqrt{(0.5b)^2 + 1} \right) \right] \tag{2.5}$$

$$C_{TSV} = \frac{2\pi\varepsilon_{si}l_{tsv}}{\ln\left(\frac{r+t_{ox}}{r}\right)} \tag{2.6}$$

where $t_{ox}$ is the dielectric thickness, $\varepsilon_{si}$ is the dielectric constant of silicon, and $b = 2r/l_{tsv}$.
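The capacitance expression of Equation (2.6) has the form of a coaxial capacitor and is easy to evaluate numerically. The sketch below covers only Equation (2.6); it leaves the relative permittivity as a parameter and uses 3.9 (a typical value for SiO<sub>2</sub>) purely as an assumption, together with the 0.2 $\mu$m liner of the representative structure and a hypothetical TSV that is 20 $\mu$m long with a 1 $\mu$m radius.

```python
import math

EPS_0 = 8.854e-12  # vacuum permittivity in F/m


def c_tsv(length_m: float, radius_m: float, t_ox_m: float, eps_r: float) -> float:
    """TSV-to-substrate capacitance in the coaxial form of Equation (2.6)."""
    return (2.0 * math.pi * eps_r * EPS_0 * length_m
            / math.log((radius_m + t_ox_m) / radius_m))


# Hypothetical geometry; eps_r = 3.9 is an assumed liner permittivity.
length, radius, eps_r = 20e-6, 1e-6, 3.9
for t_ox in (0.1e-6, 0.2e-6, 0.4e-6):
    c = c_tsv(length, radius, t_ox, eps_r)
    print(f"t_ox = {t_ox * 1e6:.1f} um: C_TSV = {c * 1e15:.1f} fF")
```

The sweep shows the expected trend: a thicker liner lowers the capacitance, while a longer TSV raises it proportionally.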

The expressions derived for the lumped *RLC* model of TSVs have been compared with numerical simulators such as Raphael and Sdevice [65, 66] and show good agreement for different TSV architectures [62]. The characteristics of different TSVs, investigated using lumped models and numerical simulators, are listed in Table 2.2.

| Reference | Diameter ($\mu$m) | Length ($\mu$m) | Resistance (m$\Omega$) | Capacitance (fF) | Inductance (pH) |
|-----------|-------------------|-----------------|------------------------|------------------|-----------------|
| [6]       | 2                 | 20              | 119.3                  | 52.4             | 13.8            |
| [64]      | 1.5               | 18              | 152.4                  | 2.1              | 4.7             |
| [67]      | 55                | 165             | 12                     | 922              | 35              |

Table 2.2: TSV characteristics for different physical parameters

Due to the geometry and manufacturing methods for the TSVs, the capacitance per unit length is typically higher, while the resistance per unit length is smaller as compared to a horizontal wire. Consequently, due to the significantly shorter length of the TSVs, the RC delay of these wires is low, allowing a significant reduction in the delay of the problematic global wires. The shorter interconnect length in 3-D circuits also results in reduced total capacitance of the interconnects, decreasing the power consumed by interconnects in 3-D circuits.
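To give a sense of scale, the sketch below computes the intrinsic RC product of the TSVs listed in Table 2.2. This is only a first-order, lumped figure under the assumption that the TSV is treated as a single resistor and capacitor; the delay of a real link also depends on the driver resistance and the load, which are not modeled here.

```python
# (resistance in ohm, capacitance in farad), values taken from Table 2.2
TSVS = {
    "[6]": (119.3e-3, 52.4e-15),
    "[64]": (152.4e-3, 2.1e-15),
    "[67]": (12e-3, 922e-15),
}


def rc_product_s(resistance_ohm: float, capacitance_f: float) -> float:
    """Intrinsic RC time constant of a TSV treated as one lumped R and one lumped C."""
    return resistance_ohm * capacitance_f


for ref, (r, c) in TSVS.items():
    print(f"{ref}: RC = {rc_product_s(r, c) * 1e15:.2f} fs")
```

All three products are in the femtosecond range, which supports the statement that the TSV itself contributes little RC delay.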



Figure 2.6: Average wirelength vs. number of TSVs [7].

Another significant element that can affect the inter-plane signaling performance is the number of TSVs used for this purpose and the allocation of these TSVs within the planes. In present technologies, TSVs are much larger than the horizontal interconnects, and the silicon area occupied by these interconnects is an important parameter that can affect the overall performance of a 3-D circuit [68]. Moreover, TSVs can cause routing congestion and increase the average distance between cells due to routing requirements [32]. Therefore, employing an excessive number of TSVs can increase the total area of the 3-D circuit and, hence, the total interconnect length. Figure 2.6 shows this trend for a four-layer 3-D IC with 2.5 $\mu$m diameter TSVs [7]. A high number of TSVs also reduces the fabrication yield and causes reliability issues, since TSVs are more complex to manufacture than standard CMOS. Conversely, reducing the number of TSVs to minimize the total interconnect length can significantly improve the speed of a 3-D circuit.

Several methods have been proposed to minimize the number of TSVs in the high-level synthesis stage [8, 69–73]. The method presented in [70] attempts to maximize the data transfer within a layer by simultaneous scheduling, resource binding, and layer assignment in a 3-D system, which can result in lower signal transfer between the layers and reduce the number of TSVs. The method proposed in [69] employs idle TSVs and idle function units as an alternative path besides the regular data path. Efficient allocation of alternative paths can considerably decrease the number of TSVs. Other binding methods have been proposed in [71, 72].



Figure 2.7: Different TSV allocations for 3-D circuits where in (a) no active area for TSVs is reserved during placement and in (b) some active area for TSVs is reserved during placement [8].

In conventional approaches, TSVs are located in the spaces between blocks after routing in each layer. Without reserving some active area for TSVs during placement, the available TSVs can be located far from the connected cells, causing the total interconnect length to increase, as shown in Figure 2.7 [8]. Several placement methods have been proposed to investigate the optimal number and location of TSVs to increase the speed of 3-D circuits [8,68,73]. The TSV co-placement scheme proposed in [32] places TSVs and gates simultaneously during 3-D placement. As compared to the conventional approach, this method shows an average wire length reduction of 5%, 8%, and 9% for two-layer, three-layer, and four-layer 3-D circuits, respectively.

Serialization is another method to reduce the number of TSVs [74,75]. Multiplexing the TSVs without increasing the frequency of data transfer introduces drawbacks such as increased delay and data traffic congestion, whereas increasing the data transfer frequency may lead to higher power consumption for the circuit. Additional circuits, such as a serializer/deserializer, are required for this approach.
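The trade-off described above can be made concrete with a small sketch. It assumes an idealized link in which each TSV carries one bit per cycle, so that preserving throughput requires the TSV clock to scale with the serialization ratio; the bus width, clock frequency, and function names are hypothetical.

```python
import math


def serialized_link(bus_width_bits: int, parallel_freq_hz: float, ratio: int):
    """TSV count and per-TSV frequency needed to keep the parallel-bus throughput."""
    tsvs = math.ceil(bus_width_bits / ratio)
    tsv_freq_hz = parallel_freq_hz * ratio
    return tsvs, tsv_freq_hz


def throughput_without_speedup(bus_width_bits: int, parallel_freq_hz: float, ratio: int) -> float:
    """Bits per second if the TSVs are multiplexed but the clock is left unchanged."""
    return math.ceil(bus_width_bits / ratio) * parallel_freq_hz


BUS_WIDTH, FREQ = 32, 500e6  # hypothetical 32-bit bus at 500 MHz
for ratio in (1, 2, 4, 8):
    tsvs, f_tsv = serialized_link(BUS_WIDTH, FREQ, ratio)
    slow = throughput_without_speedup(BUS_WIDTH, FREQ, ratio)
    print(f"ratio {ratio}: {tsvs} TSVs at {f_tsv / 1e9:.1f} GHz, "
          f"or {slow / 1e9:.0f} Gb/s at the original clock")
```

Each doubling of the serialization ratio halves the TSV count, at the price of either a doubled link frequency or a halved throughput.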

#### 2.2.2 Power and ground distribution

In addition to signaling, TSVs are used to distribute power and ground to all the planes that are not connected to the package pins. Power integrity is a major design issue for 3-D circuits. An important parameter to evaluate the quality of a power distribution network is the voltage drop due to the dynamic and leakage power of the circuits. The reduced footprint of a 3-D circuit can result in higher current densities. In addition, the resistive path along the TSVs connecting the package pads with the power grids can also increase the static IR drop observed for specific planes. The dynamic supply noise originates from the switching activity of the circuit, and decoupling capacitors are employed to mitigate this problem [76]. The allocation of the decoupling capacitive TSVs for power-ground distribution can increase the intrinsic decoupling capacitance, reducing, in turn, the required amount of extrinsic decoupling capacitance.

As discussed in [77], a crucial parameter for the *IR* drop on the top metal layer of each plane is the total area occupied by the TSVs. Assuming a specific area of the circuit is dedicated for power and ground TSVs, using more TSVs with smaller diameter results in lower voltage drop at the load. Simulation results indicate that by slightly increasing the density of the TSVs, a considerable reduction in the intra-plane power distribution resources can be achieved. According to [77], for a ten plane 3-D circuit with TSVs with a diameter of 10  $\mu$ m, if the area occupied by TSV increases from 0.4% to 10% of the circuit area, the area required for the power/ground network can be reduced from 40% to 10% of the circuit area achieving the same *IR* drop.

The low resistance of the TSVs can efficiently be exploited to improve power integrity. For example, the TSVs can be connected to both the topmost and lowest metal layers of the power grid, providing additional current paths, which are shown in Figure 2.8 [9]. These paths exhibit particularly low impedance characteristics, supporting the distribution of a large amount of current in the vicinity of a TSV without exceeding the allowed voltage drop. Due to this low impedance path, stacks of common vias within this region are removed, decreasing routing congestion without degrading power integrity. This improvement in power integrity can lead to a reduction in the required decoupling capacitance in the vicinity of the TSVs. The decoupling capacitance is reduced by up to 25% for the case investigated in [9].

Figure 2.8: Current paths within a 3-D circuit, (a) where the TSV is connected only to the topmost metal layer and (b) where the TSV is connected to the power lines on both the uppermost (MT) and the first (M1) metal layers of a circuit plane. TSVs are formed with a "via-middle" process [9].

The TSVs, however, occupy active silicon area and the TSV density cannot be considerably increased. Coaxial TSVs can also be of assistance in this case. A single coaxial TSV can be used to deliver both the power and ground by conducting the power through the inner metal and ground through the outer metal [78]. Merging power and ground TSVs reduces the area required for the power distribution network without increasing the *IR* drop.

#### 2.2.3 Thermal TSVs

In 3-D integrated circuits, thermal issues are forecast to be a major challenge due to the high power density, the low thermal conductivity along the primary heat transfer path, and the smaller footprint area of the circuit attached to the heat sink. Several techniques have been developed to facilitate the heat transfer within 3-D circuits to reduce the temperature, such as thermal through silicon via (TTSV) planning, thermal wire insertion, liquid cooling, and thermal driven floor planning [10, 11, 79].

This type of TSV is utilized to convey heat from the planes located far from the heat sink, such that the temperature limits are obeyed. These TSVs are typically interspersed within the white space that exists among the circuit blocks within each plane, where several techniques can be employed for this purpose [77]. Thermal TSV (TTSV) insertion has been shown to be a useful technique to reduce the local hot spots and decrease the maximum temperature of the circuit. The thermal profile of a 3-D circuit before and after TTSV insertion is shown in Figure 2.9 [10].



Figure 2.9: The thermal profile of a 3-D circuit before and after thermal via insertion [10].

TTSV placement can significantly affect the thermal behavior of the 3-D circuits. Several methods of thermal via insertion are presented in [10, 79, 80], which can drastically reduce the maximum temperature of the 3-D circuit. These methods aim at distributing the thermal vias, such that the area occupied by both the TTSVs and the horizontal interconnects used to connect these TSVs is also reduced.

Improved modeling of the thermal behavior of the TTSVs can decrease the number of TTSVs used to transfer the heat to the ambient with considerable savings in area [78]. Analyzing how the TTSVs affect the developed temperature in 3-D ICs is important for efficient TTSV insertion. The thermal properties of TTSVs, in turn, depend upon several physical and technological parameters.

The traditional approach is to model a TTSV as a vertical lumped thermal resistor in each physical plane, which is proportional to the length and inversely proportional to the diameter of the TTSV. The TTSV is considered as a one-dimensional network implying a flow of heat only in the vertical direction towards the heat sink of the system. This method is insufficient in capturing the thermal behavior of the TTSVs, since the lateral heat transfer through these structures is neglected. Compact thermal models as illustrated in Figure 2.10(a) can capture the major heat transfer paths by employing a small resistive network with few resistors [11]. The thermal resistances are described by closed form expressions including the TSV geometry.



Figure 2.10: Thermal model of a TTSV in a three-plane circuit where (a) is the compact model and (b) is the distributed model [11].

Since the heat transfer process is highly complex, some fitting parameters are employed to improve the accuracy of this model. A more accurate model for TTSVs is illustrated in Figure 2.10(b), where each TSV is modeled as a distributed resistive network eliminating the requirement of curve fitting coefficients.
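The one-dimensional lumped view described above can be illustrated with a minimal sketch. It assumes the textbook conduction formula $R_{th} = L/(kA)$ for a cylindrical via and stacks one resistor per plane in series. The copper conductivity, the via dimensions, and the function names are illustrative assumptions rather than values from [11], and lateral heat transfer is deliberately ignored, which is exactly the limitation of this model.

```python
import math

def ttsv_thermal_resistance(length_m, diameter_m, k_w_per_mk=400.0):
    """1-D conduction resistance R = L / (k * A) of a cylindrical via, in K/W.

    k defaults to roughly the conductivity of copper (assumed value).
    """
    area = math.pi * (diameter_m / 2.0) ** 2
    return length_m / (k_w_per_mk * area)

def stacked_resistance(lengths_m, diameter_m, k_w_per_mk=400.0):
    """Series combination of one lumped resistor per physical plane."""
    return sum(ttsv_thermal_resistance(l, diameter_m, k_w_per_mk)
               for l in lengths_m)

# Hypothetical three-plane stack: 50 um long, 10 um wide via in each plane.
r_plane = ttsv_thermal_resistance(50e-6, 10e-6)
r_stack = stacked_resistance([50e-6, 50e-6, 50e-6], 10e-6)
print(f"per plane: {r_plane:.0f} K/W, three planes: {r_stack:.0f} K/W")
```

In this sketch the vertical resistance grows linearly with the number of planes, which is why planes far from the heat sink benefit most from additional thermal vias.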

### 2.3 Noise Coupling

Noise coupling between TSVs and/or the substrate is another important factor that can affect signal integrity in 3-D circuits and should be considered in the TSV modeling and design process. Increasing the link speed and the TSV density increases both the noise coupling between adjacent TSVs (crosstalk) and the noise coupled to the substrate. In standard 2-D circuits, crosstalk is usually caused by the two neighboring wires on the same layer. 3-D circuits are more vulnerable to crosstalk since TSVs are bundled and thus most TSVs are surrounded by other TSVs. Consequently, a TSV can be affected by several adjacent TSVs from all directions.

To reduce the current leakage to the substrate a thicker sidewall isolation layer (liner) or shorter TSVs can be employed. Increasing the thickness of the liner can result in higher area overhead and/or higher electrical resistance for the TSVs. If shorter TSVs are to be used, producing very thin wafers is the primary challenge.

The crosstalk between TSVs arises from two different mechanisms. In the first mechanism, noise is injected into the silicon bulk through the TSV body capacitance ( $C_{TSV}$ ) and is transferred to the victim TSV along the capacitive and resistive silicon path ( $C_{Si}$  and  $R_{Si}$ ). The second mechanism is inductive coupling between adjacent TSVs.



Figure 2.11: Cross-talk model between adjacent TSVs

The model for analyzing crosstalk in TSV bundles is shown in Figure 2.11. In this model, only the effect of the four closest neighboring TSVs is shown for each TSV for simplicity. In more precise modeling, other TSVs can also affect the victim; however, this effect attenuates as the distance between the aggressor and the victim TSV increases. The resistance and capacitance of the silicon bulk are given as in [81]:

$$R_{si} = \frac{\rho d}{2r l_{tsv}} \tag{2.7}$$

$$C_{si} = \frac{\pi \varepsilon_{si}}{\cosh^{-1}\left(\frac{d}{2r}\right)} \tag{2.8}$$

where *d* is the pitch of the two TSVs.
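As a rough numerical illustration of (2.7) and (2.8), the sketch below evaluates both expressions for a few pitches. The geometry, the substrate resistivity, and the relative permittivity of silicon are hypothetical values chosen only for the example. As written, (2.8) has units of F/m, so multiplying by the TSV length to obtain a total capacitance is an assumption made here.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m

def r_si(rho_ohm_m, pitch_m, radius_m, length_m):
    """Substrate resistance between two TSVs, following Eq. (2.7)."""
    return rho_ohm_m * pitch_m / (2.0 * radius_m * length_m)

def c_si(pitch_m, radius_m, eps_r=11.9):
    """Substrate capacitance following Eq. (2.8); as written it is in F/m."""
    return math.pi * eps_r * EPS0 / math.acosh(pitch_m / (2.0 * radius_m))

# Hypothetical geometry: 2.5 um radius, 50 um length, 10 ohm*cm substrate.
RADIUS, LENGTH, RHO = 2.5e-6, 50e-6, 0.1
for pitch in (10e-6, 20e-6, 40e-6):
    r = r_si(RHO, pitch, RADIUS, LENGTH)
    c = c_si(pitch, RADIUS) * LENGTH  # assumption: scale by TSV length
    print(f"pitch {pitch * 1e6:4.0f} um: R_si = {r:8.0f} ohm, "
          f"C_si = {c * 1e15:5.2f} fF")
```

Under these assumptions, a larger pitch raises the substrate resistance and lowers the substrate capacitance, which is consistent with the coupling attenuating as the aggressor moves away from the victim.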

The mutual inductance of two adjacent TSVs is described by (4.1) where *b* is defined as 2d/ltsv [62].

Noise coupling is a serious issue that can degrade the performance of signal transfer, especially for high frequency applications. Several methods have been proposed to decrease the noise coupling in 3-D circuits, such as using a high resistivity substrate (HRS) or a thick insulating dielectric. Employing a high resistivity substrate increases the cost of fabrication, while a thick insulating dielectric reduces the substrate noise but cannot suppress the coupling between TSVs.

An efficient way to reduce noise coupling in 3-D circuits is shielding the signal TSV with several grounded TSVs [82]. Similar to horizontal wires, power and ground TSVs can be utilized to shield signal TSVs [82]. Using shield TSVs increases the number of TSVs and reduces the yield. To reduce the number of required TSVs, the "ShieldUS" method is proposed [83]. This method considers the data transfer pattern of the TSVs at runtime and assigns the more stable bits to shielding TSVs to increase the data transfer rate.

Employing guard rings and deep n-wells (DNW) is a common way to mitigate substrate noise in 2-D circuits, and both can also be applied to 3-D circuits [82].

Another efficient approach to alleviate this problem is employing high-frequency scalable TSVs [84,85]. Figure 2.12 illustrates the structure of a coaxial TSV, which consists of a central conductor to transmit the data, a surrounding grounded conducting layer for shielding, and layers of dielectric to insulate these two conducting layers from each other and from the silicon substrate. The ground shielding layer confines the electromagnetic field to the inner part of the TSV and suppresses the signal loss for RF signals. Despite the larger pitch of coaxial TSVs as compared to regular TSVs, the smaller keep-out zones of these TSVs and the absence of separate shielding TSVs help to increase the achievable density of coaxial TSVs in 3-D circuits. Exploiting coaxial TSVs also increases the intrinsic decoupling capacitance. Although coaxial TSVs can potentially alleviate the substrate noise problem in 3-D circuits and provide excellent impedance matching, the manufacturing implications of these TSVs on the design process of power distribution networks are not clear.



Figure 2.12: Structure of a coaxial TSV.

#### 2.4 Synchronization in 3-D circuits

A primary challenge in designing synchronous circuits is how to distribute the clock signal to the sequential parts of the circuit [86]. This issue can be more challenging for 3-D circuits since a clock path can spread across several planes with different physical and electrical characteristics [12]. 3-D integration drastically decreases the interconnect length of the global wires, which can reduce the number of clock buffers and result in more power-efficient clock networks. Conversely, thermal issues are more pronounced in 3-D integrated circuits. Clock networks consume a great portion of the power dissipated in a circuit [87]. Consequently, designing low power clock networks for 3-D circuits is an important issue.

Symmetric clock networks afford low-skew synchronization by delivering the clock signal at the same time to the sink nodes. These symmetric structures, including H-trees, X-trees, meshes, and ring clock distribution networks, are widely employed in 2D circuits. Extending a 2D symmetric clock network to 3-D does not necessarily result in another symmetric structure, since the clock signal is transferred through TSVs, which have different characteristics from horizontal interconnects. A study on extending different kinds of symmetric clock networks to three-dimensional circuits is presented in [12]. The different topologies shown in Figure 2.13 are considered and fabricated. Experimental results reveal that the clock skew originating from vertical interconnects is negligible, which encourages designers to employ these well-known topologies for 3-D circuit design as well. The studies presented in [68, 88] investigate the effect of the number of TSVs in a 3-D clock distribution network on the characteristics of the clock signal.



Figure 2.13: Different symmetric clock topologies expanded to 3-D [12].

3-D circuits consist of different layers fabricated in different technology processes and having diverse electrical specifications. This non-uniform structure affects the characteristics of clock networks. Some studies have investigated synchronization for non-uniform 3-D circuits. A simultaneous buffer and TSV allocation algorithm is presented in [89], where the size and number of repeaters are pre-defined, whereas [90] proposes an algorithm to determine the size, number, and position of repeaters in a 3D clock network.

#### 2.4.1 Resonant clocking

An efficient approach to eliminate the clock buffers and reduce power is to use resonant clocking [13, 20, 91]. In this approach, on-chip inductance is added to the clock network, forming a resonant circuit with the interconnect capacitance. Consequently, the power consumed by the network decreases, since the energy alternates between electric and magnetic fields instead of being dissipated as heat.

A seminal work introducing the concept and design of resonant transmission lines is described in [92]. A design of a global clock distribution network is presented in [20], in which four resonant circuits are connected to a conventional H-tree structure as illustrated in Figure 2.14. Each quadrant consists of an on-chip spiral inductor that resonates with the wiring capacitance of the clock network and a decoupling capacitor connected to the other end of the spiral inductor. A simple lumped circuit model is utilized in [20] to determine the resonant inductance. The resonant frequency of the network is estimated (to first order) by  $f_{res} = \frac{1}{2\pi\sqrt{LC}}$,  where *C* and *L*, respectively, denote the equivalent capacitance of the network interconnect and the inductance of the spiral inductors. The decoupling capacitor is employed to provide a positive voltage offset on the grounded end of the resonant inductor and adapt the voltage level to the CMOS logic level [21]. This capacitor should be sufficiently large to guarantee that the resonant frequency of the decoupling capacitor,  $f_{res-dec} = \frac{1}{2\pi\sqrt{LC_{decap}}}$,  is much lower than the desired resonant frequency of the clock network.
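The first-order relations above can be turned into a small sizing sketch. The clock-network capacitance, the target frequency, and the factor of 100 chosen for the decoupling capacitor are hypothetical values for illustration only. The lumped model ignores wire resistance and the distributed nature of the tree.

```python
import math

def resonant_frequency(l_henry, c_farad):
    """First-order resonance f = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def inductance_for(f_hz, c_farad):
    """Invert the resonance formula to size the spiral inductor."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * c_farad)

c_clk = 20e-12      # assumed clock capacitance seen by one resonant circuit
f_target = 1e9      # assumed clock frequency
l_res = inductance_for(f_target, c_clk)

c_decap = 100 * c_clk                      # assumed decoupling capacitor size
f_decap = resonant_frequency(l_res, c_decap)

print(f"L = {l_res * 1e9:.2f} nH")
print(f"f_res-dec / f_res = {f_decap / f_target:.2f}")
```

Because the frequency scales with the inverse square root of the capacitance, a decoupling capacitor 100 times larger than the clock capacitance places its resonance a factor of 10 below the clock frequency.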



Figure 2.14: Resonant clock network with four resonant circuits [13].

Based on this structure, a design methodology for resonant H-tree clock distribution networks is proposed in [20]. In this work, the clock tree is modeled as a distributed *RLC* interconnect. The electrical model is utilized to determine the parameters of the resonant circuit and the output impedance of the clock driver such that the power consumed by the network and the clock driver is minimized, while a full swing signal is delivered at the output nodes.

To support pre-bond tests, resonant operation should be achieved for each individual plane during testing, irrespective of the employed pre-bond testing approach. The resonant 3-D clock network should be designed such that resonant operation at a specific frequency is achieved individually for each plane as well as for the entire 3-D system. In the next section, we investigate the issues in designing clock distribution networks that support post-bond and pre-bond test for 3-D circuits.

## 2.5 Test

One of the main challenges for the high volume production of 3-D circuits is manufacturing and the related yield implications. Fabrication processes for 3-D circuits include some additional steps as compared to standard CMOS processes, such as wafer thinning and TSV fabrication. This manufacturing complexity makes 3-D circuits more susceptible to manufacturing defects, which can lower the overall yield of the bonded 3-D system. Testing vertical interconnects and detecting defective TSVs prior to bonding the next layer is an efficient method to improve the yield of the 3-D circuit [93].

TSV defects can arise during the manufacturing, bonding, or lifetime of 3-D circuits. Defects that occur during TSV fabrication can be detected by pre-bond testing of TSVs, whereas defects due to alignment, bonding, and stress require post-bond tests to be detected [94]. TSV manufacturing defects comprise several mechanisms, such as micro-voids in TSVs, TSV pinch-off, oxide defects (*e.g.*, pinholes), thermo-mechanical stress induced defects, and voids and cracks in micro-bumps [95]. Micro-voids and TSV pinch-off increase the resistance of the TSVs, whereas a pinhole in the TSV oxide leads to high leakage between the TSV and the substrate and increases the capacitance [95]. Thermal expansion mismatch between Cu, Si, and SiO2, and electromigration in micro-bumps are other probable defects that can degrade the reliability of TSVs [95]. Despite the different mechanisms behind TSV defects, they can be modeled as opens, shorts, and delay faults similar to horizontal interconnects [95]. Therefore, many test patterns that have been presented for 2D circuits can also be used for TSVs [94].

The major challenge for pre-bond testing of TSVs is the large size of test probes as compared to the TSV pitch and diameter. The minimum pitch required by today's probe technology is 35  $\mu$ m, which is much larger than the pitch of recent TSVs. To overcome this problem, a test method is presented in [93] in which the probe needle is in contact with a number of TSVs. These TSVs are shorted and form a network, and a test flow is proposed to determine the resistance of each TSV in the network.

Another useful method to improve yield is wafer level pre-bond test, which includes testing each layer prior to bonding. As mentioned before, there are two major approaches to bond physical layers, wafer-to-wafer (W2W) and die-to-wafer (D2W) [26]. Although each approach poses different manufacturing requirements, detecting defective dies prior to bonding can improve the yield of the 3-D system [96,97]. Figure 2.15 depicts a potential 3-D test flow [14], where a test is executed after every manufacturing operation. In this approach, defects are caught as early as possible, before they lead to further damage. Pre-bond test in the form of wafer level test can help to bond more functioning dies in W2W integration [98] and prevent bonding a functioning die to a defective die of the wafer in the case of D2W integration [99].



Figure 2.15: Potential test flow for 3-D circuits [14].


Pre-bond testing of dies introduces many challenges in Design-for-Test (DfT). Thinned wafers are far more fragile than un-thinned ones, so the number of contacts that can be made during probing is smaller for these wafers [95].

A DfT structure for 3-D circuits is shown in Figure 2.16, where the blue parts depict conventional DfT and the red parts are the components added for 3-D DfT [14]. A common method for wafer level testing is to employ a probe card and mechanically connect the test needles to the device under test (DUT). As shown in Figure 2.16, to support pre-bond test in 3-D circuits, test probe pads should be accessible in each plane, which leads to extra test pads for the 3-D circuits. The area overhead of the test pads, the risk of damage to low-k dielectrics, and the deformation of the pads are important issues in conventional wafer level testing. Wafer level testing of the unbonded wafers for 3-D systems is subject to new issues that should be addressed. For example, broken scan chains are an issue that should be considered to support pre-bond scan test [94, 100, 101]. Another issue resembling the broken chains is the typically fragmented local clock networks within each plane of the stack. Due to this situation, new methods are needed to provide synchronization within each plane during pre-bond test.

Figure 2.16: DfT architecture for 3D-ICs [14].

To support pre-bond test in 3-D circuits, each plane needs a complete clock tree. However, 3-D clock networks often include several disconnected networks in some of the planes. These networks connect through TSVs to the plane where the main tree supplies the clock signal to the entire clock distribution network. To complete the clock network in the second plane, the most common method is to use redundant wiring and an additional clock driver following, for example, the principles of the technique in [97] (see Figure 2.17(b)). In this method, there are two important design parameters: the sizes of the additional wires and of the clock drivers, which are used only during testing, must satisfy the specifications related to the clock signal. These parameters should be chosen so that a full swing signal is delivered to the sink nodes. There is a tradeoff in determining these parameters. If the wire width is decreased, a larger clock driver should be utilized. Alternatively, increasing the width of the wire results in a smaller clock driver but increases the area occupied by the redundant wires, which are used only during pre-bond test.

Another way to deliver the clock signal is to use a *Delay-Locked Loop* (DLL) for each local network [102]. The area overhead of this method increases proportionally to the number of networks. In these approaches, the DLLs must be switched off and the redundant networks must be disconnected from the network used for normal operation by employing transmission gates and control signals (see Figure 2.17(c)).



Figure 2.17: Different methods of testing a 3-D circuit with two planes shown in (a) where the clock signal for pre-bond test is provided by (b) the use of redundant wiring and (c) the use of a DLL for each local network.



Figure 2.18: 3-D test structure using inductive links.

A non-conventional approach to address this issue is to employ a wireless testing scheme for pre-bond test. Wireless testing can eliminate the need for test probe pads used just for pre-bond test. Wireless testing has been widely explored and reported to improve the cost and reliability of the testing process in VLSI circuits by reducing the manufacturing defects caused by adding test pads [103–105]. Replacing the direct test needles by wireless connections can increase the test frequencies at higher pin density and provide a faster test process using high parallelism [103, 104]. Possible technologies for wireless testing are RF, near field (including capacitive and inductive coupling), and optical communication. Due to the short distance between the planes in 3-D integrated circuits, near field communication is considered as the proper candidate for wireless testing in these circuits [103]. To support wireless communication between the automatic test equipment (ATE) and the DUT, inductive links can be used [36]. This approach typically incurs a significant area overhead since spiral inductors have to be added to the DUT. A potential 3-D test structure employing inductive links is shown in Figure 2.18.

At-speed scan testing ensures good test coverage for integrated circuits [15, 106–108]. Scan-based testing requires that test patterns are scanned in at a low clock frequency before the fast capturing clock is applied. To provide at-speed scan testing, the scan chain is loaded at the test clock frequency; afterwards, two clock pulses at the operating frequency are applied to the chain [15]. As shown in Figure 2.19, there are two methods to launch the transition. In the *Launch-off-Shift* (LOS) method, the transition is launched in the last cycle of the scan shift, whereas in the *Launch-off-Capture* (LOC) method, the transition is launched in the first functional clock cycle. Therefore, to produce the proper timing for capturing the data, the clock signal should efficiently switch between the two frequencies, as also highlighted in [102], where the use of DLLs enables these frequency transitions. In the case of redundant wires [97], proper frequencies for the clock signal can be produced by an on-chip PLL or the ATE.
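The difference between the two launch methods can be sketched as a per-pulse schedule. The function name, the tuple layout, and the labels are hypothetical and serve only to illustrate the ordering of events. The sketch assumes a single scan-enable signal that is high during shifting and low during functional cycles.

```python
def at_speed_schedule(method, n_shift):
    """Return one (phase, scan_enable, clock) tuple per clock pulse.

    method  -- "LOS" (launch-off-shift) or "LOC" (launch-off-capture)
    n_shift -- number of shift pulses needed to load the scan chain
    """
    if method not in ("LOS", "LOC"):
        raise ValueError("method must be 'LOS' or 'LOC'")
    if n_shift < 1:
        raise ValueError("n_shift must be at least 1")

    if method == "LOS":
        # The last shift pulse doubles as the launch pulse, so scan enable
        # is still high at launch and must drop before the capture pulse.
        pulses = [("shift", 1, "slow")] * (n_shift - 1)
        pulses.append(("launch", 1, "fast"))
    else:
        # The chain is fully loaded first; launch happens in functional mode.
        pulses = [("shift", 1, "slow")] * n_shift
        pulses.append(("launch", 0, "fast"))
    pulses.append(("capture", 0, "fast"))
    return pulses

for method in ("LOS", "LOC"):
    print(method, at_speed_schedule(method, 4))
```

In both schedules the launch and capture pulses are the only ones at the operating frequency, which is why the clock source has to switch cleanly between the slow test clock and the fast functional clock.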

![](_page_58_Figure_3.jpeg)

Figure 2.19: Transition delay test methods where (a) is LOC and (b) is LOS method [15].

### 2.6 Design Challenges for TSVs

TSV manufacturing is a primary challenge that should be overcome if 3-D ICs are to be produced in high volume. Beyond manufacturing, however, there are several design challenges related to the different roles of TSVs within 3-D circuits. A common issue among the different types of TSVs (assuming that some library of TSV cells can exist) is that the TSVs occupy silicon area and can block all of the metal layers. Consequently, the TSVs compete with the devices and the intra-plane wires for silicon and wiring resources, respectively. How to best place the TSVs becomes an important issue with implications for the delay of the inter-plane nets and routing congestion [68, 73]. Although some techniques exist for placing the TSVs, there is still considerable room for improvement.

For power/ground TSVs several issues also arise. Inserting TSVs within a circuit block has an adverse effect on the area and wirelength of this block and, often, the TSVs are considered to be placed at the periphery of this block. Placing the TSVs at the periphery of the block, however, means that longer current paths are formed. These paths can increase the IR drop on the load affecting power integrity. To address this situation, either the fineness of the power grid or the density of the TSVs can increase to lower the impedance of the paths [77, 79]. Several tradeoffs can be explored to determine the proper design that meets the power supply noise constraints, while not sacrificing excessive wiring and silicon resources.

Although power/ground and signal TSVs also behave as thermal conduits, their number and allocation can be insufficient to mitigate thermal issues in 3-D circuits. To this end, thermal TSVs can be efficiently placed across the area of each plane. Many thermal-aware placement techniques have been developed with noteworthy results [80]. As the number of planes forming a 3-D system increases, the efficiency of the TTSVs degrades since the thermal paths to the heat sink become longer. The increase in the vertical thermal impedance, in turn, requires an increase in the TTSV density. However, increasing the TTSV density can adversely affect other important design objectives, such as speed, power, and area. Increasing the density of the TTSVs may therefore not be possible, which means that the TTSVs should be treated as an auxiliary means to reduce temperature [82]. Rather, careful thermal-aware physical design should first be applied to manage the increased power densities in 3-D circuits. Considering these challenges, the requirement is to manufacture and design TSVs such that signal, power, and thermal integrity within 3-D ICs is ensured. Addressing each of these objectives separately, which is the usual practice, may not result in the proper use of this expensive resource.

### 2.7 Summary

This chapter presents an overview of TSV based 3D circuits. The fabrication process of TSVs is discussed in Section 2.1. Different approaches for manufacturing TSVs, such as via first, via middle, via last, and via after, are reviewed and their benefits and challenges are discussed. In Section 2.2, different applications of TSVs in a 3-D circuit are investigated. Inter-plane communication is the best-known role of TSVs, but TSVs can also be employed for power and ground distribution among planes. Thermal TSVs can alleviate the thermal problems in 3-D circuits that originate from high power density and low thermal conductivity. In Section 2.4, synchronization in 3-D circuits is discussed. Distributing the clock signal among different planes is an interesting challenge. Test is an important issue in 3D circuits due to the vulnerability of these circuits to fabrication defects. Section 2.5 is dedicated to testing TSV-based circuits and related issues. An overall view of the design challenges in TSV based 3D ICs is presented in Section 2.6.

# **3** Monolithic Integration

TSV based 3D circuits have attracted considerable attention in the semiconductor industry. Despite the significant benefits that TSVs can offer, there are some shortcomings that persuade designers to search for alternative approaches to vertical integration. The International Technology Roadmap for Semiconductors (ITRS) [109] predicts that the TSV pitch will remain in the range of several microns, while the on-chip interconnect pitch is in the range of 30–100 nm (in 2014). The alignment limitation is an important issue that makes TSV downsizing very challenging. The TSV manufacturing process also imposes a minimum keep-out zone that increases the area overhead of 3D ICs. In addition, the parasitic capacitance of TSVs is large (tens to hundreds of fF), which may degrade the timing and power of circuits. Mechanical stress due to very thin stacked layers is another restrictive challenge in designing TSV based ICs. *3-D monolithic integration* (3DMI) is an interesting alternative to polylithic TSV based 3D circuits. It offers through-silicon connections with a diameter of less than 50 nm and therefore provides 10,000 times the density of TSV technology. Monolithic inter-die vias (MIVs) have much better characteristics than TSVs in terms of parasitics and stress due to their smaller size [1].
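The quoted density gain can be checked with a short calculation. The sketch assumes that vias are placed on a square grid, so the number of vias per unit area scales with the inverse square of the pitch. The specific pitches (5 µm for TSVs and 50 nm for MIVs) are illustrative assumptions. The text only states a TSV pitch of several microns and an MIV diameter below 50 nm.

```python
import math

def density_ratio(tsv_pitch_m, miv_pitch_m):
    """Ratio of vias per unit area for two square grids with the given pitches."""
    return (tsv_pitch_m / miv_pitch_m) ** 2

ratio = density_ratio(5e-6, 50e-9)   # assumed pitches: 5 um vs. 50 nm
print(f"density gain: about {ratio:,.0f}x")
```

A pitch ratio of 100 therefore corresponds to the four orders of magnitude in density mentioned above.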

In this approach, devices in each active layer are processed sequentially starting from the bottom-most layer, rather than bonding two fabricated dies together using bumps and/or TSVs. Devices are built on a substrate wafer and, after proper isolation, a second device layer is formed and devices are processed on the second layer. This sequence of isolation, layer formation, and device processing can be repeated to build a multilayer structure [29].

The key advantage of monolithic 3D integration is the high alignment precision between successive layers, which is determined only by the lithographic alignment accuracy. The alignment precision for monolithic vertical integration is around 10 nm, whereas TSV based integration supports an alignment precision of up to 0.5 µm [110].

To build a monolithic 3D circuit, the first layer is fabricated using the same manufacturing process as a conventional 2D circuit, as shown in Figure 3.1. After fabricating the active devices and interconnects of the first layer, an isolation layer is deposited and polished using a low-temperature chemical mechanical planarization (CMP) process. To form the next tier, a thin single crystal layer of silicon (or Ga and GaAs for optic and RF applications) is attached atop the isolation layer [111]. Since this layer can be very thin (*i.e.*, around 30 nm), MIVs are very short with negligible parasitic capacitance (*e.g.*, less than 0.1 fF), whereas for TSV based 3D circuits the overlying layers cannot be thinner than 50 µm, which results in a higher parasitic capacitance for TSVs.

![](_page_61_Figure_2.jpeg)

Figure 3.1: Process of monolithic fabrication.

The thermal budget while forming the second layer is the biggest challenge in monolithic vertical integration. Once copper or aluminum is added for the bottom layer interconnect, the process temperature needs to be limited to less than  $400^{\circ}C$ , whereas forming single crystal silicon requires  $1200^{\circ}C$  and forming transistors in single crystal silicon requires around  $800^{\circ}C$ . Therefore, a novel low-thermal-budget process must be applied to fabricate the top device layer. Tungsten is considered as a replacement for copper or aluminum since it can tolerate high temperatures without degrading. Moreover, the isolation layer between the device layers should have a high melting point and a low thermal conductivity to prevent thermal damage to the lower layer. From the perspective of thermal isolation during processing, a thick isolation layer is ideal. However, the thermal resistance during circuit operation and the length of the inter-layer interconnects limit the actual thickness of this isolation oxide layer.

Several monolithic approaches have been proposed. A low temperature bonding process is presented in [112, 113]. A sequential process using lateral seeded crystallization is introduced in [111]. Another proposed approach is a plasma-activated low-temperature wafer bonding process [114], where amorphous silicon is deposited on the isolation layer and crystallized using laser pulses.

There are two different design approaches for monolithic 3D circuits: transistor level monolithic integration (TMI), where pMOS and nMOS transistors are located in different layers as depicted in Figure 3.2 (a), and gate level monolithic integration (GMI), where the placement of devices is similar to TSV based 3D circuits as shown in Figure 3.2 (b). TMI allows the highest possible integration density. A drawback is the need for different placement and routing tools.

![](_page_62_Figure_1.jpeg)

Figure 3.2: Transistor level and gate level monolithic structure [1].

Although monolithic 3D circuits can provide ultra high density, their fabrication process confronts serious problems. The crystal quality of the upper layers is usually low and imperfect. Therefore, high performance devices cannot be built in the upper layers. The high temperature fabrication process of the upper layers degrades the underlying devices, and a strict thermal budget should be considered to control this effect. Due to the sequential nature of this method, manufacturing throughput is low.

#### **3.1 RRAM**

Ongoing demand for high capacity, dense memories motivates the research for new *non-volatile memories* (NVM) such as *phase-change RAM* (PRAM), *nanoelectromechanical relay* (NEM), *magnetic RAM* (MRAM), and *resistive RAM* (RRAM). RRAM cells provide ultra high density memories with easy programming, faster write time, and lower write power as compared to other emerging NVMs [19, 115]. These characteristics make RRAMs a promising candidate for next-generation non-volatile memories.

RRAM cells are based on *memristance*. Although the theory of memristance was established long ago, the first memristor was fabricated only in 2008 at HP Labs. This device has a capacitor-like structure comprising an insulator or semiconductor sandwiched between two metal layers, as depicted in Figure 3.3 [16, 116].

![](_page_62_Figure_7.jpeg)

Figure 3.3: Memristor structure [16].

The I-V curve of a memristor is shown in Figure 3.4. There is a hysteresis loop which includes two linear regions of high and low resistance and switching regions where the device state changes. The resistance of the memristor can change vastly (by over 1000%) by changing the applied voltage. Figure 3.5 shows the change in resistance of an RRAM cell in the high and low resistance regions when a  $\pm$  5V pulsed voltage is applied. The switching speed can be faster than several nanoseconds. Since the resistance depends on the voltage or current that has been applied, the device exhibits a kind of memory, which makes it a good candidate for non-volatile memory structures. The low-resistance and high-resistance states can represent high and low logic [16].
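The hysteresis described above can be reproduced qualitatively with a very simple threshold-switching model. This is a toy sketch, not a physical memristor model. The switching thresholds and the two resistance values are hypothetical, and the two linear regions are idealized as fixed resistors.

```python
def sweep_iv(voltages, v_set=1.5, v_reset=-1.5, r_lrs=1e3, r_hrs=1e5):
    """Return (voltage, current, state) for each point of a voltage sweep.

    The device starts in the high-resistance state (HRS), switches to the
    low-resistance state (LRS) at v_set, and back to HRS at v_reset.
    """
    state = "HRS"
    trace = []
    for v in voltages:
        if v >= v_set:
            state = "LRS"
        elif v <= v_reset:
            state = "HRS"
        resistance = r_lrs if state == "LRS" else r_hrs
        trace.append((v, v / resistance, state))
    return trace

# Triangular sweep: 0 V -> +2 V -> -2 V -> 0 V in 0.5 V steps.
up = [x * 0.5 for x in range(0, 5)]
down = [x * 0.5 for x in range(3, -5, -1)]
back = [x * 0.5 for x in range(-3, 1)]
trace = sweep_iv(up + down + back)

for v, i, state in trace:
    print(f"{v:+.1f} V  {i:+.2e} A  {state}")
```

The same voltage yields two different currents depending on the sweep direction, which is the hysteresis loop of Figure 3.4 in its most idealized form.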

![](_page_63_Figure_2.jpeg)

Figure 3.4: Hysteresis behaviour in the I-V curve of memristors [16].

![](_page_63_Figure_4.jpeg)

Figure 3.5: Switching behavior of a memristor where the resistance changes by changing the applied voltage

Resistive switching is observed in many metal-insulator-metal (MIM) structures. Various metal oxides, such as perovskite-type manganites and titanates, and binary metal oxides like *SiO* and *NiO* have been observed to exhibit a hysteretic I-V curve. Typical materials for the electrodes and insulator of MIM structured memristors are listed in Figure 3.6.

The building block of RRAM is the 1T1R cell, which consists of a CMOS transistor (1T) and a MIM structured memristor (1R). In the 1T1R structure, the memristor is fabricated on top of the source or drain terminal of the transistor, as shown in Figure 3.8. The transistor is used to program the memristor and to write and read the state of the MIM structure.

Several 1T1R cells can be merged in an array structure to form a memory block, as shown in Figure 3.9. SET and RESET operations are used to program the RRAM cell.

| Layer                  | Examples                                                         |
|------------------------|------------------------------------------------------------------|
| Top electrode          | Pt, TiN/Ti, TiN, Ru, Ni                                          |
| Transition metal oxide | TiO$_x$, NiO$_x$, HfO$_x$, WO$_x$, TaO$_x$, VO$_x$, CuO$_x$, ... |
| Bottom electrode       | TiN, TaN, W, Pt                                                  |

Figure 3.6: Typical materials used for the memristor structure.

![](_page_64_Figure_3.jpeg)

Figure 3.7: The four circuit properties (voltage, current, magnetic flux, and charge) and their relations [16], where three relations represent the well-known passive circuit elements: resistors ($R = dV/dI$), inductors ($L = d\phi/dI$), and capacitors ($C = dQ/dV$). The fourth relation is described as  $M = d\phi/dQ$  and corresponds to the new element called memory resistance or memristance.

The SET operation changes the resistance state from the high-resistive state (HRS) to the low-resistive state (LRS), which represents writing "0" in the memory. On the contrary, the RESET operation changes the resistance from LRS to HRS, writing logic "1" into the memory [18, 115].

To SET the memory cell, the SET voltage is applied to the bit line (BL) and the source line (SL) is connected to zero volts. Applying the RESET voltage to SL and zero volts to BL resets the RRAM cell. One of the main advantages of RRAMs over flash memories is that the SET and RESET voltages are very low (*i.e.* around 1.6-1.8 V for RRAMs vs. 10-12 V for flash memory) [18, 115].
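The biasing rules described above can be summarized in a short Python sketch. The class and function names are illustrative assumptions and do not come from the cited works; the default 1.8 V programming voltage is simply the upper end of the range quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bias:
    """Voltages applied to the bit line (BL) and source line (SL), in volts."""
    bit_line_v: float
    source_line_v: float


def programming_bias(operation: str, v_prog: float = 1.8) -> Bias:
    """Return the BL/SL bias for a SET or RESET operation.

    v_prog is an assumed programming voltage (the text quotes 1.6-1.8 V).
    """
    op = operation.upper()
    if op == "SET":    # HRS -> LRS: voltage on BL, SL at zero volts
        return Bias(bit_line_v=v_prog, source_line_v=0.0)
    if op == "RESET":  # LRS -> HRS: voltage on SL, BL at zero volts
        return Bias(bit_line_v=0.0, source_line_v=v_prog)
    raise ValueError(f"unknown operation: {operation!r}")


def stored_value(operation: str) -> str:
    """Logic value written by each operation, following the text's convention."""
    return {"SET": "0", "RESET": "1"}[operation.upper()]
```

The sketch only encodes the line voltages; the access transistor's gate bias and any current compliance are outside its scope.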

![](_page_65_Figure_1.jpeg)

Figure 3.8: 1T1R structure

![](_page_65_Figure_3.jpeg)

Figure 3.9: RRAM array structure

RRAMs can provide very dense memory blocks. Compared to SRAMs, which use six transistors per cell, the single-transistor structure of RRAMs can drastically improve the area efficiency of the memory block. Based on the ITRS roadmap, and including the peripheral circuits (*e.g.* control and encoding/decoding circuits, write drivers, sense amplifiers), RRAMs can increase the density of the memory block by six times. Compared to dense DRAM memories, the predicted density improvement is a factor of two (100%). This density improvement can lead to shorter addressing lines and hence reduced power consumed in the driver circuits [16, 117]. Moreover, RRAM technology is CMOS-compatible, which makes RRAMs cost efficient as compared to DRAMs, which require fabrication steps beyond the standard CMOS process. The memristor materials are deposited at low temperature and can be fabricated between two metal layers during the BEoL process [118]. The yield of RRAM memories is around 80-90%, which is quite close to the yield of mature memories such as DRAMs.

The retention time of existing memristors is reported to be up to a couple of years [16], whereas this time is 5 years for SRAM and around 100 years for flash memory [119]. The retention time of RRAM is predicted to reach a decade in the near future, which will be more than sufficient for any memory application.

On the other hand, write times are somewhat slow. Memristors require approximately 10 ns just to write a single cell, to which bus and addressing latencies must be added. This means that RRAM will be much slower than DRAM for writes. However, RRAM requires a shorter write time than other non-volatile memories such as flash, and it is reasonable to expect that a read operation will be much faster than a write operation.

The main weakness of RRAM is endurance, which is reported to be between  $10^5$  and  $10^7$  write cycles, while DRAM endurance is around  $10^{10}$  to  $10^{12}$  cycles. This low endurance can limit the applications of RRAM.

This combination of extreme density, moderate speed, and moderate endurance makes memristors an excellent candidate for FPGA applications. The limited number of write cycles, the tolerable write speed requirement, and the growing need for programmed data in FPGAs match the RRAM specifications and make RRAMs a strong candidate for FPGA structures [16]. Different applications of RRAM in FPGAs are considered in the next section.

#### 3.1.1 RRAM in FPGA structure

FPGAs perform noticeably worse than ASICs in terms of area, delay, and power consumption. In the conventional FPGA structure, SRAMs are used to form the programmable interconnects. The low density of SRAM-based storage increases the area overhead of FPGA programmability and consequently leads to longer routing paths and larger interconnect delay. Moreover, a considerable amount of power is consumed in the SRAMs during standby due to the volatile structure of this memory.

Replacing SRAMs with RRAMs can improve the area and power consumption of the FPGA structure. RRAMs have a smaller cell size and their non-volatile structure saves a significant amount of leakage power during standby.

On the other hand, the drawbacks of RRAM, namely poor write performance and low endurance, can be ignored in FPGA applications. Write access to the RRAMs is limited to the programming of the FPGA. The number of programming cycles is expected to be small (*e.g.* less than 500) for typical FPGA users [19].

RRAMs can be employed as memory blocks or as part of the routing switches. These two applications are discussed in the next subsections.

#### **RRAM in memory block**

As shown in Figure 3.10, SRAM based configuration bits can be replaced by RRAMs resulting in reduced area and power consumption. RRAMs are applied to not only the SRAMs in programmable interconnects but also the SRAMs in LBs.

![](_page_67_Figure_1.jpeg)

Figure 3.10: Replacing SRAM cells with RRAM [17]

There are two types of memory cell architecture, NAND-type cells and NOR-type cells, as shown in Figure 3.11. NOR-type cells offer faster serial access than NAND-type cells. Flash memories typically use NAND cells, since sharing the metal contact between adjacent cells makes them more area efficient. In RRAM memories, each cell requires a separate metal contact; therefore, the density of NAND-type and NOR-type cells is similar. This makes NOR-type cells the better candidate for RRAM structures [18].

![](_page_67_Figure_4.jpeg)

Figure 3.11: NAND-type and NOR-type memory cells [18]

#### **RRAM in routing path**

The programmable interconnects occupy up to 90% of the FPGA area and are responsible for up to 80% and 85% of the total delay and power consumption, respectively [19]. Hence, routing structures play an important role in the existing gap between FPGAs and ASICs.

New routing structures have been proposed for the programmable switches, as shown in Figure 3.12. In this architecture, the pass-transistor switches are replaced by RRAM cells.

![](_page_68_Figure_1.jpeg)

Figure 3.12: Using RRAM as the programmable switch [19]

Since RRAMs are fabricated between the metal layers (*e.g.* between metal layers 5 and 6 in [120]), the routing switches can be placed over the logic layer as depicted in Figure 3.13 (a), which results in a drastic area reduction. Figure 3.13 (b) illustrates the layout of an RRAM-based FPGA. In the RRAM-based structure, the area is mainly determined by the logic blocks, which provides a drastic area reduction (up to 96% [19]).

![](_page_68_Figure_4.jpeg)

Figure 3.13: The layout of a 1T1R switch (a) and an RRAM-based FPGA (b) [17]

A smaller footprint results in shorter interconnects and reduced interconnect delay. Up to 55% performance improvement is reported for RRAM-based FPGAs as compared to conventional ones. The non-volatility of RRAMs reduces the power consumption of the new structure by up to 79%.

There are some challenges in designing RRAM-based FPGAs. An important issue is the programmability of the RRAMs integrated in the routing path. RRAMs have only two terminals, and to program them there must be a programming transistor at each of the two terminals of an RRAM. When RRAMs are used as routing switches, the two terminals are shared between the programming and signal paths. The programming circuits need careful design; otherwise there will be interference between the two modes of operation.

The programming transistors can be shared so that only one programming transistor is required for each node. Different algorithms have been proposed for placing the programming transistors in RRAM based FPGAs [18,19,121]. Efficient placement of programming transistors can drastically decrease the number of these transistors [19].

The majority of works assume minimum-size programming transistors to achieve minimum delay, while some studies [18, 19, 122, 123] state that this is not necessarily the case. This study presents a method to determine the optimum size of the programming transistors to reduce the power without losing performance.

There have been many studies on the replacement of routing switches with RRAMs. However, routing buffers become a bottleneck after this replacement and thus need further improvement. An adaptive buffer allocation method is proposed in [19], where the positions of the buffers inserted in the mrFPGA are optimized on demand. Finding an efficient, dedicated algorithm for buffer allocation in RRAM-based FPGAs is an important ongoing task that can affect the performance, area, and power of these circuits.

## 3.2 Summary

A brief review of monolithic 3-D integration is presented in this chapter. RRAM technology, a promising candidate for building dense non-volatile memories, is investigated in Section 3.1. The structure and behavior of RRAMs are studied and their benefits and drawbacks are noted. Another important application of RRAM cells, in the routing path of FPGAs, is also discussed in this section. Routing structures are mainly responsible for the critical path delay and area in FPGAs. Replacing CMOS pass-transistor switches with RRAM switches can improve the critical delay and reduce the area of the routing structures.

# 4 Resonant Clocking

Clock distribution networks consume a considerable portion of the power dissipated by synchronous circuits. As the area of the integrated circuits increases, larger networks are required to distribute the clock signal, which results in higher capacitive loads and resistive losses of the interconnects and degrades the signal integrity along these interconnects. A common solution to alleviate this problem is to insert clock buffers in the intermediate nodes of the clock network. Although buffer insertion improves clock signal integrity, clock buffers significantly increase the power consumed by the clock distribution network.

3-D integration drastically decreases the interconnect length of the global wires, which can reduce the number of clock buffers and result in more power-efficient clock networks. However, thermal issues are more pronounced in 3-D integrated circuits due to the increased power densities. Consequently, designing low-power clock networks for 3-D circuits is a primary challenge.

Resonant clock distribution networks are considered efficient low-power alternatives to conventional clock distribution schemes. These networks utilize additional inductive circuits to reduce power while delivering a full swing clock signal to the sink nodes as shown in Figure 4.1. In this approach, on-chip inductance is added to the clock network and forms a resonant circuit with the interconnect capacitance, decreasing in this way the power consumed by the network, since the energy alternates between electric and magnetic fields instead of dissipating as heat.

![](_page_71_Figure_1.jpeg)

Figure 4.1: Resonant clock network with four resonant circuits [13].

The manufacturing process of 3-D circuits includes steps beyond the standard CMOS process, such as wafer thinning and TSV fabrication. The advanced fabrication process and thermo-mechanical stresses make 3-D circuits more susceptible to manufacturing defects and increase the importance of testing in these circuits. Pre-bond test, in which each plane is tested before it is bonded to the other planes, can improve the yield of 3-D systems [96]. The nature of resonant clock networks poses different constraints for testing. For example, resonant operation should be achieved for each individual plane during testing, irrespective of the employed pre-bond testing approach. The design of 3-D resonant clock networks and the related constraints have not been explored to the same extent as traditional planar clock networks [13, 92, 96, 102, 124]. Consider, for example, a 2-D circuit that employs a monolithic *LC* tank to resonate. This design would be inadequate for a 3-D resonant clock network, since pre-bond test is not supported. The resonant 3-D clock network should be designed such that resonant operation at a specific frequency is achieved individually for each plane as well as for the entire 3-D system.

Another challenge for pre-bond testing of 3-D circuits is the incomplete clock distribution networks prior to the bonding of the planes. 3-D clock networks often include several disconnected networks in some of the planes, which connect with TSVs to the plane where the main tree feeds the clock signal to the entire clock distribution network (see Figure 4.2).

![](_page_71_Figure_5.jpeg)

Figure 4.2: Clock distribution in a 3-D circuit where (a) the clock network is connected in both planes and (b) the clock network includes some disconnected parts.

To provide pre-bond test, each plane needs a complete clock tree. A technique to provide such a tree by employing additional wiring has recently been presented [124]. In another approach, each disconnected clock tree is driven by a DLL, enabling pre-bond test for each plane of the 3-D system [102]. Both methods support pre-bond test for traditional clock networks by providing a means to connect the local networks within each plane. Contactless testing methods have been considered as an alternative to conventional test methods [103–105, 125]. Inductive links can be exploited to wirelessly transmit the clock signal to the disjoint resonant clock networks. Like other techniques, inductive links typically incur a significant area overhead. This overhead, however, is practically eliminated for resonant clock networks: the inductors comprising the *LC* tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test.

The contribution of this chapter is a design methodology for resonant clock networks in 3-D circuits that supports wireless pre-bond testing of these networks through the use of inductive links. Resonant operation is ensured for each plane in both test and functional mode, and the clock signal characteristics are maintained within each plane in either operating mode. The proposed 3-D resonant clock networks considerably lower the power of the clock distribution system, while pre-bond test is supported by the proper design and allocation of the *LC* tanks within each plane. There are two aspects to the design of a wirelessly testable resonant clock network: 1) designing the on-chip resonant clock network and 2) providing the transceiver circuit required for wireless communication during pre-bond testing.

A design methodology for H-tree resonant clock networks in 3-D circuits is proposed in the next section. The number of *LC* tanks, the resonant circuit parameters, and the driver size for normal operation are determined such that a full swing signal is provided at the sink nodes and the power consumption of the circuit is minimized. The effect of different parameters, including the number of planes and the number of TSVs among the planes, on the design of 3-D resonant clock networks is investigated.

In symmetric clock networks (*e.g.* H-trees), the locations of the *LC* tanks are readily determined, whereas finding a proper allocation method for *LC* tanks in synthesized trees is a challenge. In Section 4.2, a design method for applying the resonant clocking approach to synthesized clock trees is presented. To simplify the problem, an ideal clock driver is assumed for these clock trees (*i.e.*  $R_{driver} = 0$ ). The proper number of *LC* tanks and the related resonant parameters are determined by the proposed method. This method provides the minimum number of *LC* tanks that can deliver a full swing signal to all the sink nodes by considering the capacitive load at each node to determine the locations of the *LC* tanks. Resonance parameters, such as the size of the inductor, can be adapted to reduce the power consumption and/or area overhead of the clock distribution network.

The design of a transceiver circuit consisting of a transmitter placed off-chip (*i.e.* in the probe card) and an on-chip receiver is described in Section 4.3. The probe card is designed to deliver a full swing sinusoidal clock signal at the normal operating frequency as well as at the frequencies used for scan and at-speed testing. The complexity of the design is shifted to the transmitter plane, so that one probe card can be used to test several circuits.

## 4.1 Resonant clocking for symmetric networks

After a brief review of resonant clock networks in 2-D circuits, in this section, I propose a method to extend resonant clocking to symmetric 3-D clock networks and discuss how to determine the important parameters in these networks.

A design methodology for resonant H-tree clock distribution networks is proposed in [20]. In this work, the clock tree is modeled with a distributed *RLC* interconnect as illustrated in Figure 4.3. This electrical model is utilized to determine the parameters of the resonant circuit and the output impedance of the clock driver such that the power consumed by the network and the clock driver are minimum, while a full swing signal is delivered at the output nodes.



Figure 4.3: *RLC* model of a 16-sink clock network where (a) is the distributed *RLC* model and (b) is the simplified *RLC* model of resonant network [20]

To deliver a full swing signal at the sink nodes, the magnitude of the transfer function of the network  $H_{out}$ , should be close to one. This parameter is often fixed to 0.9 [20, 21] (for the remainder of the chapter "full swing signal" implies any signal swing that satisfies this specification). As shown in [20],  $|H_{out}|$  is described by

$$|H_{out}| = \sqrt{\frac{\left|Z_{in_{\omega}}\right|^2}{\left(R_{driver} + Re(Z_{in_{\omega}})\right)^2 + Im(Z_{in_{\omega}})^2}} \cdot \left|H_{\omega}(j\omega)\right|$$
(4.1)

where  $H_{\omega}$  and  $Z_{in_{\omega}}$  denote the transfer function and input impedance of the network, respectively. When  $|H_{out}|$  is fixed at 0.9, the driver resistance can be determined by

$$R_{driver} = \sqrt{\frac{\left|H_{\omega}(j\omega)\right|^2 \cdot \left|Z_{in_{\omega}}\right|^2}{0.9^2} - Im(Z_{in_{\omega}})^2} - Re(Z_{in_{\omega}})$$
(4.2)
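As a numerical check of (4.1) and (4.2), the following Python sketch solves for the driver resistance at a target swing and then re-evaluates the swing. The function names and the example values of the input impedance and transfer-function magnitude are assumptions chosen only for illustration; they are not taken from the clock network studied in this chapter.

```python
import math


def driver_resistance(z_in: complex, h_mag: float, swing: float = 0.9):
    """Solve (4.1) for R_driver at a target |H_out|, as in (4.2).

    z_in  : input impedance of the network at the clock frequency (ohms)
    h_mag : magnitude of the network transfer function at that frequency
    Returns None when no non-negative driver resistance meets the target.
    """
    radicand = (h_mag * abs(z_in) / swing) ** 2 - z_in.imag ** 2
    if radicand < 0:
        return None
    r_driver = math.sqrt(radicand) - z_in.real
    return r_driver if r_driver >= 0 else None


def output_swing(r_driver: float, z_in: complex, h_mag: float) -> float:
    """Evaluate |H_out| from (4.1)."""
    return abs(z_in) / abs(r_driver + z_in) * h_mag


# Hypothetical operating point: Z_in = 3 + 4j ohms, |H| = 0.95.
z_example = complex(3.0, 4.0)
r_example = driver_resistance(z_example, 0.95)
swing_check = output_swing(r_example, z_example, 0.95)  # should return to 0.9
```

A return value of `None` indicates that the network cannot reach the requested swing even with an ideal driver, which corresponds to the cases in which the specification cannot be met.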

Several resonant circuits can be utilized to improve the characteristics of the clock signal. In a symmetric H-tree clock network, the number of *LC* tanks (resonant circuits) also depends on the location of these circuits. If the resonant circuits are placed closer to the driver, fewer circuits are needed. Alternatively, when these circuits are placed close to the sink nodes, more *LC* tanks are required. Since the equivalent inductance is the parallel combination of all the inductors, increasing the number of resonant circuits leads to a larger required inductance for each circuit. Using a higher number of larger inductors results in larger area occupied by the resonant inductors.

The number of resonant circuits also affects the output signal swing. As discussed in [21], by increasing the number of resonant circuits and placing these circuits closer to the sink nodes, each inductor resonates with a smaller part of the circuit resulting in lower attenuation of the output signal swing. Alternatively, increasing the number of resonant circuits and using larger inductors in each *LC* tank reduces the quality factor of the *LC* tanks, since in spiral inductors the effective series resistance (ESR) increases more aggressively than the inductance [20]. A lower quality factor for resonant circuits produces a higher signal loss and decreases the output signal swing. Considering a clock network with 256 sinks driven by an ideal clock driver,  $|H_{out}|$  for a different number of resonant circuits over a wide range of resonant inductance is shown in Figure 4.4, where for fewer than 16 and more than 64 resonant circuits, the  $|H_{out}|$  cannot meet the 0.9 signal swing depicted by the dotted line.



Figure 4.4: |*H*<sub>out</sub>| for different number of resonant circuits

To determine the resonant parameters, one approach is to consider only the capacitance of the clock network and to use it to determine the total resonant inductance [13, 21, 126]. By doubling the number of *LC* tanks, the inductance of each tank is also doubled. In this approach, the inductive component of the network wires is not considered. In large clock networks with long interconnects, the inductance of the wires cannot be neglected [20]. Furthermore, this method assumes that placing the resonant circuit in different locations does not change the equivalent capacitance of the network (*i.e.* the capacitance seen by the primary clock driver). These simplifications can result in an inaccurate estimate of the resonant inductance, adversely affecting the signal swing.
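The simplified, capacitance-only estimate can be sketched in Python using the standard resonance condition $\omega^2 L C = 1$. The function names and the 10 pF total capacitance in the usage lines are assumptions for illustration; as noted above, this estimate neglects the wire inductance and is therefore not the approach adopted in this chapter.

```python
import math


def total_resonant_inductance(c_total: float, f_res: float) -> float:
    """Inductance that resonates with c_total at f_res: L = 1 / ((2*pi*f)^2 * C)."""
    omega = 2.0 * math.pi * f_res
    return 1.0 / (omega ** 2 * c_total)


def inductance_per_tank(c_total: float, f_res: float, n_tanks: int) -> float:
    """Identical inductors in parallel: each one is n_tanks times the total."""
    return n_tanks * total_resonant_inductance(c_total, f_res)


# Assumed example: 10 pF of network capacitance resonating at 5 GHz.
l_16 = inductance_per_tank(10e-12, 5e9, 16)
l_32 = inductance_per_tank(10e-12, 5e9, 32)  # twice l_16, as stated in the text
```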



Figure 4.5:  $|H_{out}|$  for different number of *LC* tanks and resonant inductance using the model in [21] (dotted vertical lines) and the proposed approach (solid vertical lines)

The signal swing for a clock network with 256 sinks using an ideal driver is illustrated in Figure 4.5 for different numbers of *LC* tanks. With any inductance within the crosshatched ranges, this clock network can meet the signal swing specification. The resonant inductance determined with the simplified approach is illustrated by the dotted lines; due to the imprecise estimate of the inductance, the clock network cannot deliver a full swing signal to the sinks (as required by the dashed horizontal line). As depicted in Figure 4.5, using the simplified model from [21] can reduce  $|H_{out}|$  from 0.9 to 0.65.

#### 4.1.1 Resonant clocking for 3D H-trees

Based on the design implications for resonant networks, different clock network topologies can be considered to adapt conventional (planar) resonant clock networks to 3-D circuits [127]. In the first topology, denoted as the "symmetric topology", each plane contains resonant circuits and can be investigated separately, as shown in Figure 4.6. In another structure, denoted as the "asymmetric topology", the resonant circuit is placed in only one plane and should resonate with the total capacitance of the 3-D stack at the desired frequency. During pre-bond test, each plane should resonate separately. Note that this requirement is an additional constraint specific to resonant networks and is completely different from the techniques that can be employed to connect the local networks [102, 124] in either a standard or a resonant clock distribution approach. Consequently, asymmetric structures, which can be considered an extension of 2-D clock networks, do not support pre-bond testing in a straightforward manner, since the resonant circuit is contained within only one plane.



Figure 4.6: Different topologies for 3-D resonant networks where (a) is the asymmetric and (b) is the symmetric topology.

The other important parameter in designing a resonant 3-D clock network is the number of TSVs used to connect the physical planes. From this perspective, different topologies can be explored, for example, using a single TSV (or a small group of TSVs connected in parallel) in the center of each plane or using multiple TSVs. In the multiple-TSV structure, one of the planes contains a complete clock tree, whereas in the other planes the clock network consists of several disconnected local networks, each connected to the first plane by TSVs. Increasing the number of TSVs provides more local networks but increases the area occupied by the TSVs. Four topologies for a two-plane 3-D circuit with 32 sinks are shown in Figure 4.7. *RLC* models to analyze the different 3-D structures are depicted in Figure 4.8. In single-TSV topologies, the equivalent resistance of the circuit is the resistance of each plane divided by the number of planes. By increasing the number of TSVs and omitting some wires in the upper planes, the resistance of the 3-D circuit increases. Conversely, increasing the number of TSVs results in decreased capacitance for the 3-D circuit.

Figure 4.7: Different topologies for a two-plane 3-D resonant clock network where (a) is a single TSV structure with one *LC* tank per plane, (b) is a single TSV structure with four *LC* tanks per plane, (c) is a four TSV structure with four *LC* tanks per plane, and (d) is a four TSV structure with eight *LC* tanks per plane.

To specify the resonance parameters for a 3-D system with a specific number of TSVs, we employ a distributed *RLC* model of the clock distribution network. Assuming a 3-D clock network with *N* planes and *n* branch levels in each plane, the number of sink nodes is  $N \cdot 2^n$. TSVs can be placed at any of these *n* levels. Connecting the networks of all the planes at the *i*<sup>th</sup> level of the primary clock network (*e.g.* located in layer 1 in Figure 4.6) results in  $2^{(i-1)}$  TSVs. The number of *LC* tanks in each plane is assumed to be no smaller than the number of TSVs, such that at least one *LC* tank is connected to each local network. The location of the *LC* tanks is swept from the TSVs to the sinks, and the resonant parameters, power consumption, and driver resistance for each topology are determined. Based on this comparison, the topology with the desired characteristics can be selected. The driver resistance described by (4.2) is plotted over the inductance to determine the resonant parameters, similar to the 2-D case. In the 3-D system, the transfer function can differ between planes due to the effect of the TSVs. Not surprisingly, the last plane (the plane with the greatest distance from the clock driver) has the lowest signal swing. The driver size should be determined such that the signal swing for every plane meets the specifications. Consequently, the transfer function magnitude of the last plane should be used in (4.2). Following this process, the number of *LC* tanks and the parameters of the resonant circuits are determined for normal operation, as discussed in our work in [128].
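The counting relations for sinks and TSVs stated in this section can be sketched in Python as follows. The function names are illustrative assumptions; the usage lines reproduce the two-plane, 32-sink example of Figure 4.7.

```python
def sink_count(n_planes: int, n_levels: int) -> int:
    """Number of sink nodes for N planes with n branch levels each: N * 2^n."""
    return n_planes * 2 ** n_levels


def tsv_count(level: int) -> int:
    """TSVs needed when the planes are connected at branch level i: 2^(i-1)."""
    if level < 1:
        raise ValueError("branch levels are counted from 1")
    return 2 ** (level - 1)


def min_tanks_per_plane(level: int) -> int:
    """At least one LC tank per local network, i.e. one per TSV."""
    return tsv_count(level)


sinks = sink_count(n_planes=2, n_levels=4)   # two planes, 16 sinks per plane
single_tsv = tsv_count(1)                    # planes connected at the root
four_tsvs = tsv_count(3)                     # planes connected at the third level
```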

For each location the driver resistance is adapted to produce a transfer function amplitude of 0.9 for a wide range of inductor sizes. The driver resistance and corresponding power consumption are swept versus the inductance. The inductance for which the driver resistance is maximum or the power consumption is minimum (which do not necessarily occur for the same frequency) is determined as shown in Figure 4.9.
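The sweep procedure can be illustrated with a deliberately simplified Python sketch. It is not the distributed *RLC* model used in this chapter: the network is assumed to be a single lumped wire resistance in series with a load capacitance that is shunted by one lossy inductor, the inductor ESR is assumed to follow a fixed quality factor, the decoupling capacitor is ignored, and power is not modeled. All names and numerical values are hypothetical.

```python
import math


def parallel_tank_impedance(l_res, c_load, omega, q_factor):
    """Impedance of c_load in parallel with a lossy inductor (ESR = omega*L/Q)."""
    esr = omega * l_res / q_factor
    z_inductor = complex(esr, omega * l_res)
    y_total = complex(0.0, omega * c_load) + 1.0 / z_inductor
    return 1.0 / y_total


def driver_resistance(z_in, h_mag, swing=0.9):
    """Driver resistance from (4.2); None if the swing target cannot be met."""
    radicand = (h_mag * abs(z_in) / swing) ** 2 - z_in.imag ** 2
    if radicand < 0:
        return None
    r_driver = math.sqrt(radicand) - z_in.real
    return r_driver if r_driver >= 0 else None


def sweep_inductance(c_load, f_clk, r_wire, q_factor, l_values):
    """Return (inductance, driver resistance) with the largest driver resistance."""
    omega = 2.0 * math.pi * f_clk
    best = None
    for l_res in l_values:
        z_par = parallel_tank_impedance(l_res, c_load, omega, q_factor)
        z_in = r_wire + z_par          # impedance seen by the driver
        h_mag = abs(z_par / z_in)      # voltage divider from input to load
        r_drv = driver_resistance(z_in, h_mag)
        if r_drv is not None and (best is None or r_drv > best[1]):
            best = (l_res, r_drv)
    return best


# Hypothetical sweep: 10 pF load, 5 GHz clock, 1 ohm wire, Q = 10,
# inductance from 0.02 nH to 0.4 nH in 1 pH steps.
candidates = [k * 1e-12 for k in range(20, 401)]
best_l, best_r = sweep_inductance(10e-12, 5e9, 1.0, 10.0, candidates)
```

A larger admissible driver resistance corresponds to a smaller driver, which is why the maximum of this curve is of interest; in the full methodology the power curve is evaluated alongside it.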

Figure 4.8: *RLC* model for a two-plane 3-D circuit with four *LC* tanks where (a) is the model for the single-TSV and (b) is the model for four-TSV structures.

Figure 4.9: The driver resistance and power vs. resonant inductance.

For pre-bond test, the power consumed by the clock network within each plane is low as compared to the total power consumed by the 3-D system. Consequently, the power consumed by one plane during the pre-bond test mode is a secondary parameter; moreover, since the planes are not bonded, heat is removed faster and the thermal constraints are more relaxed. The predominant parameter in test mode is the voltage swing. The clock network should deliver a full swing clock signal to the sinks to test each plane. During the test mode, each plane should consist of a complete clock network. Additional wiring can be used in each plane, except for the first plane, to connect the local networks [124].

There are two important design parameters in pre-bond test: the size of the additional wires and the size of the clock drivers used only during testing. These parameters should be chosen such that a full swing signal is delivered to the sink nodes. There is a tradeoff in determining these parameters. If the wire width is decreased, a larger clock driver should be utilized. Alternatively, increasing the width of the wire results in a smaller clock driver but increases the area occupied by the redundant wires, which are used only during pre-bond test. One simplistic approach is to replicate the tree of the first plane for all the planes during the test mode and also to use a driver with the same strength as the main clock driver. This choice, however, leads to overdesign and wasted silicon. To determine these parameters, we sweep the wire width and obtain the related driver resistance from (4.2). The driver resistance for different wire widths for a two-plane 3-D system with 8 TSVs and 16 resonant circuits is plotted in Figure 4.10.



Figure 4.10: Driver resistance vs. wire size.

The parameters estimated using the simplistic approach are shown at point 1 in Figure 4.10. For points 2, 3, and 4, the parameters are determined using our approach: for point 2, the driver resistance is maximum using the wire size of the first plane, and for point 3, the wire size is minimum while employing a replica of the main driver. Point 4 provides a better tradeoff, since the wire width decreases to half of the wire width of point 2 while the resistance of the driver decreases by only 15%. This resistance is twice as large as the resistance of the driver used to drive the entire clock distribution network.

To determine the resonant parameters for a specific topology, we adapt the driver resistance to produce a transfer function amplitude of 0.9 for a wide range of inductor sizes using (4.2). The driver resistance and corresponding power consumption are swept versus the inductance. The inductance for which the driver resistance is maximum or the power consumption is minimum (which do not necessarily occur for the same frequency) is determined. In a 3-D system, the transfer function can differ between planes due to the effect of the TSVs. Not surprisingly, the plane(s) with the greatest distance from the clock driver exhibit(s) the lowest signal swing. The driver size should be determined such that the signal swing for every plane meets the specifications. Consequently, the transfer function magnitude of the last plane should be used in (4.2). Following this process, the number of *LC* tanks and the parameters of the resonant circuits are determined for normal operation.

This method should be further enhanced to also support the post-bond testing procedure. Since the scan test frequency is lower than the operating frequency ( $f_{test\_postbond} < f_{res}$ ), to support the resonant operation during post-bond scan test and deliver a full swing clock signal to the sink nodes, a larger resonant inductance should be utilized ( $L_{test\_postbond} > L_{res}$ ). The power consumption during scan test is lower than normal operation due to the lower frequency. Consequently, the main concern is signal swing. To reduce the area overhead, we add a single inductor to the first plane to satisfy the resonant constraints at scan test frequency. Note that this inductance is not required to be distributed among the planes, since the post-bond testing is performed where all the clock networks are connected with the TSVs and distributing this inductance results in wasting area. Since the  $R_{driver}$  and resonant parameters are determined in the previous step, by substituting the test frequency in (4.1) and fixing  $|H_{out}|$  to 0.9, the size of this extra inductor can be determined.

#### 4.1.2 Simulation results

Assuming an H-tree network, as preferred in [20], the number of sinks is determined based on the circuit area *A* and the load capacitance that each sink drives,  $C_L$. A case study of an H-tree resonant clock network with 256 leaves is considered. The load capacitance at each node is assumed to be 20 fF and the operating frequency is 5 GHz. The interconnect length can be determined from the total network area. For a circuit area of  $l \times l$, the length of the longest and shortest interconnects is  $l/4$  and  $l/2^n+2$, respectively, where *n* indicates the number of sinks. The PTM model for a 0.18  $\mu$m CMOS technology is used to estimate the resistance, inductance, and capacitance of the horizontal interconnects. The total area of the network is 3.4 mm × 3.4 mm. The parameters of the interconnects are listed in Table 4.1, where  $L_1$  to  $L_8$  indicate the different wire segments from the driver to the sinks.

|                | $L_1 - L_5$ | $L_6$ | $L_7$ | $L_8$ |
|----------------|-------------|-------|-------|-------|
| $R[\Omega/mm]$ | 2.75        | 5.5   | 11    | 22    |
| L[nH/mm]       | 0.46        | 0.6   | 0.72  | 0.82  |
| C[fF/mm]       | 254.6       | 175.4 | 130   | 103   |

Table 4.1: Interconnect parameters used in the investigated clock network

For a conventional (non-resonant) clock network, inverters are properly inserted at the intermediate nodes to deliver a full swing clock to the output, while in the resonant clock network, resonant circuits are added to the clock tree to provide a proper clock signal at the output. The amount of the resonant inductance is determined as described in the previous section. The decoupling capacitor that should be sufficiently large not to affect the frequency of resonance is set to 60 pF. The effective series resistance (ESR) for the inductors is determined from [20].

Different topologies of 3-D circuits are explored. To form a 3-D system, the 2-D circuit is folded into several planes. The electrical and physical characteristics of the TSVs used to connect these planes are based on [62]. The number of *LC* tanks, the inductor size for each resonant circuit, and the clock driver resistance for normal and pre-bond operation are listed in Table 4.2. The size of the wires in the upper planes for pre-bond test is determined and compared with the size of the wires in the first plane. The resulting decrease in wire width is also listed in this table.

Increasing the number of TSVs can result in a smaller primary driver and lower power due to the decreased capacitive load of the clock network. Alternatively, increasing the resistance of the circuit requires a larger primary driver, increasing the power consumed by the clock network. Predicting which behavior is dominant is not straightforward and strongly depends on the interconnect characteristics of the clock network. Using wide wires results in stronger capacitive behavior, while in long wires the resistive component can become dominant. For this case study, as shown in Table 4.2, there is no uniform trend for the design parameters as a function of the number of TSVs.

The power consumed by different topologies for standard and resonant clock networks is listed in the last two columns of Table 4.2. As reported in this table, the power consumed by the resonant clock network is considerably lower than that of the standard network in 3-D circuits. This improvement is accompanied by an increase in the area occupied by the resonant circuits. The area of a resonant clock network increases due to these additional circuits, but omitting the clock buffers can, in turn, decrease the area of the resonant networks.

Increasing the number of planes decreases the length of the wires in the network for a specific number of sinks. Omitting long interconnects, as required for some of the topologies shown in Figure 4.7, reduces the resistive voltage drop and the capacitance of the network and decreases the power consumed by the network. For a 3-D circuit with two planes, the power consumption is reduced by up to 64%, and this reduction reaches 70% and 72% for circuits with four and eight planes, respectively. Decreasing the equivalent capacitance of the network also results in a larger resonant inductance and increases the area of the resonant clock network. Alternatively, by increasing the number of planes, the driver size and the width of the additional wires required for pre-bond test decrease due to the smaller circuit area in each plane.

The preferred 3-D structure that decreases the power consumption of the clock network by 72%, consists of eight planes which is the maximum number of planes investigated in this case study. Each plane connects to the others using 2 TSVs. 16 *LC* tanks are used in this structure (two *LC* tanks per plane) and the inductance for each *LC* tank is 2 nH. The number of planes and the number of *LC* tanks is determined such that the highest voltage swing is achieved (which implies that the Q-factor of the spiral inductors is also the highest).
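The reductions quoted above can be cross-checked against the power values of Table 4.2. The short sketch below assumes that the reference is the 2-D standard (non-resonant) network and that the lowest resonant power for each plane count is used. This choice of reference is an assumption made here for illustration; it is not stated explicitly in the text, although it reproduces the quoted figures.

```python
# Power values (mW) copied from Table 4.2.
P_2D_STANDARD = 543.0  # 2-D standard (non-resonant) network, assumed reference

# Lowest resonant-network power reported for each number of planes.
best_resonant = {2: 196.0, 4: 162.0, 8: 152.0}


def reduction_percent(reference_mw, value_mw):
    """Relative power reduction with respect to a reference, in percent."""
    return 100.0 * (1.0 - value_mw / reference_mw)


reductions = {planes: reduction_percent(P_2D_STANDARD, power)
              for planes, power in best_resonant.items()}

for planes, reduction in reductions.items():
    print(f"{planes} planes: {reduction:.0f}% below the 2-D standard network")
```

With this reference, the eight-plane entry with 2 TSVs (152 mW) is the one that corresponds to the 72% figure.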

# 4.2 Resonant clocking for synthesized networks

Different methods for allocating the *LC* tanks in symmetric clock distribution networks have been discussed in the previous section. A new method for allocating the *LC* tank for H-trees is proposed here. This method is applicable to symmetric structures where the location and number of H-trees are dependent and the number of *LC* tanks is a power of two. In [21], the *LC* tanks are placed at equidistant points from the root, which is a proper method for symmetric clock trees, such as H-trees and binary trees. The performance of this method degrades for asymmetric clock trees since maintaining equal distances to the root results in sub-trees with dissimilar capacitance resonating with resonant inductors of the same size.

| Topology      | # TSVs | # *LC* tanks | *L* [nH] | $R_{driver}$ [Ω] | $R_{driver}$ pre-bond [Ω] | Wire width reduction (%) | Power, standard [mW] | Power, resonant [mW] |
|---------------|--------|--------------|----------|------------------|---------------------------|--------------------------|----------------------|----------------------|
| 2-D           | -      | 32           | 2.2      | 1.2              | -                         | -                        | 543                  | 310                  |
| 3-D, 2 planes | 1      | 16           | 1.5      | 3.52             | 7.16                      | 0                        | 385                  | 241                  |
|               | 2      | 16           | 1.4      | 2.95             | 6.2                       | 50                       | 374                  | 230                  |
|               | 4      | 16           | 1.5      | 2.77             | 6.1                       | 50                       | 353                  | 244                  |
|               | 8      | 16           | 1.3      | 2.98             | 5.6                       | 50                       | 312                  | 196                  |
|               | 16     | 32           | 3        | 2.55             | 5.7                       | 50                       | 347                  | 218                  |
|               | 32     | 64           | 5        | 0.97             | 4.5, 2.4                  | 25, 50                   | 304                  | 203                  |
| 3-D, 4 planes | 1      | 16           | 1.8      | 2.15             | 8.8                       | 0                        | 308                  | 190                  |
|               | 2      | 16           | 1.6      | 3.55             | 10                        | 87.5                     | 297                  | 174                  |
|               | 4      | 16           | 1.7      | 3.9              | 10.8                      | 81.5                     | 289                  | 162                  |
|               | 8      | 32           | 3.2      | 3.5              | 8.4                       | 81.5                     | 295                  | 176                  |
|               | 16     | 64           | 6.5      | 3                | 6.9                       | 75                       | 308                  | 184                  |
| 3-D, 8 planes | 1      | 8            | 1        | 2.4              | 17.8                      | 0                        | 299                  | 173                  |
|               | 2      | 16           | 2        | 3.5              | 20                        | 87.5                     | 283                  | 152                  |
|               | 4      | 32           | 3        | 3.27             | 15                        | 81.5                     | 311                  | 188                  |
|               | 8      | 64           | 5        | 2.65             | 9                         | 50                       | 307                  | 181                  |

Table 4.2: Design parameters and power consumption for different topologies.

The contribution of this work is a methodology that determines the minimum number of *LC* tanks that can deliver a full swing signal to the sink nodes in a synthesized clock tree and determines the proper resonant parameters for these *LC* tanks. The resonant parameters can be determined to satisfy one of two objectives: minimizing the power or minimizing the area of the inductors.

An early method to apply resonant clocking to synthesized trees is proposed in [21]. This method allocates the *LC* tanks at equidistant points from the root node. The location of the *LC* tanks is swept from the root toward the sinks to find the maximum signal swing. Keeping the distance from the *LC* tanks to the root constant reduces the number of candidate *LC* tank locations, which can degrade the performance of this method. In asymmetric clock networks, for long branches (which can lead to lower signal swing at the corresponding sinks), placing the *LC* tanks closer to the sinks can improve the signal swing, which is not supported by this method.

Other approaches for applying resonant clocking to synthesized clock networks are presented in [129, 130]. These methods are proposed for grid clock network structures where the capacitance of the network is almost equally distributed. LARCS [130] chooses a small library of resonant inductances and, for each node, determines a vicinity of nodes so that the total node capacitance resonates with the employed inductance at the desired clock frequency. Using a limited set of candidates for the resonant inductance reduces the complexity of LARCS, but on the downside the performance of the method can degrade. The LARCS method is also applicable to clock trees, but due to the highly irregular structure of trees, determining the appropriate local regions (vicinities) to resonate with the same inductance can be a formidable task. The length of the tree branches and their capacitance are not uniform in clock trees and, very often, the branches near the root are much longer than the interconnect segments near the sinks. In Figure 4.11, a simple example of a clock tree is shown where the length and capacitance of $W_1$ and $W_2$ are much larger than those of $W_3$ to $W_6$. For node $N_1$, the vicinity includes $W_1$ or $W_2$ (or both) and for $N_2$, the vicinity includes $W_3$ and $W_4$; therefore the capacitance for the vicinity of $N_1$ is much larger than the capacitance for the vicinity of $N_2$. The use of the same resonant inductance for these two vicinities results in quite different resonant frequencies.



Figure 4.11: Simple clock tree with unbalanced branches.
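The frequency mismatch described for Figure 4.11 follows from the ideal resonance relation $f = 1/(2\pi\sqrt{LC})$. The sketch below illustrates it with hypothetical values: the shared inductance and the two vicinity capacitances are assumptions chosen only for illustration and are not taken from the case studies.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Ideal LC resonant frequency f = 1 / (2*pi*sqrt(L*C)) in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical values, chosen only to illustrate the mismatch.
L_SHARED = 10e-9        # same library inductor used for both vicinities
C_VICINITY_N1 = 4e-12   # vicinity of N1 includes the long wires W1/W2
C_VICINITY_N2 = 1e-12   # vicinity of N2 includes the short wires W3/W4

f_n1 = resonant_frequency(L_SHARED, C_VICINITY_N1)
f_n2 = resonant_frequency(L_SHARED, C_VICINITY_N2)
print(f"N1: {f_n1 / 1e9:.2f} GHz, N2: {f_n2 / 1e9:.2f} GHz")
```

Because the frequency scales with $1/\sqrt{C}$, a fourfold difference in vicinity capacitance shifts the resonance by a factor of two when the inductance is kept the same.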

In this section a new method for applying the resonant approach to synthesized trees is introduced. As discussed in our work in [131], this method minimizes the number of *LC* tanks that suffice to deliver a full swing signal to all the sink nodes (*i.e.* a signal swing greater than 0.9). The important contribution of this method is to properly allocate the *LC* tanks along the clock network for any number of tanks. Later in this section it will be shown that the signal swing for the branches with higher impedance is lower than for other branches and that resonant behavior can change the capacitive element of the impedance. Consequently, locating the *LC* tanks considering the capacitive load of the nodes is a proper method to improve the signal swing. In the proposed method, there is at least one *LC* tank from the root to each sink. The method begins with the placement of one *LC* tank at the root. If a full signal swing for all the sinks is not achieved, the number of *LC* tanks is increased and the next candidate *LC* tank is added to the node with the highest capacitance. For each number of *LC* tanks, the transfer function for all the sinks is determined for a wide range of resonant inductance. Using a distributed *RLC* model for the interconnects, the transfer function for each sink can be determined as a function of the resonant inductance. The area budget defines the upper limit for the inductance range. If there is an inductance range that can satisfy the output voltage swing requirement, the number and location of the *LC* tanks have been determined. If increasing the number of *LC* tanks up to the area budget does not deliver a full swing signal to all the sinks, the *LC* tank locations and resonant inductance that result in the highest amplitude of the transfer function are taken as the final solution. In that case, alternative methods such as buffer insertion should be employed for this resonant network to supply a full swing signal to the sinks.

In the range of inductors that can result in full signal swing, two approaches can be considered. In the first approach, called *minPow* hereafter, the power consumption of the network is minimized, and in the second one, called *minArea*, the area overhead of the resonant inductors is minimized. The transfer function and the power consumed by a clock tree are shown in Figure 4.12, where selecting $L_1$ as the resonant inductance for *minArea* reduces the area of these inductors and choosing $L_2$ for *minPow* results in lower power consumption for the clock network.



Figure 4.12: Signal swing and power consumption vs. resonant inductance.
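The selection between the two objectives can be sketched as a filter over a swept inductance range followed by a minimization. The function name `choose_inductance`, the tuple layout, and the sample sweep values below are hypothetical and serve only as an illustration. The sketch assumes that the swing and power for each inductance value have already been obtained from the transfer function, and that a smaller inductance implies a smaller inductor area.

```python
def choose_inductance(sweep, objective="minPow", min_swing=0.9):
    """Pick a resonant inductance from a sweep.

    sweep: iterable of (inductance, swing, power) tuples.
    Only points whose swing exceeds min_swing are considered.
    Returns the chosen inductance, or None if no point is feasible.
    """
    feasible = [point for point in sweep if point[1] > min_swing]
    if not feasible:
        return None
    if objective == "minPow":
        # Lowest power among the points that deliver a full swing signal.
        return min(feasible, key=lambda point: point[2])[0]
    if objective == "minArea":
        # Smallest inductance, assumed to give the smallest inductor area.
        return min(feasible, key=lambda point: point[0])[0]
    raise ValueError(f"unknown objective: {objective}")


# Hypothetical sweep: (inductance in H, swing |H|, power in mW).
sweep = [
    (6e-9, 0.85, 50.0),
    (8e-9, 0.91, 55.0),
    (10e-9, 0.95, 48.0),
    (12e-9, 0.93, 47.0),
    (14e-9, 0.88, 46.0),
]

l_min_area = choose_inductance(sweep, "minArea")
l_min_pow = choose_inductance(sweep, "minPow")
print(l_min_area, l_min_pow)
```

In this toy sweep the two objectives select different points, which mirrors the distinction between $L_1$ and $L_2$ in Figure 4.12.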

Compared to the method in [21], which is the only method presented for synthesized trees, the proposed method uses a more efficient parameter to locate the *LC* tanks, and sweeping the resonant inductance results in better power and/or area than the first-order estimation used by [21]. Three basic properties of resonant behavior, which are the foundations of the proposed methodology, are described hereafter.

**Property 1.** For two parallel branches, the branch with the higher impedance exhibits the smaller signal swing.

If we model the entire clock network with a single $RC$ $\pi$-section as illustrated in Figure 4.13, the transfer function at the output node is determined as

$$|H| = \frac{1}{\sqrt{\left(1 + R_N^2 C_L C_N \omega^2\right)^2 + R_N^2 \cdot \left(4C_L^2 + C_N^2\right) \cdot \omega^2}} \tag{4.3}$$

which indicates that by increasing the resistance and capacitance of the clock network denoted by  $R_N$  and  $C_N$ , respectively, or the load capacitance  $C_L$  the amplitude of the transfer function decreases. Consequently, the branch with the higher impedance has the smaller transfer function and exhibits the lower signal swing.



Figure 4.13: Lumped model for a clock distribution network.
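Property 1 can be checked numerically by evaluating (4.3) directly. In the sketch below the function name and all component values are hypothetical and chosen only for illustration; the expression itself is transcribed from (4.3).

```python
import math


def swing_eq_4_3(r_n, c_n, c_l, freq_hz):
    """Amplitude |H| of the lumped model, transcribed from (4.3)."""
    w = 2.0 * math.pi * freq_hz
    first = (1.0 + r_n**2 * c_l * c_n * w**2) ** 2
    second = r_n**2 * (4.0 * c_l**2 + c_n**2) * w**2
    return 1.0 / math.sqrt(first + second)


# Hypothetical branch parameters at 1 GHz.
F_CLK = 1e9
h_low_impedance = swing_eq_4_3(r_n=10.0, c_n=5e-12, c_l=1e-12, freq_hz=F_CLK)
h_high_impedance = swing_eq_4_3(r_n=20.0, c_n=5e-12, c_l=1e-12, freq_hz=F_CLK)
print(f"|H| = {h_low_impedance:.3f} (10 ohm), {h_high_impedance:.3f} (20 ohm)")
```

Every term under the square root grows with $R_N$, $C_N$, and $C_L$, so the amplitude can only decrease when any of them is increased.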

**Property 2.** Adding a resonant inductor to a clock network can, simultaneously, reduce the power and improve signal swing.

Resonant behavior occurs when, within a clock cycle, the energy alternates between electric and magnetic fields. In an electric circuit, resonance occurs when the inductive and capacitive parts of the impedance cancel each other. Therefore, adding the resonant inductor ideally cancels the imaginary part of the circuit impedance due to the capacitive components. In a real (non-ideal) clock network, since the capacitance is distributed along the interconnects, adding a lumped inductor to the network cannot completely cancel the capacitive part of the impedance, but it reduces the effective capacitance of the circuit. In the $\pi$ model of the interconnect, adding the resonant inductor in parallel with the capacitance of the network increases the capacitive part of the impedance. The input impedance of the network and the output voltage transfer function can be described as:

$$Z_{in} = R + \frac{1}{\frac{1}{X_C} + \frac{1}{R + X_L}} \tag{4.4}$$

$$\frac{V_O}{V_{in}} = \frac{X_L}{X_L + R} \cdot \frac{1}{1 + R \cdot \left(\frac{1}{X_C} + \frac{1}{R + X_L}\right)} \tag{4.5}$$

where $X_C$ and $R$ stand for the capacitive and resistive parts of the clock network impedance, respectively, and $X_L$ stands for the impedance of the load. As shown in (4.4), increasing $X_C$ increases the input impedance of the circuit, which decreases the power consumed by the clock network, while the signal swing also increases, as shown in (4.5).
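The trend stated for (4.4) and (4.5) can be illustrated numerically. The sketch below treats $X_C$ and $X_L$ as purely capacitive complex impedances, which is an assumption made here for illustration; the function names and numerical values are hypothetical.

```python
def z_in(r, x_c, x_l):
    """Input impedance, transcribed from (4.4)."""
    return r + 1.0 / (1.0 / x_c + 1.0 / (r + x_l))


def v_ratio(r, x_c, x_l):
    """Output-to-input voltage ratio, transcribed from (4.5)."""
    return (x_l / (x_l + r)) / (1.0 + r * (1.0 / x_c + 1.0 / (r + x_l)))


# Hypothetical values in ohms; capacitive impedances are negative imaginary.
R = 10.0
X_L = -50j

results = {}
for xc_magnitude in (20.0, 200.0):
    x_c = -1j * xc_magnitude
    results[xc_magnitude] = (abs(z_in(R, x_c, X_L)), abs(v_ratio(R, x_c, X_L)))
    print(xc_magnitude, results[xc_magnitude])
```

For these values, the larger $|X_C|$ yields both a larger input impedance magnitude and a larger voltage ratio, in line with Property 2.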

**Property 3.** Adding an *LC* tank to a node of a branch increases the signal swing of its descendants more than other sink nodes.

A segment of a clock network with two parallel branches is shown in Figure 4.14. The transfer functions from $V_1$ to $V_{O2}$ and $V_{O3}$ can be determined using (4.5). By adding the *LC* tank to the first branch, $X_{C2}$ and, consequently, $V_{O2}/V_1$ increase, whereas $V_{O3}/V_1$ remains constant.



Figure 4.14: Two parallel branches of a clock tree.

Based on these properties, an algorithm is devised to find the proper location for the *LC* tanks. In a synthesized clock tree, the signal swing at different sink nodes is not equal. As described in Property 1, the branch with higher impedance exhibits lower signal swing at the sink nodes. Consequently, to provide a uniform signal swing at the sink nodes, the signal swing of high impedance branches should increase more than that of other branches. As mentioned in Property 2, the signal swing improvement can be achieved by adding resonant inductors. Property 3 suggests adding the resonant inductor to the branches with lower signal swing (*i.e.* higher impedance) to better improve the signal swing of these branches. Based on these properties, the proposed algorithm employs the input impedance of each node as a parameter to locate the *LC* tanks. Since the goal of this algorithm is to reduce the number of *LC* tanks, the algorithm starts with one *LC* tank and adds tanks only until a full swing signal is reached at all sink nodes. In each step of the algorithm, the location of the new *LC* tank is determined considering the input impedance of the intermediate nodes of the clock network.

The algorithm starts from the tree that represents the topology of the distribution network. "Breadth first" traversal is used where each node has a certain level (depth) in the tree as shown in Figure 4.15. The algorithm starts by adding one *LC* tank to the root node and evaluating the transfer function at all the sink nodes. A proper method to calculate the transfer function in tree structures is to use *Direct Truncation of the Transfer Function* (DTT) [22]. DTT is a recursive method producing the transfer function of a tree structured interconnect based on the transfer function of the sub-blocks of the circuit. For the circuit shown in Figure 4.16, the transfer function of node $k$, $T_k(s)$, is determined as

$$T_k(s) = \frac{N_k(s)}{D(s)} \tag{4.6}$$

$$D(s) - N_1(s) = (s \cdot R_1 + s^2 L_1) \cdot \sum C_k \cdot N_k(s) \tag{4.7}$$

$$N_1(s) = D_l(s) \cdot D_r(s) \tag{4.8}$$

where $N_k(s)$ and $D_k(s)$ are the numerator and denominator of the transfer function at node $k$, and $D_l$ and $D_r$ are the denominators of the transfer function for the left and right sub-blocks. $L_k$, $C_k$, and $R_k$ are the inductance, capacitance, and resistance at node $k$, respectively, as shown in Figure 4.16. This approach is quite convenient for adding and omitting *LC* tanks since each *LC* tank can be treated as a sub-block added in parallel to the clock network circuit.

If the amplitude of the transfer function for all the sinks is more than 0.9, the algorithm terminates. Otherwise, the candidate locations for *LC* tanks are at the nodes of "Level 1". The nodes of "Level 1" are sorted according to the capacitive load seen at each node. First, the *LC* tank is added to the node with the highest capacitive load. The number of *LC* tanks increases until a full swing signal is delivered to all the sink nodes. If adding an *LC* tank to all the nodes in "Level 1" cannot support a full swing clock signal for the sinks, the algorithm progresses to the next level downstream. The algorithm iterates until the desired signal swing at all sink nodes is achieved.
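The level-by-level placement described above can be sketched as follows. This is a minimal illustration and not the implementation used for the reported results: the function name `place_lc_tanks`, the dictionary-based tree, and the `swing_ok` callback (which stands in for the DTT-based transfer function evaluation) are all hypothetical, and the area budget is omitted for brevity.

```python
def place_lc_tanks(children, root, cap_load, swing_ok):
    """Greedy level-by-level LC tank placement.

    children: dict mapping a node to the list of its child nodes
    cap_load: dict mapping each intermediate node to its capacitive load
    swing_ok: callable(list_of_tank_nodes) -> bool, True when every sink
              reaches the required swing (for example |H| > 0.9)
    Returns (tank_nodes, satisfied).
    """
    tanks = [root]  # start with a single tank at the root
    if swing_ok(tanks):
        return tanks, True

    # Level 1: intermediate (non-sink) nodes directly below the root.
    level = [c for c in children.get(root, []) if children.get(c)]
    while level:
        # Within a level, try the nodes with the largest load first.
        for node in sorted(level, key=lambda n: cap_load[n], reverse=True):
            tanks.append(node)
            if swing_ok(tanks):
                return tanks, True
        # The level is full: move one level downstream.
        level = [g for n in level for g in children.get(n, [])
                 if children.get(g)]
    return tanks, False


# Hypothetical unbalanced tree: n1 drives the heavier sub-tree.
tree = {"root": ["n1", "n2"], "n1": ["s5", "s6", "s7"], "n2": ["s1", "s2"]}
loads = {"n1": 3.0, "n2": 1.0}

# Toy criterion: the swing is adequate once n1 carries a tank.
placement, ok = place_lc_tanks(tree, "root", loads, lambda t: "n1" in t)
print(placement, ok)
```

In this toy case the second tank is placed at the heavily loaded node `n1`, so two tanks suffice.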

This algorithm can reduce the number of *LC* tanks as compared to the previous method of allocating resonant inductors for clock trees [21], particularly for unbalanced clock trees. Consider the example clock tree shown in Figure 4.17, where allocating one *LC* tank at the root can provide a full swing signal for sink nodes $s_1$ to $s_4$. By using the proposed *LC* location algorithm, which places the second *LC* tank at node $n_1$, a full swing signal is delivered to nodes $s_5$ to $s_7$. In the previous method [21], the *LC* tanks are located at equal distance from the root, and exploiting this method for this clock tree requires at least three *LC* tanks to provide a full swing signal for all the sinks.

Figure 4.15: Different levels of intermediate nodes for a tree with *N* levels.

Figure 4.16: Sub-blocks of the circuit in the DTT approach [22].

#### Pseudo code of LC tanks placement algorithm

**Main**

    N = number of levels in tree
    Cur_l = 1    /* current level */
    H_best = 0
    put initial LC tank location at root
    repeat {
        determine the transfer function
        if (H(L) > H_best) then {
            H_best = H(L)
            Best Location = current LC tank location
        }
        Add-LC-tank (Cur_l)
    } until (voltage swing is satisfied)
    Determine-resonant-inductance

**Determine-resonant-inductance**

    design objectives: minPow, minArea
    for (L | H(L) > 0.9) {
        if minPow
            minimize (power(L))    /* plot power(L) */
        else if minArea
            minimize (L)
    }
    return

**Add-LC-tank (Cur_l)**

    if (Cur_l is full) {
        if (Cur_l = N) return
        Cur_l = Cur_l + 1
    }
    determine the location of next LC tank
    return

#### 4.2.1 Simulation results

In this section the proposed method is applied to the synthesized clock trees from the ISPD 2010 benchmark set. A clock frequency of 1 GHz is assumed and the technology data for 0.18 $\mu$m is used to construct the case studies. The output resistance of the clock driver is set to 10 $\Omega$. The decoupling capacitor is 15 pF, which is large enough not to interfere with the clock frequency. The power consumption, signal swing, and area of the *LC* tanks of the proposed method are compared with the methods presented in [21] and [130]. The area of the inductors is estimated using

$$L = K_1 \cdot \mu_0 \frac{n^2 \cdot d_{avg}}{1 + K_2 \rho} \tag{4.9}$$

where $K_1$ and $K_2$ for square inductors are, respectively, 2.34 and 2.75 [132], $d_{avg} = (d_{out} + d_{in})/2$, and $\rho = \frac{d_{out} - d_{in}}{d_{out} + d_{in}}$, where $d_{out}$ and $d_{in}$ are the outer and inner diameters of the inductor. To approximate the area of the resonant inductors, the ratio between $d_{out}$ and $d_{in}$ is considered to be 3, which is a practical ratio to have a proper magnetic flux [133] and results in $\rho = 0.5$. The area of the inductor can be described as:

$$Area = d_{out}^2 = \left(\frac{3 \cdot L \cdot (1+0.5 \cdot K_2)}{2 \cdot K_1 \mu_0 n^2}\right)^2, \tag{4.10}$$

where the number of turns for all of the inductors is considered to be 4 in this study.

Figure 4.17: An example of an unbalanced clock tree.
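As a sanity check, (4.10) can be evaluated for the first benchmark (1107 sinks) of Tables 4.3 and 4.4. The sketch below assumes that the total inductor area is the number of *LC* tanks multiplied by the area of a single inductor; this is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

MU_0 = 4.0e-7 * math.pi  # vacuum permeability in H/m
K1, K2 = 2.34, 2.75      # square-inductor coefficients quoted from [132]


def inductor_area_mm2(inductance_h, turns=4):
    """Area d_out**2 of one square spiral inductor per (4.10), in mm^2."""
    d_out_m = (3.0 * inductance_h * (1.0 + 0.5 * K2)
               / (2.0 * K1 * MU_0 * turns**2))
    return (d_out_m * 1e3) ** 2


# First row of Tables 4.3 and 4.4 (1107 sinks).
area_ref21 = 14 * inductor_area_mm2(11.5e-9)   # method of [21]: 14 tanks
area_min_pow = 15 * inductor_area_mm2(8.3e-9)  # minPow: 15 tanks
print(f"[21]: {area_ref21:.1f} mm^2, minPow: {area_min_pow:.1f} mm^2")
```

Under this assumption the computed totals are close to the 10.6 mm² and 6 mm² reported for this benchmark in Table 4.4.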

To determine the resonant inductance for LARCS, a library of four inductors (8 nH, 10 nH, 12 nH, and 15 nH) is used, whereas for the method presented in [21] the first-order estimation is utilized. For the proposed method, the *minPow* and *minArea* approaches are considered as discussed in the previous section. Once the resonant inductance is determined, the corresponding spiral inductor with a high quality factor and low area should be designed. There are different simulation tools to design spiral inductors, such as COMSOL [134], ASITIC [135], and Sonnet [136].

The transfer function for a synthesized tree with 1016 sinks is plotted in Figure 4.18. As shown in this figure, adding 15 *LC* tanks at the nodes determined by the proposed method can deliver a full swing signal to the sink nodes, whereas using the LARCS method results in inadequate clock signal swing although 58 *LC* tanks are employed. The method of [21] adds 14 *LC* tanks to the clock tree, where the amplitude of the transfer function is 0.5 and clock buffers should be used to deliver a full swing signal to the sinks. Simulation results show that LARCS does not work properly for clock trees, which is expected since this method is proposed for clock grids.

Design parameters and simulation results for different clock trees are listed in Tables 4.3 and 4.4. The number of *LC* tanks and the resonant inductance for the method proposed in [21], LARCS, *minPow*, and *minArea* are reported in Table 4.3, and the area overhead and the power consumed by these methods are listed in Table 4.4. Comparing the two approaches of the proposed method shows that the first approach (*minPow*) reduces the power consumption by up to 14.7%, whereas the second approach (*minArea*) reduces the area overhead by up to 19%. Consequently, neither approach is clearly preferable to the other, and specific design objectives determine which approach works better for a certain case.

Figure 4.18: Comparison of a synthesized tree with 1016 sinks among different design methods where (a) is the transfer function for a sink node and (b) is the power consumption.

The power consumed by the clock distribution network is reduced by up to 57% by applying the resonant clocking scheme as compared to a standard clock network. The amplitude of the transfer function for [21] is around 0.5, while the proposed method delivers a full swing signal, improving the signal swing by up to 80%. The number of *LC* tanks for the proposed method and the method presented in [21] is comparable, whereas the *minPow* algorithm leads to an inductor area decrease of 51%, since the inductors used by the proposed method are smaller than the inductors determined by the first order estimation in [21] (the first order estimation neglects the inductive parameters of the interconnect and overestimates the resonant inductance). This improvement increases up to 57% for *minArea*. Simultaneously, the power consumed by the proposed method is decreased by up to 25% and 14% for the *minPow* and *minArea* approaches, respectively, as compared to the method presented in [21]. Using the proposed method drastically decreases the resonant inductance compared to previous methods. Although the inductor area for *minArea* is 19% less than for *minPow*, compared to the previous methods the area improvement for these two approaches is in the same range.

| # sinks | [21]: # *LC* tanks | [21]: Res_Ind (nH) | LARCS [130]: # *LC* tanks | LARCS [130]: Res_Ind (nH) | Proposed: # *LC* tanks | Proposed *minPow*: Res_Ind (nH) | Proposed *minArea*: Res_Ind (nH) |
|---------|--------------------|--------------------|---------------------------|---------------------------|------------------------|---------------------------------|----------------------------------|
| 1107    | 14                 | 11.5               | 58                        | 10                        | 15                     | 8.3                             | 7.8                              |
| 2249    | 27                 | 15                 | 71                        | 12                        | 23                     | 10.8                            | 9.7                              |
| 1845    | 20                 | 13                 | 63                        | 10                        | 18                     | 9.6                             | 9                                |
| 1915    | 19                 | 20                 | 42                        | 15                        | 18                     | 16.2                            | 14.5                             |
| 1016    | 15                 | 18                 | 35                        | 15                        | 13                     | 16                              | 13                               |
| 1134    | 17                 | 19.5               | 29                        | 15                        | 14                     | 15                              | 12.5                             |

Table 4.3: Design parameters for different design methodologies.

| # sinks | [21]: Power (mW) | [21]: $\lvert H \rvert$ | [21]: Ind Area (mm<sup>2</sup>) | LARCS [130]: Power (mW) | LARCS [130]: $\lvert H \rvert$ | LARCS [130]: Ind Area (mm<sup>2</sup>) | Proposed: $\lvert H \rvert$ | *minPow*: Power (mW) | *minPow*: Ind Area (mm<sup>2</sup>) | *minArea*: Power (mW) | *minArea*: Ind Area (mm<sup>2</sup>) | Standard: Power (mW) |
|---------|------------------|------|------|-----|------|------|-----|-----|------|-----|------|-----|
| 1107    | 60               | 0.5  | 10.6 | 70  | 0.35 | 47.5 | 0.9 | 47  | 6    | 55  | 5.25 | 113 |
| 2249    | 136              | 0.43 | 34.8 | 184 | 0.3  | 58.2 | 0.9 | 116 | 15.4 | 134 | 12.4 | 271 |
| 1845    | 45               | 0.45 | 19.4 | 138 | 0.4  | 51.6 | 0.9 | 37  | 9.4  | 41  | 8.28 | 68  |
| 1915    | 36               | 0.55 | 43.6 | 41  | 0.48 | 54.2 | 0.9 | 29  | 27.1 | 33  | 21.7 | 59  |
| 1016    | 28               | 0.57 | 27.8 | 35  | 0.53 | 45.2 | 0.9 | 21  | 19.1 | 24  | 12.6 | 43  |
| 1134    | 40               | 0.6  | 37   | 54  | 0.55 | 37.4 | 0.9 | 32  | 18   | 37  | 12.5 | 61  |

Table 4.4: Power consumption and inductor area for different design methodologies.
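The power savings relative to [21] can be recomputed from the power columns of Table 4.4. The sketch below only post-processes the tabulated values; the dictionary layout and function name are hypothetical.

```python
# Power (mW) per benchmark, copied from Table 4.4.
# Tuple order: method of [21], proposed minPow, proposed minArea.
power_mw = {
    1107: (60, 47, 55),
    2249: (136, 116, 134),
    1845: (45, 37, 41),
    1915: (36, 29, 33),
    1016: (28, 21, 24),
    1134: (40, 32, 37),
}


def saving_percent(reference, value):
    """Relative saving of value with respect to reference, in percent."""
    return 100.0 * (reference - value) / reference


best_min_pow = max(saving_percent(ref, mp)
                   for ref, mp, _ in power_mw.values())
best_min_area = max(saving_percent(ref, ma)
                    for ref, _, ma in power_mw.values())
print(f"minPow: up to {best_min_pow:.0f}%, minArea: up to {best_min_area:.0f}%")
```

Both maxima occur for the benchmark with 1016 sinks and agree with the 25% and 14% quoted in the text.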

The design of resonant clock networks has been considered in the previous sections. In the next section we study pre-bond testability for the designed networks and exploit inductive links to wirelessly transmit the clock signal to the disjoint resonant clock networks.

# 4.3 Transceiver circuit for the inductive link

Pre-bond testability presents new challenges to 3-D clock network design primarily due to the incomplete clock distribution networks prior to the bonding of the planes. As mentioned before, inductive links are exploited to supply the clock signal for all sink nodes during the pre-bond test. The inductors within the *LC* tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test.

Contactless interfaces use AC-coupling to transfer the data. An *AC Coupling Interface* (ACCI) uses the transitions of a digital signal as the useful part of information and discards the DC component. In inductive links, the transferred power can be adapted by different circuit parameters, such as the number of the inductor turns [36].

The quality of an inductive link is a function of the electric and magnetic characteristics of the link. Misalignments between the horizontal locations of the inductors in adjacent planes can affect the coupling coefficient. Furthermore, thermal issues can change the electrical components of the link, such as the metal resistance.

An inductive link consists of two spiral inductors and a current based transceiver. As shown in Figure 4.19, the transmitter converts the input voltage to current pulses, which are coupled through spiral inductors and recovered as voltage pulses at the receiver.



Figure 4.19: Basic operation of an inductive link, where (a) is the inductively coupled transceiver and (b) is the equivalent circuit for the spiral inductor [2,3].

In standard clock networks, the clock signal is pulse shaped and, if inductive links transfer the clock signal, the coupled current and voltage should be recovered at the receiver as shown in Figure 4.19. In resonant clock networks, by contrast, the global clock signal is sinusoidal and the buffers at the sink nodes can convert the clock signal to a square waveform. Alternatively, proper flip-flops for resonant clock networks can be utilized [137]. Due to the sinusoidal shape of the coupled clock signal in resonant clock networks, the receiver circuit in these networks is less complex than the receivers in conventional clock networks. As indicated in Figure 4.20, the receiver voltage is also sinusoidal and no additional receiver circuit is required to recover the transferred voltage.

Figure 4.20: Sinusoidal global clock signal in resonant clock networks.

The circuit of the inductive link is illustrated in Figure 4.21. The inductance of the receiver $L_R$ is determined as described in the previous section. The coupled voltage is determined by

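$$V_R = L_R \frac{dI_R}{dt} + M \frac{dI_T}{dt} \tag{4.11}$$

$$|V_R| = (L_R \cdot I_{Rmax} + M \cdot I_{Tmax}) \cdot \omega \tag{4.12}$$

where $\omega$ denotes the angular frequency of the coupled signal, and $I_{Rmax}$ and $I_{Tmax}$ are the amplitudes of the receiver and transmitter currents, respectively.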
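As a rough numerical illustration of (4.12), the following sketch evaluates the amplitude of the coupled voltage for sinusoidal currents and shows its linear dependence on frequency. It assumes $\omega = 2\pi f$ and takes the mutual inductance as $M = k\sqrt{L_T L_R}$, the standard definition of the coupling coefficient $k$. The function names and all numerical values are illustrative assumptions, not a reproduction of the simulated design.

```python
import math


def coupled_voltage_amplitude(l_r, i_r_max, m, i_t_max, freq_hz):
    """Amplitude of V_R from (4.12), assuming sinusoidal currents and omega = 2*pi*f."""
    omega = 2.0 * math.pi * freq_hz
    return (l_r * i_r_max + m * i_t_max) * omega


def mutual_inductance(k, l_t, l_r):
    """Standard coupling-coefficient relation M = k * sqrt(L_T * L_R)."""
    return k * math.sqrt(l_t * l_r)


# Illustrative values only (assumptions chosen to exercise the formula).
L_R = 16e-9       # receiver inductance [H]
L_T = 17e-9       # transmitter inductance [H]
K = 0.25          # coupling coefficient
I_R_MAX = 5e-3    # hypothetical receiver current amplitude [A]
I_T_MAX = 20e-3   # hypothetical transmitter current amplitude [A]

M = mutual_inductance(K, L_T, L_R)
v_high = coupled_voltage_amplitude(L_R, I_R_MAX, M, I_T_MAX, 1e9)
v_low = coupled_voltage_amplitude(L_R, I_R_MAX, M, I_T_MAX, 400e6)

# With identical currents, the amplitude scales linearly with frequency.
ratio = v_high / v_low
print(f"|V_R| at 1 GHz: {v_high:.3f} V, at 400 MHz: {v_low:.3f} V, ratio: {ratio:.2f}")
```

With unchanged currents, the lower frequency yields a proportionally smaller coupled voltage, which is why the transmitter parameters have to be adapted when the clock frequency changes.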



Figure 4.21: Simplified transceiver circuit for an inductive link.

The clock signal at the sink nodes is described by

$$V_{sink} = H_d(j\omega).V_R \tag{4.13}$$

where $H_d(j\omega)$ is the transfer function of the local clock tree and $V_{sink}$ is the clock voltage at the sink nodes. The transmitter inductance $L_T$ and, hence, the transmitter current $I_T$ should be determined such that the clock signal at the sink nodes described by (4.13) is a full swing signal at both the scan and at-speed test frequencies. From (4.11) and (4.13), $I_T$ is written as

$$I_T = \frac{V_{sink}}{Z_d(\omega)} \cdot \frac{Z_d(\omega) - L_R \cdot j \cdot \omega}{M \cdot j \cdot \omega} \tag{4.14}$$

where  $Z_d$  denotes the impedance of the clock network.

The capacitance *C* is added to the transmitter circuit to reduce the source current and power consumption of the tester since the source current can be described by

$$I_{S} = I_{T} \cdot (1 - L_{T} \cdot C \cdot \omega^{2}) \tag{4.15}$$

The inductance and capacitance of the transmitter are determined such that $I_S$ is minimized. The amplitude of $I_S$ for a wide range of $L_T$ and $C$ is shown in Figure 4.22, where $I_T$ is 20 mA and the clock frequency is 1 GHz. As shown in this figure, there are several choices for $L_T$ and $C$ that minimize $I_S$.
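The idealized expression (4.15) can be explored with a short sketch. It evaluates only (4.15), ignoring parasitics and losses, so component values in a complete circuit simulation will differ. The function names and the swept inductance values are hypothetical.

```python
import math


def source_current(i_t, l_t, c, freq_hz):
    """Source current from (4.15): I_S = I_T * (1 - L_T * C * omega^2)."""
    omega = 2.0 * math.pi * freq_hz
    return i_t * (1.0 - l_t * c * omega ** 2)


def capacitance_for_zero_source_current(l_t, freq_hz):
    """Capacitance at which (4.15) gives I_S = 0, i.e. L_T * C * omega^2 = 1."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (l_t * omega ** 2)


I_T = 20e-3   # transmitter current [A]
FREQ = 1e9    # clock frequency [Hz]

# Sweep a few hypothetical transmitter inductances and report the matching capacitance.
for l_t in (5e-9, 10e-9, 20e-9, 40e-9):
    c_opt = capacitance_for_zero_source_current(l_t, FREQ)
    i_s = source_current(I_T, l_t, c_opt, FREQ)
    print(f"L_T = {l_t * 1e9:5.1f} nH -> C = {c_opt * 1e12:6.3f} pF, |I_S| = {abs(i_s):.2e} A")
```

Under this idealization, every pair satisfying $L_T \cdot C \cdot \omega^2 = 1$ nulls the source current, which is consistent with the observation that several choices of $L_T$ and $C$ minimize $I_S$.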

To provide a full swing clock signal in both scan and at-speed test modes, the coupled voltages in these two modes should be determined such that

$$V_{sink}(\omega_s) = V_{sink}(\omega_a), \tag{4.16}$$

where  $\omega_s$  and  $\omega_a$  denote, respectively, the scan and at-speed test frequencies.

Since the magnitude of the received voltage is proportional to the signal frequency, the *LC* parameters of the transmitter circuit must be adapted to deliver a full swing clock for pre-bond test at different frequencies. The scan test clock frequency is lower than the clock frequency in normal operation. Consequently, a larger inductance is needed during scan test to deliver a full swing clock signal. To satisfy this constraint, we have used an additional *LC* circuit to amplify the low frequency current coupled to the chip. This auxiliary circuit should be switched off during the high frequency at-speed test mode. The schematic of the transmitter circuit is depicted in Figure 4.23.

Figure 4.22: The transmitter source current versus the transmitter inductance and capacitance.

For high frequency operation (at-speed test and normal operation), $\omega_a$ (*i.e.*, the at-speed frequency) is applied in (4.13) to (4.15) to determine $L_T = L_{T1}$ and $C = C_1$. For scan test mode, by using $\omega_s$ in (4.13) to (4.15) with $L_T = L_{T1} + L_{T2}$ and $C = C_1 + C_2$, we can determine $L_{T2}$ and $C_2$. The scan-enable signal controls a transmission gate, which connects the auxiliary *LC* circuit during scan test mode and disconnects it during at-speed test mode.
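The mode-dependent transmitter parameters can be summarized with a small behavioral sketch. It only encodes the relations $L_T = L_{T1} + L_{T2}$ and $C = C_1 + C_2$ for scan mode, and $L_T = L_{T1}$, $C = C_1$ otherwise. The class and function names, as well as the component values, are hypothetical and serve only to exercise the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransmitterLC:
    inductance: float   # [H]
    capacitance: float  # [F]


def effective_transmitter(main, aux, scan_enable):
    """Return the effective L_T and C in the selected test mode.

    The auxiliary branch adds to both L_T and C when scan_enable is
    asserted and is disconnected otherwise.
    """
    if scan_enable:
        return TransmitterLC(main.inductance + aux.inductance,
                             main.capacitance + aux.capacitance)
    return main


# Hypothetical component values chosen only to exercise the function.
main_branch = TransmitterLC(inductance=10e-9, capacitance=2e-12)
aux_branch = TransmitterLC(inductance=15e-9, capacitance=3e-12)

at_speed = effective_transmitter(main_branch, aux_branch, scan_enable=False)
scan = effective_transmitter(main_branch, aux_branch, scan_enable=True)
print(at_speed, scan)
```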



Figure 4.23: Schematic of the transceiver circuit used for wireless pre-bond test.

The equivalent lumped model used for the inductors is shown in Figure 4.24, where $L_S$, $R_S$, and $C_S$ denote the inductance, the series resistance due to the conductor losses, and the parasitic coupling capacitance among the inductor turns, respectively. $C_{SUB}$, $C_{OX}$, and $R_{SUB}$ are, respectively, the parasitic capacitances of the silicon substrate and the inter-metal dielectric, and the resistance due to the eddy currents in the substrate. This model can properly describe the behavior of spiral inductors up to several gigahertz. The parameters of this model can be determined by applying the 0.18 $\mu$m technology parameters to the expressions presented in [132].

The receiver circuit consists of the resonant inductor and a small voltage divider to produce the voltage offset. The inductive link can only transfer AC signals, so the coupled voltage has no DC offset. To adapt the voltage level of the received signal to the CMOS logic level, a voltage divider is used to produce a 0.9 V DC voltage from a 1.8 V voltage source. This circuit is connected to the local clock network by a transmission gate, which is on during the pre-bond test operation and is off in normal operation mode. Note that this voltage divider is practically the sole overhead of the proposed method.



Figure 4.24: Lumped model of a spiral inductor [20].

As compared to previous methods to ensure pre-bond testing, such as using redundant wires or adding a DLL for each local clock network, employing inductive links for pre-bond test has the least area overhead occupied by the additional circuit used for test purposes. The power consumed in test mode is lower for this method, since the power consumed by DLLs or the power dissipated in redundant wires is virtually eliminated in this approach.

## 4.3.1 Simulation results

In this section, a resonant clock network for a two-plane 3-D circuit is designed and simulated using Cadence Spectre. For pre-bond test, two methods are investigated: redundant wiring and wireless testing, as depicted in Figures 4.25(b) and 4.25(c). For the first approach, the width of the redundant wires and the size of the test clock drivers are determined as described in [128]. A comparison is also offered with the method in [102], where the transistor count and power are used as a reference.

A case study of a 3-D H-tree resonant clock network with 256 leaves is considered. The 3-D circuit is assumed to consist of two planes, each containing 128 sink nodes. The load capacitance at each node is assumed to be 20 fF and the operating and test frequencies are 1 GHz and 400 MHz, respectively. The total area of the network is 3.4 mm $\times$ 3.4 mm. The wire segments are the same as listed in Table 4.1 in Subsection 4.1.1.

The resonant inductance is determined as described in Section 4.1. The decoupling capacitor, which should be sufficiently large so as not to affect the resonance frequency, is set to 60 pF. The number of *LC* tanks, the inductor size for each resonant circuit, and the clock driver resistance are listed in Table 4.5.

| Topology | # *LC* tanks | *L* [nH] | $R_{driver}$ [$\Omega$] | Power, standard [mW] | Power, resonant [mW] |
|----------|--------------|----------|-------------------------|----------------------|----------------------|
| 1 TSV    | 8            | 8        | 6.2                     | 137.5                | 110                  |
| 2 TSV    | 8            | 7.5      | 5.9                     | 139                  | 107                  |
| 4 TSV    | 16           | 17.5     | 6.1                     | 147                  | 103                  |
| 8 TSV    | 16           | 16       | 6.5                     | 130                  | 85                   |
| 16 TSV   | 32           | 31       | 5.8                     | 136                  | 98                   |
| 32 TSV   | 64           | 50       | 3.5                     | 132                  | 96                   |

Table 4.5: Design parameters and power consumption for different topologies.

A path of the preferred structure among the different variants is shown in Figure 4.25(a), where the TSVs are placed on the fourth level of the network. In this structure, eight TSVs are used to connect the two planes and each plane contains eight *LC* tanks with a resonant inductance of 16 nH for each tank. The power consumption of the clock network is reduced by 35% in comparison to a standard clock network. For the redundant wiring approach, shown in Figure 4.25(b), the width of the redundant wires and the size of the test clock drivers are determined as described in [128]. The redundant wire width is determined to be half the wire width in the first plane and the driver size is also reduced to 47% of the main clock driver.
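The relative power saving of each topology follows directly from the two power columns of Table 4.5. The sketch below recomputes it; the dictionary and function names are our own, and the power values are copied from the table.

```python
# Standard and resonant clock-network power [mW] per topology, from Table 4.5.
POWER_MW = {
    "1 TSV": (137.5, 110.0),
    "2 TSV": (139.0, 107.0),
    "4 TSV": (147.0, 103.0),
    "8 TSV": (130.0, 85.0),
    "16 TSV": (136.0, 98.0),
    "32 TSV": (132.0, 96.0),
}


def power_saving_percent(standard_mw, resonant_mw):
    """Relative saving of the resonant network with respect to the standard one."""
    return 100.0 * (standard_mw - resonant_mw) / standard_mw


savings = {name: power_saving_percent(s, r) for name, (s, r) in POWER_MW.items()}
best = max(savings, key=savings.get)

for name, value in savings.items():
    print(f"{name:>6}: {value:5.1f}%")
print("largest saving:", best)
```

The eight-TSV topology gives the largest saving, which rounds to the 35% quoted above.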

For the off-chip transmitter in the wireless approach depicted in Figure 4.25(c), a sinusoidal voltage source with an output resistance of 25 $\Omega$ is considered. The coupling coefficient between the transmitter and the receiver is considered to be 0.25, which is a typical value in previous studies [97]. The transmitter inductance for the operating frequency $L_{T1}$ and the corresponding capacitance $C_1$ are 17 nH and 6 pF, respectively. The auxiliary inductance $L_{T2}$ and capacitance $C_2$ employed for the scan test frequency are 22.5 nH and 7 pF, respectively. As shown in Figure 4.26, this circuit is capable of switching between the test and operating frequencies when the scan enable signal toggles without missing any clock cycle.

The resistance of the voltage divider  $R_d$  is 100 k $\Omega$ . A high resistance is used to reduce the current of the voltage divider. A transmission gate can be employed to switch the voltage divider off during normal operation, eliminating the leakage current of the divider while negligibly increasing the area overhead of the test circuit.
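A back-of-the-envelope check of the divider can be sketched as follows. It assumes an unloaded divider built from two equal resistors, each of value $R_d$; the text only states $R_d$ = 100 k$\Omega$ and the 1.8 V to 0.9 V conversion, so the topology and the function names are assumptions.

```python
def divider_output(v_supply, r_top, r_bottom):
    """Unloaded output voltage of a resistive divider."""
    return v_supply * r_bottom / (r_top + r_bottom)


def divider_static_power(v_supply, r_top, r_bottom):
    """Static power drawn from the supply by the unloaded divider."""
    return v_supply ** 2 / (r_top + r_bottom)


V_DD = 1.8     # supply voltage [V]
R_D = 100e3    # assumed value of each of the two divider resistors [ohm]

v_offset = divider_output(V_DD, R_D, R_D)
p_static = divider_static_power(V_DD, R_D, R_D)
print(f"offset = {v_offset:.2f} V, static power = {p_static * 1e6:.1f} uW")
```

Under these assumptions the static power is in the range of tens of microwatts, which illustrates why a large $R_d$ is preferred.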

Figure 4.25: Preferred structure for resonant clock network where (a) is the post-bond network, (b) is the network in pre-bond testing mode using redundant wiring and (c) is the network using wireless pre-bond testing.

For DLLs, the power and transistor count reported in [102] are considered, where the circuit has also been designed in a 0.18 $\mu$m technology process. Eight DLLs are used to supply eight disconnected clock trees. For redundant wiring, the area and power of the additional wires are investigated, and for the wireless method the area and power of the voltage divider and the transmission gates are considered. Since the voltage divider is much smaller than the DLLs and additional wires, a considerable reduction in power and area overhead is achieved by the wireless approach. The power consumption and area overhead for the compared methods of pre-bond testing are listed in Table 4.6. Note that all the methods require a control signal to disconnect/switch off the wires, the transmission gates and/or the DLLs. Since this signal is common to all methods and has almost the same fan-out, it is not included in this comparison.

|                                      | DLL [102] | Redundant wiring | Proposed method |
|--------------------------------------|-----------|------------------|-----------------|
| Power (mW)                           | 6.5       | 2.8              | 0.026           |
| Interconnect area ($\mu$m<sup>2</sup>) | 0         | 110200           | 1600            |
| Transistor count                     | 3664      | 20               | 16              |

Table 4.6: Power and area overhead for different methods of pre-bond testing.

The proposed approach reduces the power consumption by 99% and 99.6% as compared to using redundant wiring and DLLs, respectively. The interconnect area is reduced by 98.5% in comparison to redundant wiring and the transistor count is reduced by 99.5% as compared to employing DLLs.
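These percentages can be recomputed from the entries of Table 4.6. The following sketch does so; the helper name is our own and the inputs are copied from the table.

```python
def reduction_percent(baseline, proposed):
    """Relative reduction of the proposed value with respect to a baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Values from Table 4.6.
power_vs_wiring = reduction_percent(2.8, 0.026)       # power, redundant wiring vs. proposed
power_vs_dll = reduction_percent(6.5, 0.026)          # power, DLL vs. proposed
area_vs_wiring = reduction_percent(110200, 1600)      # interconnect area
transistors_vs_dll = reduction_percent(3664, 16)      # transistor count

print(f"power vs. redundant wiring: {power_vs_wiring:.1f}%")
print(f"power vs. DLL:              {power_vs_dll:.1f}%")
print(f"area vs. redundant wiring:  {area_vs_wiring:.1f}%")
print(f"transistors vs. DLL:        {transistors_vs_dll:.1f}%")
```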

The main improvements offered by this method are scalability, ease of test, reduced test power, reduced silicon area, and increased reliability. Reliability is improved by avoiding the potential damage that can occur when probing pads in wired test approaches.



In addition, during normal operation, the use of resonant clocking offers significant power savings, demonstrating the advantages of this unified scheme.

Figure 4.26: Clock signal at the receiver side of the inductive link where (a) is the clock signal received by the inductor and (b) is the clock signal after the clock buffers at the sink nodes.

# 4.4 Summary

Exploiting resonant clock networks for 3-D circuits is studied in this chapter. In the first section, a design methodology for H-tree resonant clock networks in 3-D circuits is proposed. The number of *LC* tanks, the resonant circuit parameters, and the driver size for normal operation are determined such that a full swing signal is provided at the sink nodes and the power consumption of the circuit is minimized. The effect of different parameters, including the number of planes and the number of TSVs among the planes, on the design of 3-D resonant clock networks is investigated. An approach to minimize the additional wire width and clock driver size for pre-bond test is proposed. A 256-sink H-tree clock network operating at 5 GHz is considered as the case study, where a power reduction of 72% is achieved for an eight-plane resonant clock network in comparison to a 2-D standard network. Simulation results indicate a 43% reduction in the power consumed by the resonant 3-D clock network as compared to a conventional buffered clock network.

In the next section, a design method to apply resonant clocking to synthesized clock trees is proposed. A breadth-first tree traversal algorithm is employed and the *LC* tanks are swept from the highest capacitive nodes of the topmost level to the clock sinks to determine the minimum number of *LC* tanks and the size of the *LC* tanks. The transfer function of the sink nodes and the power consumption of the clock network for a wide range of resonant inductance are explored to determine the amount of resonant inductance that results in a full swing clock signal at the sink nodes. Two approaches are presented: in the first approach, the inductance that minimizes the power is chosen as the resonant inductance and, in the second approach, the inductance that results in the least area overhead is chosen as the inductance of the *LC* tanks.

The power consumed by the resonant clock tree produced by the new method is significantly lower than that of the standard clock network. Up to 57% power reduction is achieved in simulated case studies. Comparing the proposed method with previous methods shows up to 80% improvement in the amplitude of the transfer function at the sink nodes by locating the *LC* tanks at proper nodes of the tree. Using fewer *LC* tanks and smaller resonant inductors reduces the area by up to 51% as compared to previous methods. Proper allocation of *LC* tanks, using a distributed *RLC* model for the clock network, and sweeping the resonant inductance also reduce the power consumption of the proposed method by up to 25% as compared to previous methods. Comparing the *minPow* and *minArea* approaches shows that *minPow* reduces the power consumption by up to 14.7%, whereas *minArea* reduces the area overhead by up to 19%. Simulation results indicate up to 52% reduction in the power consumed by the resonant clock network as compared to a conventional buffered clock network. Compared to existing methods, the number of *LC* tanks for the proposed technique is decreased by up to 15% and the signal swing is also improved by 44%. Depending on whether power or area is the design objective, two different approaches are followed to determine the parameters of resonance.

In Section 4.3, an approach to deliver the clock signal during pre-bond test is proposed. Wireless pre-bond testing is supported by the use of an inductive link. A design for the transceiver circuit is presented that provides a full swing clock signal at both test and operating frequencies and has a negligible on-chip area overhead. A 256-sink H-tree clock network with operating and test frequencies of 1 GHz and 400 MHz, respectively, is considered as the case study. The area occupied by the additional circuits used for testing is reduced by 98.5% in comparison to the redundant wiring method, while at the same time the power consumed for pre-bond testing is reduced by 99%. The power consumed by the proposed clock network during normal operation is reduced by 35% as compared to a standard clock distribution network.

As mentioned before, thermal problems are more pronounced for 3-D circuits as compared to their 2-D counterparts. By exploiting the resonance phenomenon and inductive coupling, a low power and pre-bond testable (with negligible area overhead) 3-D clock distribution network is provided that can mitigate thermal issues in 3-D circuits.

# 5 Serialization

Crosstalk among TSVs is another important concern that can affect the signal integrity and timing of the transferred data. In standard 2-D circuits the crosstalk is usually caused by two neighboring wires on the same layer. 3-D circuits are more vulnerable to crosstalk since TSVs are bundled and thus most TSVs are surrounded by other TSVs. Consequently, a TSV can be affected by several adjacent TSVs from all directions.

Serialization can be considered as a solution to alleviate the challenges related to TSV bunches for transferring data among the planes. Converting parallel data into higher-rate serial data can reduce the number of TSVs and consequently area and crosstalk effects. Conversely, using serializer/deserializer circuits can add complexity to system design, specifically when bandwidth is limited and with respect to power consumption.

This chapter presents a case study of serial *vs*. parallel data communication for TSV-based 3-D circuits. For parallel data communication, crosstalk and the resulting jitter are investigated. The power consumption, area, and fabrication yield of the serial and parallel approaches are compared.

In the following section, the crosstalk among a bunch of TSVs is considered for several cases. In Section 5.2, a review of the serialization method is presented, and the simulation results are discussed in Section 5.3. The summary is offered in Section 5.4.

# 5.1 Cross Talk

One of the challenges in TSV-based 3-D circuits is the cross talk between adjacent TSVs. To analyze the effect of neighboring TSVs on each other an accurate model for TSVs is required. Different models for TSVs have been proposed in [138, 139]. The *RLC* model used in our study is shown in Figure 5.1 where the resistance of this model is described in Eq. (2.7).

Figure 5.1: RLC model for TSV.

Figure 5.2 illustrates different topologies for TSV bunches considered to study crosstalk. In Figure 5.2(a) the body of each TSV is connected to ground through a guard ring (a p+ well over the resistive, capacitive bulk), which prevents neighboring TSVs from inducing noise in it. Figure 5.2(b) shows a shielded topology where ground TSVs are employed to mitigate the interference between adjacent TSVs, and in Figure 5.2(c) a bunch of signaling TSVs is placed without any shielding.



Figure 5.2: Different topologies for studying crosstalk where (a) shows the grounded TSV, (b) is the shielded topology and (c) is the bunch of TSV without shielding.

Figure 5.3 shows the eye diagram for the output signal of a TSV with a diameter of 5  $\mu$ m located in a bunch of 16 TSVs with the pitch of 10  $\mu$ m. As expected, the signal integrity for the TSVs with connection to ground is better than the other topologies since the signal is not affected by the neighboring TSVs. Exploiting ground TSVs can alleviate the crosstalk issues, but it drastically increases the number of TSVs and hence the reliability and yield challenges in the circuit.

# 5.2 Serialization

Although parallel TSV connections provide the highest bandwidth for inter-layer data communication, reliability, yield, and area issues suggest the use of serial communication. Moreover, exploiting TSV bunches can affect the signal integrity due to crosstalk between the TSVs, as discussed in the previous section. These problems encourage us to explore the serial/parallel trade-off for decreasing the number of TSVs while preserving the performance of the system.



Figure 5.3: Eye diagram for different schemes for 16 bit where (a) is for grounded TSVs, (b) is for shielded TSVs and (c) is for coupled TSVs.

Serialization is one of the solutions for overcoming the aforementioned issues. Since TSVs can transfer data at up to 40 Gb/s [140, 141], serializing the data and reducing the number of TSVs can help to improve the yield and fabrication cost of the system and reduce the area occupied by the TSVs. Figure 5.4 shows the two approaches for inter-plane communication, where the serial structure shown in (b) replaces the *n* parallel circuits shown in (a) using an n:1/1:n serializer/deserializer. The area and power consumed by the serializer/deserializer must be considered as the overhead of serialization. Typically, the area of the serializer and deserializer is very small compared to the TSV footprint, and reducing the number of TSVs saves considerable area, which makes the power consumption the only real drawback of this approach.

A tree-type serializer/deserializer [138] is designed in a 65 nm CMOS technology that can operate at up to a 10 GHz serial clock frequency. The structure of the designed serializer is shown in Figure 5.5(a), where the serialization rate is 8. Different phase selection signals are generated and used to sample the inputs and serialize them, as shown in Figure 5.5(b).

Encoding schemes are one potential way to reduce the switching activity, and hence the power consumption, of the serialization circuit. Different encoding schemes and their improvement in reducing the switching activity for 8-bit data are listed in Table 5.1. As shown in this table, the TIC/ETI method [142,143] has the best performance in terms of reducing the number of transitions in serial data communication. The structure of these encoding schemes is shown in Figure 5.6.



Figure 5.4: Parallel and serial method for inter-plane data communication where (a) shows the parallel approach and (b) is the serial one.

In the TIC (*Transition Inversion-based Coding*) scheme, the number of transitions in a data word is calculated first. If the number of transitions is more than a threshold (*e.g.* half the word length of the data), the data will be coded; otherwise, it will remain unchanged. To encode the data, this approach checks every two bits in the serial word and maps "00", "01", "10", and "11", respectively, to "01", "00", "11", and "10".
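The coding rule described above can be sketched in a few lines. This is a minimal behavioral model, not the circuit implementation: words are represented as strings of '0' and '1', only transitions inside a word are counted (an assumption), and the function names are our own. Note that the pair mapping amounts to inverting the second bit of every pair, which makes it its own inverse.

```python
def count_transitions(bits):
    """Number of positions where consecutive bits differ within one word."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def invert_second_of_each_pair(bits):
    """Map 00->01, 01->00, 10->11, 11->10 on every two-bit pair."""
    flipped = {"0": "1", "1": "0"}
    return "".join(bit if idx % 2 == 0 else flipped[bit]
                   for idx, bit in enumerate(bits))


def tic_encode(bits, threshold=None):
    """Return (coded_word, decision_bit) for one serial word."""
    if threshold is None:
        threshold = len(bits) // 2  # e.g. half the word length
    if count_transitions(bits) > threshold:
        return invert_second_of_each_pair(bits), 1
    return bits, 0


def tic_decode(bits, decision_bit):
    """Undo the coding; the pair mapping is its own inverse."""
    return invert_second_of_each_pair(bits) if decision_bit else bits


word = "01010101"                 # 7 transitions, above the threshold of 4
coded, flag = tic_encode(word)
print(coded, flag, tic_decode(coded, flag))
```

For the alternating word used here, the coded word contains no transitions at all, at the cost of one additional decision bit per word.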

At the receiver side, a decision bit is required to determine whether the data is coded or not. Hence, an extra bit should be transmitted for each data word. In the ETI (*Embedded Transition Inversion Coding*) scheme, the coding routine is the same as in TIC but the need for the decision bit is eliminated. For each coded data word, a phase difference is generated between the clock and the data. The decoder employs a phase detector to recognize whether the data is coded.

Simulation results show that the power consumed by the encoder/decoder circuits is much more than the power reduced by decreasing the switching activity. Accordingly, exploiting encoding methods is not a proper solution to reduce the power consumption of serialization method where TSVs are used.

# 5.3 Simulation Results

In the first part of this section, signal coupling among a bunch of TSVs and the effect of this coupling on signal integrity are studied. Afterwards, a serializer/deserializer circuit for different numbers of TSVs is simulated using a 65 nm technology in Cadence Spectre, and the area, power, and fabrication yield of serial communication are compared to the parallel approach.

Figure 5.5: Serializer circuit and signaling.

| Encoding | Description | #transition | %improvement |
|----------|-------------|-------------|--------------|
| -        | -           | 898         | -            |
| shuffle  | if #transition(in(i)) > n/2<br>out(i, (1:2:n)) = in(i, (2:2:n))<br>out(i, (2:2:n)) = in(i, (1:2:n))<br>else<br>out(i) = in(i) | 824 | 8.2% |
| Xor      | if #transition(in(i)) > n/2<br>out(i) = in(i) $\oplus$ in(i-1)<br>else<br>out(i) = in(i) | 756 | 15.8% |
| TIC/ETI  | if #transition(in(i)) > n/2<br>out(i, (2:2:n)) = $\sim$in(i, (2:2:n))<br>else<br>out(i) = in(i) | 616 | 31% |

Table 5.1: Different encoding schemes.

Figure 5.6: Encoder structure where (a) shows the ETI decoder and (b) is the TIC decoder.

The fabrication yield of the 3-D circuit is described as [144]:

$$Y = (Y_{die})^{N_{tier}} \cdot (Y_{stacking})^{N_{tier}-1} \tag{5.1}$$

$$Y_{stacking} = Y_{bonding} \cdot (1 - f_{tsv})^{N_{tsv}} \tag{5.2}$$

where $N_{tier}$ is the number of layers stacked, $Y_{die}$ is the yield of a single die, $Y_{bonding}$ is the yield of the 3-D process, $f_{tsv}$ is the TSV failure rate, and $N_{tsv}$ is the total number of TSVs. To estimate the effect of the number of TSVs on the yield of a two-plane 3-D circuit, we assume that $Y_{die}$, $Y_{bonding}$, and $f_{tsv}$ are 0.95, 0.98, and $10^{-6}$, respectively, and $N_{tier}$ is two.
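The yield model of (5.1) and (5.2) can be evaluated directly. The sketch below uses the parameter values assumed above; the function names are our own, and the printed numbers are only as accurate as this simple model.

```python
def stacking_yield(y_bonding, f_tsv, n_tsv):
    """Stacking yield from (5.2)."""
    return y_bonding * (1.0 - f_tsv) ** n_tsv


def stack_yield(y_die, y_bonding, f_tsv, n_tsv, n_tier):
    """Overall fabrication yield of the 3-D stack from (5.1)."""
    return (y_die ** n_tier) * stacking_yield(y_bonding, f_tsv, n_tsv) ** (n_tier - 1)


Y_DIE, Y_BONDING, F_TSV, N_TIER = 0.95, 0.98, 1e-6, 2

for n_tsv in (1, 8, 16, 32):
    y = stack_yield(Y_DIE, Y_BONDING, F_TSV, n_tsv, N_TIER)
    print(f"N_tsv = {n_tsv:2d}: Y = {100.0 * y:.4f}%")
```

With such a low TSV failure rate, the yield is dominated by the die and bonding yields, and the number of TSVs has only a small effect for bundles of this size.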

TSVs are modeled using the *RLC* model shown in Figure 5.1, which is used to estimate the resistance, inductance, and capacitance of the TSVs. Three different TSV diameters are considered; the parameters of the interconnects are listed in Table 5.2.

| TSV diameter [µm] | R [mΩ] | C [fF] | L [pH] |
|-------------------|--------|--------|--------|
| 5                 | 297    | 18     | 46     |
| 10                | 103    | 41     | 36     |
| 50                | 17     | 218    | 17     |

Table 5.2: TSV parameters

The crosstalk for 16 bundled TSVs with diameter of 10  $\mu$ m is simulated. Three different topologies shown in Figure 5.2 are considered. The bandwidth of the input data is 5 Gb/s. To measure the TSV signal quality, an eye diagram plot is used and the jitter of the signal is calculated and listed in Table 5.3.

| Pitch [µm] | Jitter with shielding [ps] | Jitter without shielding [ps] |
|------------|----------------------------|-------------------------------|
| D = 5 µm   |                            |                               |
| 10         | 1.12                       | 8.59                          |
| 15         | 0.74                       | 7.36                          |
| 20         | 0.53                       | 4.95                          |
| D = 10 µm  |                            |                               |
| 20         | 0.4                        | 2.34                          |
| 30         | 0.34                       | 1.4                           |
| 40         | 0.21                       | 1.34                          |
| D = 50 µm  |                            |                               |
| 100        | 0.27                       | 0.75                          |
| 150        | 0.18                       | 0.4                           |
| 200        | 0.11                       | 0.32                          |

Table 5.3: Jitter for a bunch of 16 TSVs.

As shown in this table, jitter is more pronounced for smaller TSVs since the inductance of the TSVs decreases as their size increases. Moreover, increasing the size of the TSVs also increases the pitch between adjacent TSVs, which reduces the inductive and capacitive coupling between them. Ground shielding can reduce the jitter of the bunch of 16 signaling TSVs by up to 86%, while it increases the number of TSVs to 44.

To alleviate the crosstalk problem without drastically increasing the TSV area, the serialization method is considered. Simulation results for different numbers of bits are listed in Table 5.4, where TSVs with a 10 $\mu$m diameter are used. Since the area of the serializer and deserializer is much smaller than the footprint of the TSVs (*e.g.*, less than $0.1\times$ in a 65 nm technology), the area reported in the table is the area of the TSVs.

|              | Parallel clock frequency [MHz] | Serial clock frequency [GHz] | Power | Area [$\mu$m<sup>2</sup>] |
|--------------|--------------------------------|------------------------------|-------|---------------------------|
| 8 bit        |                                |                              |       |                           |
| Parallel     | 1000                           |                              | 237   | 628                       |
| W/O encoding | 1000                           | 8                            | 984.6 | 78.5                      |
| TIC encoding | 1000                           | 9                            | 1202  | 78.5                      |
| 16 bit       |                                |                              |       |                           |
| Parallel     | 529                            |                              | 385   | 1256                      |
| W/O encoding | 529                            | 7.5                          | 1326  | 78.5                      |
| TIC encoding | 529                            | 8                            | 1530  | 78.5                      |
| 32 bit       |                                |                              |       |                           |
| Parallel     | 273                            |                              | 251   | 2512                      |
| W/O encoding | 273                            | 7.75                         | 2193  | 78.5                      |
| TIC encoding | 273                            | 8                            | 2345  | 78.5                      |

Table 5.4: Operating frequency, power, and area for different serialization rates.
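The area column of Table 5.4 is consistent with counting only the circular cross-section of each 10 $\mu$m TSV. The sketch below reproduces this estimate; treating the footprint as a plain circle (no keep-out zone or landing pad) is our assumption, as is the function name.

```python
import math


def tsv_area_um2(diameter_um, n_tsv):
    """Total cross-sectional area of n_tsv circular TSVs, in square micrometers."""
    return n_tsv * math.pi * (diameter_um / 2.0) ** 2


DIAMETER_UM = 10.0

for n_bits in (8, 16, 32):
    parallel = tsv_area_um2(DIAMETER_UM, n_bits)   # one TSV per bit
    serial = tsv_area_um2(DIAMETER_UM, 1)          # a single serial TSV
    saving = 100.0 * (1.0 - serial / parallel)
    print(f"{n_bits:2d} bit: parallel {parallel:7.1f} um^2, "
          f"serial {serial:5.1f} um^2, saving {saving:.1f}%")
```

The small differences with respect to the table stem from rounding $\pi$.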



Figure 5.7: Yield vs. Number of TSV.

As shown in Figure 5.7, reducing the number of TSVs from 32 to 1 can improve the fabrication yield of the whole circuit by 0.0022%.

Figure 5.8 shows the power consumption of 8-bit inter-plane data transmission for three structures: parallel, serial without encoding, and serial with TIC encoding. Figure 5.8(a) shows the power consumption *vs.* TSV diameter and Figure 5.8(b) shows the power *vs.* signal frequency.



Figure 5.8: Power consumption for 8-bit data transmission *vs*. TSV size (a) and signal frequency (b).

As mentioned before, the power consumed by encoder/decoder circuits is more than the power saved by reducing the switching activity. Thus, employing encoding schemes does not seem a proper approach to reduce the power consumption in TSV based inter-plane transmission.

Since the designed serializer/deserializer circuit can operate at a maximum frequency of 10 GHz, the frequency of the input signal is limited to 10 GHz/*n*, where *n* is the serialization rate. For signals below this frequency, serialization does not degrade the performance of the system, whereas for faster signals, errors caused by the serialization circuit decrease the performance of the system.
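The frequency limit can be expressed as a simple check. In the sketch below, the 10 GHz limit and the parallel clock frequencies are taken from the text and Table 5.4; the function names are hypothetical.

```python
MAX_SERIAL_CLOCK_HZ = 10e9  # maximum serial clock of the designed circuit


def max_parallel_frequency_hz(serialization_rate, max_serial_hz=MAX_SERIAL_CLOCK_HZ):
    """Highest input (parallel) frequency supported for a given serialization rate n."""
    if serialization_rate < 1:
        raise ValueError("serialization rate must be at least 1")
    return max_serial_hz / serialization_rate


def within_limit(parallel_hz, serialization_rate):
    """True if the parallel clock does not exceed the 10 GHz / n limit."""
    return parallel_hz <= max_parallel_frequency_hz(serialization_rate)


# Parallel clock frequencies from Table 5.4 for n = 8, 16, and 32.
for n, f_parallel in ((8, 1000e6), (16, 529e6), (32, 273e6)):
    limit = max_parallel_frequency_hz(n)
    print(f"n = {n:2d}: limit {limit / 1e6:7.1f} MHz, "
          f"used {f_parallel / 1e6:6.1f} MHz, ok = {within_limit(f_parallel, n)}")
```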

# 5.4 Summary

In this chapter, the crosstalk effect on bundled TSVs is considered. For a bunch of 16 TSVs, up to 8.95 ps of jitter is added to data with a 5 GHz bandwidth due to TSV-to-TSV coupling. Exploiting grounded TSVs to shield the signaling TSVs can reduce the jitter by 86% but increases the total number of TSVs to 44.


The serialization approach is proposed to improve the signal integrity and reduce the number of TSVs. Using serialization drastically reduces the TSV area and slightly improves the fabrication yield of the circuit. On the other hand, the power consumed by the serializer/deserializer is not negligible and should be carefully considered and compared to the power of the whole 3-D circuit to determine whether this approach is a proper solution for a given application.

# 6 Buffer Allocation for RRAM-based FPGA Structure

3-D integration provides denser circuits as compared to its 2-D counterpart. Reducing the footprint area in large circuits also results in better performance for 3-D circuits. Recent *Field Programmable Gate Arrays* (FPGAs) are quite complex circuits and employing 3-D technology can improve their performance and speed. The market share of FPGAs is increasing due to their versatility, but unfortunately FPGAs are worse than *Application-Specific Integrated Circuits* (ASICs) in terms of computational density (area), delay, and power consumption. The programmable interconnects occupy up to 90% of the FPGA area and are responsible for up to 80% and 85% of the total delay and power consumption, respectively [145]. Improving routing structures is crucial to bridge the gap between FPGA and ASIC circuits. Exploiting monolithic 3-D circuits such as RRAMs can lead to an improved routing structure for FPGAs.

In conventional FPGA architectures, SRAMs and pass transistors are used to form the programmable interconnects. The relatively low density of SRAM-based storage leads to inefficient silicon area utilization and consequently to longer routing paths and larger interconnect delays. Moreover, a considerable amount of power is consumed in SRAMs during standby due to the volatile nature of the memory circuits.

Replacing SRAM cells with Non-Volatile Memories (NVMs) has been introduced as a promising approach to reduce the standby power, area (and consequently delay) of FPGAs. RRAMs provide ultra-dense memory arrays with easy programming features, fast write time and low write energy as compared to other emerging NVMs [19, 115]. These characteristics make RRAMs an emerging solution for next-generation FPGAs. We consider FPGAs with RRAMs storing the configuration and providing the routing means. Our study concentrates on routing.

In addition to realizing memory blocks, there have been several studies on RRAM-based routing structures. RRAM cells can replace the transmission gate switches controlled by an SRAM cell and significantly reduce the routing delay [17,24,146]. Signal buffers are unavoidable parts of routing paths in FPGAs, and buffer allocation in RRAM-based FPGAs should be adjusted due to the different characteristics of the routing paths. An adaptive buffer allocation method is proposed in [19], where the positions of the buffers inserted in [17] are optimized on demand.


However, this method is complex as it needs a serious change in the architecture and employs additional reconfigurable switches to allocate the buffers. Buffer allocation in RRAM-based FPGAs is an important ongoing task that can affect the performance, area, and power of these circuits.

In this chapter, we analyze the effects of buffer allocation in RRAM-based FPGAs. Different buffering styles and models are studied and evaluated, with the goal of reducing the number of buffers while maintaining performance. We propose a structure with reduced number of buffers in routing path of RRAM-based FPGAs without sacrificing the system performance.

The rest of the chapter is organized as follows. Section 6.1 presents a review of conventional and RRAM-based FPGA structures and shows the effect of buffer allocation on FPGA performance. Architectural simulation results are shown in Section 6.2 and conclusions are drawn in Section 6.3.

## 6.1 Buffer distribution in FPGA

The structure of a conventional island-style FPGA is shown in Figure 6.1, where each *Logic Block* (LB) is surrounded by routing channels. Logic blocks consist of *Look-Up Tables* (LUTs), D Flip-Flops and multiplexers to implement both combinational and sequential logic functions. Each logic block is connected to the routing channels through *Connection Boxes* (CBs), and *Switch Boxes* (SBs) provide the connection between the different routing channels. CBs and SBs both consist of a large set of programmable switches that are configured by programming bits. Conventionally, scan-chain SRAM cells are used to store the configuration data [147]. Each SRAM cell includes a minimum of six transistors, which results in a large area overhead for conventional FPGAs. Volatility and slow loading time are other issues of the SRAM cells.



Figure 6.1: Conventional FPGA structure.

Novel nonvolatile RRAM technologies bear a lot of promise to replace SRAMs. RRAMs have only two terminals, and programming transistors are required at each of the two terminals to turn an RRAM switch on or off. In an FPGA structure, where several programmable switches are employed to implement routing multiplexers, these terminals are shared among several RRAM cells. The programming transistors can also be shared so that only one programming transistor is required for each node. Different algorithms have been proposed for placing the programming transistors in RRAM-based FPGAs [18, 19]. Efficient placement of programming transistors can drastically decrease the number of these transistors [19].

RRAM technology is CMOS-compatible, which makes it cost efficient as compared to current technologies. Besides, the drawbacks of RRAMs (poor performance during write operations and low endurance) are not relevant during FPGA operation, since write access to the RRAMs is limited to the programming of the FPGA. The smaller footprint results in shorter interconnects and reduced interconnect delay.

New routing structures are proposed in [24] for reprogrammable switches. In this architecture, the pass transistor switches are replaced by RRAM cells. Since RRAMs are fabricated between the metal layers, the routing switches can be placed over the logic layer which results in a drastic area reduction.

Along with the configurable switches, buffers are the other important element of the FPGA routing structure. The different characteristics of RRAM switches, as compared to CMOS transmission gates, change the properties of signal paths in RRAM-based FPGAs. Since critical paths are different in nature in RRAM-based FPGAs, we reconsider the buffer allocation problem.

In the next section, we consider the effect of intermediate buffers on signal propagation delay and address the allocation of buffers in RRAM-based FPGAs. In the first part, we study a modified approach for buffer allocation in the routing path of RRAM-based FPGAs and express the relation between the signal delay and the number of buffers in different structures. In the next part, we show circuit-level simulations to evaluate the proposed structures.

#### 6.1.1 Delay Calculation in Critical Path

In general, the insertion of buffers breaks the routing path into smaller segments and reduces the quadratic delay of the path. In exchange, the intrinsic delay of the buffers is added to the path. Hence, inserting too many buffers can degrade the delay. Besides, reducing the number of buffers in the routing structure can reduce the power consumption and area overhead.

In conventional FPGA structures, after each switch block there is a buffer which restores the signal and drives the next block, as illustrated in Figure 6.2.

Figure 6.2: The critical path in conventional FPGAs.

Using one buffer after each switch box is not necessarily the most efficient structure. An alternative approach consists of allocating one buffer after each *n* switch boxes, where *n* is larger than one. The optimum *n* can be determined in terms of circuit characteristics. Figure 6.3 shows the *RC* model for the critical path between two logic blocks where there are *N* switch blocks and a buffer is assigned after each *n* switch boxes.



Figure 6.3: *RC* model for the critical path.

The delay between two buffers is expressed, using [148] as:

$$\tau_s = R_0(C_0 + C_g) + 2nR_0C_{seg} + nR_{seg}C_g + n^2R_{seg}C_{seg} \tag{6.1}$$

where $R_0$ and $C_0$ stand for the output resistance and capacitance of the buffer, and $R_{seg}$ and $C_{seg}$ denote the resistance and capacitance of each segment, including the switch and the wire between the switches. The delay between two logic blocks is:

$$\tau = \frac{N}{n}\tau_s + (\frac{N}{n} - 1)\tau_b \tag{6.2}$$

where  $\tau_b$  is the intrinsic delay of the buffers. To minimize the delay, *n* is determined as:

$$n = \sqrt{\frac{R_0(C_0 + C_g) + \tau_b}{R_{seg}C_{seg}}} \tag{6.3}$$

One can see that *n* increases as $R_{seg}$ and $C_{seg}$ decrease. The wire segments between the switch boxes are shorter in RRAM-based FPGAs due to the smaller area, and RRAM switches have a smaller on-resistance as compared to SRAM-based structures. Therefore, $R_{seg}$ and $C_{seg}$ can be smaller in RRAM-based structures, which results in a higher *n* and a lower number of buffers for this type of FPGA. Interestingly, *n* (*i.e.* the number of unbuffered switches between two buffered switches) does not depend on the length of the critical path, which allows us to determine a unique *n* for different critical paths.
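Equations (6.1) to (6.3) can be checked numerically with a short script. The sketch below is only illustrative: all component values are hypothetical placeholders rather than extracted technology parameters, $C_g$ is assumed to be the input capacitance of the following buffer, and the helper names are not part of this work. It evaluates the closed-form optimum of (6.3) and compares it with a brute-force sweep of (6.2) over integer *n* for two path lengths.

```python
import math


def segment_delay(n, r0, c0, cg, rseg, cseg):
    """Delay between two consecutive buffers, following Eq. (6.1)."""
    return (r0 * (c0 + cg) + 2 * n * r0 * cseg
            + n * rseg * cg + n ** 2 * rseg * cseg)


def path_delay(n, big_n, tau_b, r0, c0, cg, rseg, cseg):
    """Delay between two logic blocks, following Eq. (6.2)."""
    stages = big_n / n
    return stages * segment_delay(n, r0, c0, cg, rseg, cseg) + (stages - 1) * tau_b


def optimum_n(tau_b, r0, c0, cg, rseg, cseg):
    """Continuous optimum of n, following Eq. (6.3)."""
    return math.sqrt((r0 * (c0 + cg) + tau_b) / (rseg * cseg))


def best_integer_n(big_n, tau_b, r0, c0, cg, rseg, cseg):
    """Brute-force sweep of Eq. (6.2) over n = 1 .. N."""
    return min(range(1, big_n + 1),
               key=lambda n: path_delay(n, big_n, tau_b, r0, c0, cg, rseg, cseg))


# Hypothetical values, chosen only to exercise the formulas.
BUFFER = dict(tau_b=10e-12, r0=1e3, c0=2e-15, cg=2e-15)
LOW_R_SEGMENT = dict(rseg=1.2e3, cseg=3e-15)   # short wire, low on-resistance switch
HIGH_R_SEGMENT = dict(rseg=5e3, cseg=5e-15)    # longer wire, higher-resistance switch

if __name__ == "__main__":
    for name, seg in (("low-R segment", LOW_R_SEGMENT),
                      ("high-R segment", HIGH_R_SEGMENT)):
        n_opt = optimum_n(**BUFFER, **seg)
        sweeps = [best_integer_n(big_n, **BUFFER, **seg) for big_n in (10, 20)]
        print(f"{name}: Eq. (6.3) gives n = {n_opt:.2f}; "
              f"integer sweeps for N = 10, 20 give {sweeps}")
```

With these placeholder values, the sweep returns the same integer *n* for both path lengths, consistent with the observation that (6.3) does not involve *N*.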

Another potential solution to reduce the path delay is using regenerative repeaters instead of conventional buffers [19, 23]. These repeaters use feedback loops to amplify the attenuated signal along the propagation path. The critical path of an FPGA with regenerative repeaters is depicted in Figure 6.4(a) and the structure of a simple regenerative buffer is shown in Figure 6.4(b). Periodic amplification of the signal along the routing path results in a linear delay rather than the quadratic *RC* delay. In contrast to conventional buffers, regenerative buffers are bidirectional, and since they have a single input/output port, placing them in the routing path is considerably simpler than placing conventional buffers.



Figure 6.4: The critical path in FPGAs with regenerative repeaters (a) and the structure of a complementary regenerative feedback repeater [23] (b).

Figure 6.5 shows the critical path with N switch boxes where we employ one regenerative buffer after each n switch boxes.  $R_b$  and  $C_b$  denote the output resistance and capacitance of the regenerative buffer.

Figure 6.5: (a) *RC* model for the critical path with regenerative buffer and (b) simplified *RC* model.

The output voltage of the critical path can be described as:

$$V_{out} = V_{in} \cdot \frac{\alpha^{(m-1)}}{1 + \frac{R_1}{X_L} \cdot \frac{R_1 R_3 + mR_1 X_1 + R_1 X_L + R_3 X_1}{R_1 R_3 + (m+1)R_1 X_1 + R_3 X_1}} \tag{6.4}$$

where *m*, the number of buffers, is $(N/n - 1)$ and

$$\begin{aligned}
R_1 &= R_b \\
X_1 &= \frac{1}{(C_b + nC_{seg})\omega j} \\
R_3 &= nR_{seg} \\
X_L &= \frac{1}{(C_b + nC_{seg})\omega j} \parallel \left(nR_{seg} + \frac{1}{(C_g + nC_{seg})\omega j}\right)
\end{aligned} \tag{6.5}$$


The signal delay is:

$$\tau = (m-1)\tau_r + R_b(C_b + C_g + 2nC_{seg}) + nR_{seg}(C_g + nC_{seg}) \tag{6.6}$$

and the optimum *n* to minimize the delay is determined where:

$$2nR_{seg}C_{seg} + (2R_bC_{seg} + R_{seg}C_g) - \frac{N\tau_r}{n^2} = 0 \tag{6.7}$$

In contrast to structures with conventional buffers, the optimum number of regenerative buffers depends on *N*.

If the optimum *n* is larger than one, reducing the number of buffers reduces the delay of the signal path. These analytical expressions let us reduce the number of buffers for each data path without sacrificing the performance of the system.
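The optimality condition for regenerative buffers is a cubic in *n* and has no convenient closed form, but it can be solved numerically. The sketch below uses bisection and takes the last term of the condition as $N\tau_r/n^2$ so that every term has units of time; this reading, the interpretation of $\tau_r$ as a per-repeater delay, the function names, and all numerical values are assumptions made for illustration only.

```python
def stationarity(n, big_n, tau_r, rb, cg, rseg, cseg):
    """Left-hand side of the optimality condition.

    Assumption: the last term is N * tau_r / n**2, so all terms are delays.
    """
    return (2 * n * rseg * cseg + (2 * rb * cseg + rseg * cg)
            - big_n * tau_r / n ** 2)


def optimum_n_regenerative(big_n, tau_r, rb, cg, rseg, cseg, tol=1e-9):
    """Find the positive root of the condition by bisection.

    The left-hand side increases monotonically with n for n > 0,
    so the bracket [lo, hi] contains exactly one root.
    """
    lo, hi = 1e-6, 1.0
    while stationarity(hi, big_n, tau_r, rb, cg, rseg, cseg) <= 0:
        hi *= 2  # grow the bracket until the sign changes
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationarity(mid, big_n, tau_r, rb, cg, rseg, cseg) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical values, chosen only to exercise the formula.
PARAMS = dict(tau_r=5e-12, rb=1e3, cg=2e-15, rseg=1.2e3, cseg=3e-15)

if __name__ == "__main__":
    for big_n in (10, 20):
        n_opt = optimum_n_regenerative(big_n, **PARAMS)
        print(f"N = {big_n}: continuous optimum n = {n_opt:.3f}")
```

Under these assumptions, the root grows with *N*, which illustrates why a single *n* cannot be chosen independently of the path length for regenerative buffers.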

#### 6.1.2 Validation by circuit level simulations

In this section, we perform circuit-level simulations using Cadence Spectre to evaluate the effect of buffers on performance metrics. The performance of RRAM-based blocks is compared to their SRAM-based counterparts. Electrical simulations are performed in a commercial 65 nm technology with $V_{dd}$ equal to 1 V. However, the results are not technology dependent, and similar improvements can be achieved using other technologies.

The SB\_2 structure for switch boxes shown in Figure 6.6 is considered, where each signal path passes through one switch in each switch box [24]. For unbuffered switch boxes, the same structure (excluding input and output buffers) is used, as shown in Figure 6.6(c). For the SRAM-based structure, pass-transistor switches (NMOS and PMOS, with $1\times$ and $2.5\times$ minimum size) are used.

For RRAM cells, the model proposed in [25] is used, which includes the parasitic elements as shown in Figure 6.7. $C_p$ and $R_p$ model the intrinsic MIM structure capacitance and the resistance related to the leakage current between the two electrodes. $R_c$ is the contact resistance and $R_s$ represents the switching element. In our simulations, $C_p$, $R_p$, and $R_c$ are set to 20 fF, 200 M$\Omega$, and 20 $\Omega$, respectively, based on [25].

Two different RRAM switches with $R_{on}$ of 1 k$\Omega$ and 2 k$\Omega$ are used in simulations [149], where $R_{off}$ for both cases is 1 M$\Omega$. For the RRAM-based structure, the programming transistors are included in the simulation model.

Figure 6.6: SB\_2 switch box structure and its equivalent circuit for buffered and un-buffered switches [24].

Figure 6.7: RRAM circuit model [25].

Different structures for a critical path in FPGA are considered and simulated. In the reference architecture, the critical path between two logic blocks goes through *N* buffered switch boxes. In the modified structures, the same critical path is considered, but instead of having *n* buffered switches, we use one buffered switch and replace the *n*-1 remaining ones with unbuffered switches, as shown in Figure 6.8.

In a modified critical path with fixed length, different possible start points for the path can result in various characteristics. For example, in Figure 6.8, the critical paths starting from logic block #1 have different characteristics as compared to paths starting from logic blocks #2 and #3. However, in all our case studies, simulation results show a negligible difference in the delay and power consumed by these different structures. Hence, we report results for one structure (#1) in the following simulations.

Figure 6.8: The conventional and the modified structure for *n*=3.

In our first circuit simulation, we consider a path with 10 switch boxes (*N*=10) and sweep *n* to find the optimum number of buffers. Two different RRAM switches ($R_{on}=1$ k$\Omega$ and 2 k$\Omega$) are used in simulation. Figure 6.9 shows the delay for the SRAM and RRAM critical paths. As expected, by increasing the on-resistance of the RRAMs, the signal delay increases. The simulation results show that for SRAM-based FPGAs, the minimum delay is obtained by having one buffer after each switch box, whereas for RRAM-based FPGAs, there is no need to dedicate a buffer after each switch box. In this case study, using one buffer after every two switch boxes results in lower delay as compared to other structures.



Figure 6.9: Critical path delay for SRAM and RRAM based FPGAs for *N*=10 and  $R_{on}$ =1 k $\Omega$  and 2 k $\Omega$ .

In a second set of simulations, we vary the length of the critical path to evaluate its effect on the optimum number of buffers. We consider N=10 and N=20 with  $R_{on}=1$  k $\Omega$ . The delay is shown in Figure 6.10.

For both cases, the optimum *n* that minimizes the delay is 2. This confirms expression (6.3), in which *n* does not depend on *N*.

Figure 6.10: Critical path delay for SRAM and RRAM based FPGAs for *N*=10 and *N*=20.

In the next simulation, we use regenerative buffers in the critical path and compare the signal delay with the same path with conventional buffers. The critical path delay for *N*=10 is shown in Figure 6.11. Results show that for SRAM-based FPGAs the performance of the routing path with conventional buffers is better than with regenerative buffers, while for RRAM-based FPGAs, the best performance is achieved by using regenerative buffers. However, as shown in Figure 6.11(b), exploiting regenerative buffers for RRAM-based structures does not necessarily result in lower delay for every number of buffers. Hence, employing this kind of buffer should be carefully considered depending on the circuit characteristics. Note that the area overhead of regenerative buffers is larger than that of conventional ones.



Figure 6.11: Critical path delay for regenerative and conventional buffers where (a) shows the delay for SRAM-based FPGAs and (b) for RRAM-based FPGAs.

In addition, circuit-level simulations confirm the effect of RRAM switches on improving the performance of the critical path. The low on-resistance of RRAM switches reduces the *RC* delay of the routing path and allows us to reduce the number of restoring buffers. Reducing the number of buffers can enhance the performance of RRAM-based FPGAs with both conventional and regenerative buffering styles.

Based on this fact, modified structures can be proposed for RRAM-based FPGAs where some of the conventional buffered multiplexers are replaced by unbuffered RRAM switches. In the next section, these modified routing channels are employed in the FPGA architecture.

## 6.2 Architectural Simulations

In the previous section, we studied the effect of buffer distribution at the circuit level. In this section, we move to the architectural level and study the impact of the buffer allocation on the FPGA performance.

#### 6.2.1 Methodology

The architecture-level simulations are done using the VTR flow [150]. The twenty largest MCNC benchmarks [151] are first synthesized by ABC [152]. Then, packing, placement, and routing are performed by VPR 7 [14]. The island-style structure shown in Figure 6.1 is considered, with $F_{cin}$, $F_{cout}$, and $F_s$ respectively set to 0.15, 0.1, and 3. Technology parameters (area, delay and power) are extracted from a commercial 45 nm technology.

The benchmarks are mapped on both standard CMOS SRAM-based and RRAM-based FPGAs. The different routing schemes considered are depicted in Figure 6.12. In Figure 6.12(a), we consider a standard single-driver routing scheme with a channel length of 1. All the routing multiplexers are buffered. Note that other channel lengths can be used without significant differences in the results. Figure 6.12(b) shows a modified routing scheme that removes half of the buffers: after each buffered multiplexer, a buffer is removed. We call this routing scheme $B_2$. Figure 6.12(c) removes two thirds of the buffers by using two unbuffered multiplexers between buffered multiplexers. This scheme is referred to as $B_3$.
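The buffering patterns of the schemes described above can be written down compactly. The following minimal sketch generates the pattern of buffered and unbuffered multiplexers along a routing path for a scheme $B_k$ and reports the fraction of buffers removed; the function names are hypothetical, and the sketch assumes that the first multiplexer of the path is a buffered one.

```python
def buffered_pattern(num_muxes, k):
    """Return a list of booleans, True where a routing multiplexer keeps its buffer.

    Scheme B_k keeps one buffered multiplexer followed by k - 1 unbuffered ones.
    Assumption: the pattern starts with a buffered multiplexer.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [i % k == 0 for i in range(num_muxes)]


def removed_fraction(num_muxes, k):
    """Fraction of buffers removed relative to the fully buffered scheme (k = 1)."""
    kept = sum(buffered_pattern(num_muxes, k))
    return 1 - kept / num_muxes


if __name__ == "__main__":
    for k in (1, 2, 3):
        pattern = "".join("B" if b else "-" for b in buffered_pattern(12, k))
        print(f"B_{k}: {pattern}  removed fraction = {removed_fraction(12, k):.3f}")
```

For a path whose length is a multiple of *k*, the removed fraction is $1 - 1/k$, which gives one half for $B_2$ and two thirds for $B_3$, in line with the description above.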

#### 6.2.2 Simulation Results

Figure 6.13 shows power, area, and critical path delay for six architectures: SRAM-based conventional routing, SRAM-based $B_2$ and $B_3$ routing, RRAM-based conventional routing, and RRAM-based $B_2$ and $B_3$ routing. Comparing the conventional structure ($B_1$) for CMOS and RRAM-based circuits shows that employing RRAM switches improves the critical path delay by 56%, while the power consumption is reduced by 8.9%.

In SRAM-based structures, $B_1$ has the best timing performance, which means that reducing the number of buffers is not a useful approach for these FPGAs. In contrast, in RRAM-based structures, $B_2$ shows better performance, which allows us to use one unbuffered switch after each conventional buffered one. For the studied benchmarks, using the $B_2$ structure simultaneously improves the average delay, area, and power by 8.6%, 15.9%, and 5%, respectively.




Figure 6.12: Different FPGA architectures, where (a) is the conventional architecture and (b), (c) show the modified architectures $B_2$ and $B_3$.

## 6.3 Summary

Employing 3-D integration can improve the performance of large circuits as compared to 2-D integration. In this chapter, we focus on FPGAs as a good example of complex circuits and aim to improve their performance, power consumption and area overhead by using RRAMs, which are part of monolithic 3-D circuits.

Compared to Application-Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) provide reconfigurability at the cost of lower performance and higher power consumption. Exploiting a large number of programmable switches, routing structures are mainly responsible for this performance degradation. Hence, exploiting more efficient switches can drastically improve the performance and reduce the power consumption of the FPGA. RRAM switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. The different nature of RRAM switches, as compared to standard CMOS, encourages us to reconsider the buffer distribution. This chapter proposes an approach to reduce the number of buffers in the routing path of RRAM-based FPGAs.

Figure 6.13: Architectural simulation results for MCNC benchmarks where (a) is critical path delay, (b) is power consumption and (c) is the area.

Our architectural simulations for the twenty biggest MCNC benchmarks show that using RRAM switches improves the critical path delay by 56% as compared to CMOS switches, while at the same time the area and power are also reduced by 16.8% and 8.9%, respectively. Our buffering scheme gives an additional 8.6% delay reduction and improves the power and area by 5% and 15.9%, respectively, as compared to the conventional buffering approach for RRAM-based FPGAs.

In SRAM-based structures, reducing the number of buffers degrades the performance of the circuit due to the high resistance of the CMOS switches.

# 7 Conclusions

3-D integration is a promising prospect for implementing high performance multifunctional systems-on-chips. Interconnect length plays a dominant role in limiting the speed in modern integrated circuits. Exploiting vertical inter-chip interconnects is a proper solution which can drastically decrease the interconnect length and improve the performance.

Clock networks consume a great portion of the power dissipated in a circuit. Therefore, designing a low-power clock network in synchronous circuits is an important task. This requirement is stricter for 3-D circuits due to the increased power densities. Synchronization issues can be more challenging for 3-D circuits since a clock path can spread across several planes with different physical and electrical characteristics. Consequently, designing low power clock networks for 3-D circuits is an important issue. Resonant clock networks are considered efficient low-power alternatives to conventional clock distribution schemes. These networks utilize additional inductive circuits to reduce power while delivering a full swing clock signal to the sink nodes.

Another considerable challenge for 3-D integration is manufacturing and the related yield implications. Manufacturing processes for 3-D circuits include some additional steps as compared to standard CMOS processes, such as wafer thinning and TSV fabrication. This manufacturing complexity makes 3-D circuits more susceptible to manufacturing defects, which can lower the overall yield of the bonded 3-D stack. Testing is another complicated task for 3-D ICs, where pre-bond test is a prerequisite. Contactless testing methods have been considered as an alternative for conventional test methods. Pre-bond testability, in turn, presents new challenges to 3-D clock network design primarily due to the incomplete clock distribution networks prior to the bonding of the planes. To efficiently address this issue, inductive links are exploited to wirelessly transmit the clock signal to the disjoint resonant clock networks. The inductors comprising the *LC* tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test.

Through Silicon Vias (TSVs) are the enablers for achieving high bandwidth paths in inter-plane communications. TSVs also provide higher vertical link density and facilitate the heat flow in the 3-D circuits as compared to other potential schemes such as inductive links. However, reliability issues and crosstalk problems among adjacent TSVs decrease the yield and performance of TSV-based circuits. Moreover, the area footprint of TSVs and related keep-out areas is significant. Reducing the number of TSVs employed for inter-plane signal transfer can alleviate these problems.

Serialization can be considered as a solution to alleviate the challenges related to TSV bundles for transferring data among the planes. Converting parallel data into higher-rate serial data can reduce the number of TSVs and consequently the area and crosstalk effects. Conversely, using serializer/deserializer circuits can add complexity to the system design, specifically with respect to limited bandwidth and power consumption.

Recent FPGAs are quite complex circuits which provide reconfigurability at the cost of lower performance and higher power consumption as compared to ASIC circuits. Exploiting a large number of programmable switches, routing structures are mainly responsible for performance degradation in FPGAs. Employing 3-D technology can provide more efficient switches which drastically improve the performance and reduce the power consumption of the FPGA. RRAM switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. Along with the configurable switches, buffers are the other important element of the FPGA routing structure. The different characteristics of RRAM switches, as compared to CMOS transmission gates, change the properties of signal paths in RRAM-based FPGAs. The on-resistance of RRAM switches is considerably lower than that of CMOS pass-gate switches, which results in lower *RC* delay for RRAM-based routing paths. This different nature of the critical path and signal delay in turn affects the need for intermediate buffers. Thus, the buffer allocation should be reconsidered. In the last part of my thesis, I consider the effect of buffers on signal propagation delay and address the allocation of buffers in RRAM-based FPGAs.

## 7.1 Contributions

In the first part of my research, a design methodology for H-tree resonant clock networks in 3-D circuits is presented. The proposed 3-D resonant clock networks considerably lower the power of the clock distribution system, while pre-bond test is supported by the proper design and allocation of the *LC* tanks within each plane. In this way, resonant operation is ensured for each plane either in test or functional mode and the clock signal characteristics are maintained within each plane and for either operating mode. The number of *LC* tanks, the resonant circuit parameters, and the driver size for normal operation are determined such that a full swing signal is provided at the sink nodes and the power consumption of the circuit is minimized. The effect of different parameters including the number of planes and number of TSVs among the planes for designing 3-D resonant clock networks is investigated. An approach to minimize the additional wire width and clock driver size for pre-bond test is proposed. Simulation results indicate that using a resonant clock network can significantly decrease the power consumption of the clock tree in 3-D circuits. Furthermore, the results confirm that the power consumed in a 3-D clock network is lower than in a 2-D clock network due to the shorter interconnect length. A 256-sink H-tree clock network operating at 5 GHz is considered as the case study where a power reduction of 72% is achieved for an eight-plane resonant clock network in comparison to a 2-D standard network.

Afterwards, a design method to apply resonant clocking to synthesized clock trees is proposed. A "breadth first" tree traversal algorithm is employed and the *LC* tanks are swept from the highest capacitive nodes of the topmost level to the clock sinks to determine the minimum number and the size of *LC* tanks. The transfer function of the sink nodes and the power consumption of the clock network for a wide range of resonant inductance are explored to determine the amount of resonant inductance that results in a full swing clock signal at the sink nodes. Two approaches are presented where in the first approach the inductance that minimizes the power is determined as the resonant inductance and, in the second approach, the inductance that results in the least area overhead is determined as the inductance of the *LC* tanks.

The power consumed by the resonant clock tree produced by the new method is significantly lower than that of the standard clock network. Up to 57% power reduction is achieved in simulated case studies. Comparing the proposed method with previous methods shows up to 80% improvement in the amplitude of the transfer function at the sink nodes by locating the *LC* tanks in proper nodes of the tree. Using fewer *LC* tanks and smaller resonance inductors reduces the area by up to 51% as compared to previous methods. Proper allocation of *LC* tanks, using a distributed *RLC* model for the clock network and sweeping the resonant inductance, also reduces the power consumption of the proposed method by up to 25% as compared to previous methods. Comparing the minPow and minArea approaches shows that minPow reduces the power consumption by up to 14.7%, whereas minArea reduces the area overhead by up to 19%.

The next part of the research introduces a design methodology of resonant 3-D clock networks that support wireless pre-bond testing through the use of inductive links. By exploiting the resonance phenomena and inductive coupling, a low power and pre-bond testable 3-D clock distribution network is provided for the first time. The low-power property originates from the phenomenon of energy resonance, while the minimal overhead pre-bond testability is assured by employing inductive links, where the receiver circuit is the inductor of the on-chip *LC* tank. The probe card (*i.e.* the off-chip part of the inductive link) is designed to deliver a full swing sinusoidal clock signal in test and normal operation frequencies. Designing a resonant clock network for normal operation is investigated and the number of *LC* tanks, the resonant circuit parameters, and the driver size for normal operation are determined, such that a full swing signal is provided at the sink nodes and the power consumption of the circuit is lowered. A design for the transmitter circuit is presented that provides a full swing clock signal in both test and operating frequencies and has a negligible on-chip area overhead.


A 256-sink H-tree clock network with operating and test frequencies of 1 GHz and 400 MHz, respectively, is considered as the case study. The area occupied by the additional circuits used for testing is reduced by 98.5% in comparison to the redundant wiring method, while at the same time the power consumed for pre-bond testing is reduced by 99%. The power consumed by the proposed clock network during normal operation is reduced by 35% as compared to a standard clock distribution network.

The next chapter of the thesis is dedicated to the problems of TSV-based inter-plane communication. Crosstalk between the TSVs is another important concern that can affect the signal integrity and timing of the transferred data. Chapter 5 presents a case study of serial *vs.* parallel data communication for TSV-based 3-D circuits. For parallel data communication, crosstalk and the resulting jitter are investigated, and its area and fabrication yield are compared with those of the serial approach.

For a bundle of 16 TSVs, up to 8.95 ps of jitter is added to data with a 5 GHz bandwidth due to TSV-to-TSV coupling. Exploiting grounded TSVs to shield the signaling TSVs can reduce the jitter by 86% but increases the total number of TSVs to 44. A serialization approach is proposed to improve the signal integrity and reduce the number of TSVs. Using serialization drastically reduces the TSV area and improves the fabrication yield of the circuit by 0.0022%. On the other hand, the power consumed by the serializer/deserializer is not negligible and should be carefully compared to the power of the whole 3-D circuit to determine whether this approach is a proper solution for a given application.

In the last part of my work, I consider the effect of intermediate buffers on signal transmission in RRAM-based FPGAs. The distinct characteristics of RRAM switches make it possible to reduce the number of intermediate buffers and to improve the power, area, and even the delay of RRAM-based FPGAs. I have studied a modified approach for allocating buffers along the routing paths of RRAM-based FPGAs. Analytical expressions are presented to determine the optimum number of buffers for each defined path, and circuit-level simulations are performed to validate these analytical expressions.

Architectural simulations of the twenty largest MCNC benchmarks show that exploiting RRAM switches can reduce the delay by 56% as compared to an SRAM-based architecture, while the power consumption and area are also reduced by 8.9% and 16.8%, respectively. The proposed buffering approach provides additional improvements of 8.6%, 5%, and 15.6% in delay, area, and power, respectively, for RRAM-based FPGAs. In SRAM-based structures, reducing the number of buffers degrades the performance of the circuit due to the high resistance of the CMOS switches.

## 7.2 Future Research

The first part of my research is dedicated to the design of low-power clock distribution networks for 3-D circuits. By employing extra on-chip inductors, resonant clock networks reduce the power consumption of the clock network. However, implementing these extra on-chip inductors increases the area overhead of this clocking scheme. The advantage of the common fabrication methods for on-chip inductors is that they only use the upper metal layers and leave the active silicon area for the implementation of transistors. However, in today's dense circuit designs, dedicating a large metal area to inductive components raises serious challenges, *i.e.* the interconnect architecture requires almost all of the metal area and leaves only a small portion for any other components. Hence, area-aware design of high-quality on-chip inductors is an important task to facilitate the use of resonant clock networks. New approaches to manufacturing inductors for 3-D circuits can be considered to replace the traditional inductor fabrication methods. As an example, TSVs can be employed to realize these inductors efficiently. In addition, exploiting other conventional technologies could result in more efficient designs that optimize the area and quality factor of the inductors.

In today's technology, higher-density circuitry requires vertically stacked architectures. This technology is accompanied by wafer thinning (normally through standard wafer backside grinding) in order to decrease the lateral size of the chip, increase the integration density, and decrease the inter-layer interconnect length. On the other hand, such a short vertical distance gives rise to crosstalk and inductive coupling between neighboring layers.

As the area of integrated circuits increases, larger networks are required to distribute the clock signal among the sequential components. Using long interconnects results in a larger inductive component of the wires. Consequently, the clock network in one plane can induce a magnetic field in the adjacent planes. This problem can be more pronounced for resonant clock networks, since resonant on-chip inductors are added to the circuit. Investigating the magnetic interference in 3-D resonant clock networks can be considered as a follow-up phase of my research. For this purpose, electromagnetic field simulators such as HFSS or COMSOL should be used to solve for the electromagnetic fields induced in the structures and thereby provide a more realistic model of the whole clock network.

One of the important advantages of 3-D integration for mixed-signal circuits is to alleviate substrate noise problems. By implementing analog and digital circuits in different planes in heterogeneous 3-D circuits, there is no common substrate between these parts to pass the digital substrate noise to sensitive analog circuits. Distributing the clock signal in heterogeneous 3-D circuits is a challenging task. Studying the effects of resonant circuits on substrate noise is another topic worth investigating.

It has been reported that the two important parameters of skew and jitter can be improved by the resonant clocking approach. However, since this thesis has been dedicated to the fundamental study of this approach, the effect of these two parameters has been neglected. As a more comprehensive study, it is worthwhile to include them in the design in order to provide a high-quality clock signal.

This study has not considered the effects of process variation. However, fabrication imperfections are inevitable in the actual realization of a design. As future work, the study of an adaptive design that can tolerate component variations would be advantageous. In my work, a methodology for designing a circuit with resonant clock distribution networks is proposed. Another extension of this study is the implementation of a digital circuit using resonant clock networks that can support contactless pre-bond testing. The functionality of the circuit should be verified at both low and high frequencies. Consequently, the circuit should be able to operate over a wide range of frequencies and in different situations, such as the pre- and post-bond configurations. For this purpose, special libraries for the 3-D integration process should be used in a commercial tool suite like Cadence, and inductive links can be modeled directly within such tools. As an alternative approach, dedicated software packages for RF engineering, such as VPCM or Momentum, can be used to calculate the inductance and the Q-factor of the inductor and to export the layout to a commercial tool suite. This would give designers a powerful tool set for this technology.

## **Bibliography**

- Chang Liu and Sung-Kyu Lim. A design tradeoff study with monolithic 3d integration. In *Quality Electronic Design (ISQED), 2012 13th International Symposium on*, pages 529–536, March 2012.
- [2] N. Miura, D. Mizoguchi, M. Inoue, T. Sakurai, and T. Kuroda. A 195-gb/s 1.2-w inductive inter-chip wireless superconnect with transmit power control scheme for 3-d-stacked system in a package. *Solid-State Circuits, IEEE Journal of*, 41(1):23–34, Jan 2006.
- [3] Daisuke Mizoguchi, Noriyuki Miura, Yoichi Yoshida, Nobuhiko Yamagishi, and Tadahiro Kuroda. Measurement of inductive coupling in wireless superconnect. *Japanese Journal of Applied Physics*, 45(4S):3286, 2006.
- [4] C. Maxfield. 2d vs. 2.5d vs. 3d ics, available online. http://www.eetimes.com/document.asp, 2012.
- [5] Yole Developement. 3dic & tsv interconnects, 2010.
- [6] D. M. Jang *et al.* Development and evaluation of 3-d sip with vertically interconnected through silicon vias (tsv). In *Electronic Components and Technology Conference (ECTC)*, pages 847–852, May 2007.
- [7] Dae Hyun Kim, S. Mukhopadhyay, and Sung-Kyu Lim. Tsv-aware interconnect length and power prediction for 3d stacked ics. In *Interconnect Technology Conference, 2009. IITC 2009. IEEE International*, pages 26–28, June 2009.
- [8] Meng-Kai Hsu, Yao-Wen Chang, and V. Balabanov. Tsv-aware analytical placement for 3d ic designs. In *Design Automation Conference (DAC), 2011 48th ACM/EDAC/IEEE*, pages 664–669, June 2011.
- [9] Vasileios Pavlidis and Giovanni De Micheli. Power distribution paths for 3-d ic. In *Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI 2009)*, pages 263–268, 2009.
- [10] Brent Goplen and Sachin Sapatnekar. Thermal via placement in 3d ics. In *Proceedings of the 2005 International Symposium on Physical Design*, ISPD '05, pages 167–174, New York, NY, USA, 2005. ACM.

- [11] Hu Xu, V.F. Pavlidis, and G. De Micheli. Analytical heat transfer model for thermal through-silicon vias. In *Design, Automation Test in Europe Conference Exhibition (DATE), 2011*, pages 1–6, March 2011.
- [12] V.F. Pavlidis, I. Savidis, and E.G. Friedman. Clock distribution networks for 3-d ictegrated circuits. In *Custom Integrated Circuits Conference, 2008. CICC 2008. IEEE*, pages 651–654, Sept 2008.
- [13] S.C. Chan, Kenneth L. Shepard, and P.J. Restle. Design of resonant global clock distributions. In *International Conference on Computer Design*, pages 248–253, Oct 2003.
- [14] E.J. Marinissen. Testing tsv-based three-dimensional stacked ics. In *Design, Automation Test in Europe Conference Exhibition (DATE), 2010*, pages 1689–1694, March 2010.
- [15] N. Ahmed, M. Tehranipoor, C.P. Ravikumar, and K.M. Butler. Local at-speed scan enable generation for transition fault testing using low-cost testers. *Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on*, 26(5):896–906, May 2007.
- [16] D.L. Lewis and H.-H.S. Lee. Architectural evaluation of 3d stacked rram caches. In 3D System Integration, 2009. 3DIC 2009. IEEE International Conference on, pages 1–4, Sept 2009.
- [17] J. Cong and Bingjun Xiao. mrfpga: A novel fpga architecture with memristor-based reconfiguration. In *Nanoscale Architectures (NANOARCH), 2011 IEEE/ACM International Symposium on*, pages 1–8, June 2011.
- [18] S. Tanachutiwat, Ming Liu, and Wei Wang. Fpga based on integration of cmos and rram. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 19(11):2023–2032, Nov 2011.
- [19] J. Cong and Bingjun Xiao. Fpga-rpi: A novel fpga architecture with rram-based programmable interconnects. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 22(4):864–877, April 2014.
- [20] J. Rosenfeld and E.G. Friedman. Design methodology for global resonant h-tree clock distribution networks. In *IEEE International Symposium on Circuits and Systems (ISCAS)*, pages 4 pp.–2076, May 2006.
- [21] M.R. Guthaus. Distributed lc resonant clock tree synthesis. In *Circuits and Systems* (*ISCAS*), 2011 IEEE International Symposium on, pages 1215–1218, May 2011.
- [22] Y.I Ismail and E.G. Friedman. Dtt: direct truncation of the transfer function an alternative to moment matching for tree structured interconnect. *Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on,* 21(2):131–144, Feb 2002.
- [23] I Dobbelaere, M. Horowitz, and A El Gamal. Regenerative feedback repeaters for programmable interconnections. *Solid-State Circuits, IEEE Journal of*, 30(11):1246–1253, Nov 1995.

- [24] Ming Liu and Wei Wang. rfga: Cmos-nano hybrid fpga using rram components. In *IEEE International Symposium on Nanoscale Architectures(NANOARCH)*, pages 93–98, June 2008.
- [25] Haitong Li, Peng Huang, Bin Gao, Bing Chen, Xiaoyan Liu, and Jinfeng Kang. A spice model of resistive random access memory for large-scale memory array simulation. *Electron Device Letters, IEEE*, 35(2):211–213, Feb 2014.
- [26] V. F. Pavlidis and E. G. Friedman. *Three-Dimensional Integrated Circuit Design*. Morgan Kaufmann Publishers, 2009.
- [27] Yuan Xie, Jason Cong, and Sachin Sapatnekar. *Three-Dimensional Integrated Circuit Design: EDA, Design and Microarchitectures.* Springer Publishing Company, Incorporated, 1st edition, 2009.
- [28] Hu Xu. Modeling and design techniques for 3-d ics under process, voltage, and temperature variations, 2012.
- [29] A. Sheibanyrad, F. Ptrot, and A. Jantsch. *3D Integration for NoC-based SoC Architectures*. Springer Publishing Company, Incorporated, 1st edition, 2010.
- [30] S. Tan, R. Gutmann, and R. Reif. Wafer Level 3-D ICs Process Technology. Springer, 2008.
- [31] A. Papanikolaou, D. Soudris, and R. Radojcic. *Three Dimensional System Integration*. Springer Publishing Company, Incorporated, 2011.
- [32] Sung Kyu Lim. *Design for High Performance, Low Power, and Reliable 3D Integrated Circuits.* Springer, 2013.
- [33] R. Topaloglu. Applications driving 3d integration and corresponding manufacturing challenges. In *Proceedings of the 48th Design Automation Conference*, DAC '11, pages 220–223, New York, NY, USA, 2011. ACM.
- [34] J. A. Burns *et al.* A wafer-scale 3-d circuit integration technology. *Electron Devices, IEEE Transactions on*, 53(10):2507–2516, Oct 2006.
- [35] E. Culurciello and A.G. Andreou. Capacitive inter-chip data and power transfer for 3-d vlsi. *Circuits and Systems II: Express Briefs, IEEE Transactions on*, 53(12):1348–1352, Dec 2006.
- [36] J. Xu, S. Mick, J. Wilson, Lei Luo, K. Chandrasekar, E. Erickson, and P.D. Franzon. Ac coupled interconnect for dense 3-d ics. In *Nuclear Science Symposium Conference Record, 2003 IEEE*, volume 1, pages 125–129 Vol.1, Oct 2003.
- [37] N. Miura, D. Mizoguchi, T. Sakurai, and T. Kuroda. Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect. *Solid-State Circuits, IEEE Journal of,* 40(4):829–837, April 2005.

- [38] K. Onizuka, H. Kawaguchi, M. Takamiya, T. Kuroda, and T. Sakurai. Chip-to-chip inductive wireless power transmission system for sip applications. In *Custom Integrated Circuits Conference, 2006. CICC '06. IEEE*, pages 575–578, Sept 2006.
- [39] Yuan Yuxiang, Y. Yoshida, and T. Kuroda. Non-contact 10% efficient 36mw power delivery using on-chip inductor in 0.18 μm cmos. In *Solid-State Circuits Conference, 2007. ASSCC* '07. *IEEE Asian*, pages 115–118, Nov 2007.
- [40] V. Sukumaran, Q. Chen, Fuhan Liu, N. Kumbhat, T. Bandyopadhyay, H. Chan, S. Min, C. Nopper, V. Sundaram, and R. Tummala. Through-package-via formation and metallization of glass interposers. In *Electronic Components and Technology Conference* (ECTC), 2010 Proceedings 60th, pages 557–563, June 2010.
- [41] K. Kikuchi, Hirotaka Oosato, S. Ito, S. Segawa, H. Nakagawa, K. Tokoro, and M. Aoyagi. 10-gbps signal propagation of high-density wiring interposer using photosensitive polyimide for 3d packaging. In *Electronic Components and Technology Conference, 2006. Proceedings. 56th*, pages 6 pp.–, 2006.
- [42] Dimitrios Velenis, Mikael Detalle, Yann Civale, Erik Jan Marinissen, Gerald Beyer, and Eric Beyne. Cost comparison between 3d and 2.5d integration. In *Electronic System-Integration Technology Conference (ESTC), 2012 4th*, pages 1–4, Sept 2012.
- [43] R. Chaware, K. Nagarajan, K. Ng, and S.Y. Pai. Assembly process integration challenges and reliability assessment of multiple 28nm fpgas assembled on a large 65nm passive interposer. In *Reliability Physics Symposium (IRPS), 2012 IEEE International*, pages 2B.2.1–2B.2.5, April 2012.
- [44] X. Dong and Y. Xie. System-level cost analysis and design exploration for threedimensional integrated circuits (3d ics). In *Asia and South Pacific Design Automation Conference (ASP-DAC)*, pages 234–241, Jan 2009.
- [45] Madhavan Swaminathan and Ki Jin Han. *Design and Modeling for 3D ICs and Interposers*. World Scientific, 2014.
- [46] G. Van der Plas *et al.*,. Design issues and considerations for low-cost 3-d tsv ic technology. *IEEE Journal of Solid-State Circuits*,, 46(1):293–307, Jan 2011.
- [47] Philip Garrou, Christopher Bower, and Peter Ramm. *Handbook of 3D integration*. Wiley-VCH, 2012.
- [48] M. Kawano, N. Takahashi, Y. Kurita, K. Soejima, M. Komuro, and S. Matsui. Threedimensional packaging technology for stacked dram with 3-gb/s data transfer. *IEEE Transactions on Electron Devices*, 55(7):1614–1620, July 2008.
- [49] R. S. Patti. Three-dimensional integrated circuits and the future of system-on-chip designs. *Proceedings of the IEEE*, 94(6):1214–1224, June 2006.

- [50] D. Henry *et al.* Low electrical resistance silicon through vias: technology and characterization. In *Electronic Components and Technology Conference*, pages 7 pp.–, 2006.
- [51] U. Kang *et al.* 8 gb 3-d ddr3 dram using through-silicon-via technology. *IEEE Journal ofSolid-State Circuits*, 45(1):111–119, Jan 2010.
- [52] J. Verbree, E.J. Marinissen, P. Roussel, and D. Velenis. On the cost-effectiveness of matching repositories of pre-tested wafers for wafer-to-wafer 3d chip stacking. In *IEEE European Test Symposium (ETS)*, pages 36–41, May 2010.
- [53] S. Reda, G. Smith, and L. Smith. Maximizing the functional yield of wafer-to-wafer 3-d integration. *IEEE Transaction on Very Large Scale Integrated Systems.*, 17(9):1357–1362, September 2009.
- [54] G. Smith, Larry Smith, S. Hosali, and S. Arkalgud. Yield considerations in the choice of 3d technology. In *International Symposium on Semiconductor Manufacturing (ISSM)*, pages 1–3, Oct 2007.
- [55] R.S. Patti. Three-dimensional integrated circuits and the future of system-on-chip designs. *Proceedings of the IEEE*, 94(6):1214–1224, June 2006.
- [56] http://chippacking.blogspot.ch/2011/04/3d-integration-technology-on.html.
- [57] C. McDonough, B. Backes, Wei Wang, and R.E. Geer. Thermal and spatial profiling of tsvinduced stress in 3dics. In *Reliability Physics Symposium (IRPS), 2011 IEEE International,* pages 5D.2.1–5D.2.6, April 2011.
- [58] Hui Min Lee, Er-Ping Li, En-Xiao Liu, and G.S. Samudra. Comprehensive study of the impact of tsv induced thermo-mechanical stress on 3d ic device performance. In *Electrical Design of Advanced Packaging and Systems Symposium (EDAPS), 2013 IEEE,* pages 36–39, Dec 2013.
- [59] J. Zhang, L. Zhang, Y. Dong, H.Y. Li, C.M. Tan, G. Xia, and C.S. Tan. The dependency of tsv keep-out zone (koz) on si crystal direction and liner material. In *3D Systems Integration Conference (3DIC), 2013 IEEE International*, pages 1–5, Oct 2013.
- [60] K. Athikulwongse, A. Chakraborty, Jae seok Yang, D.Z. Pan, and Sung-Kyu Lim. Stressdriven 3d-ic placement with tsv keep-out zone and regularity study. In *Computer-Aided Design (ICCAD), 2010 IEEE/ACM International Conference on*, pages 669–674, Nov 2010.
- [61] M. Grange, R. Weerasekera, and D. Pamunuwa. Optimal signaling techniques for through silicon vias in 3-d integrated circuit packages. In *IEEE Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS)*, pages 237–240, Oct 2010.
- [62] G. Katti, M. Stucchi, K. De Meyer, and W. Dehaene. Electrical modeling and characterization of through silicon via for three-dimensional ics. *Electron Devices, IEEE Transactions on*, 57(1):256–262, Jan 2010.

- [63] I. Savidis and E. G. Friedman. Electrical modeling and characterization of 3-d vias. In *Circuits and Systems , Proceedings of the IEEE International Symposium on*, pages 784–787, 2008.
- [64] Jin Hu, Lingqiu Wang, Lifeng Jin, and Hao Zheng JiangNan. Electrical modeling and characterization of through silicon vias (tsv). In *Microwave and Millimeter Wave Technology* (*ICMMT*), 2012 International Conference on, volume 2, pages 1–4, May 2012.
- [65] Raphael interconnect analysis program reference manual, ver. a-2007.09, 2007.
- [66] Sentaurus device user guide, ver. a-2008.09, 2008.
- [67] I. Savidis and E.G. Friedman. Closed-form expressions of 3-d via resistance, inductance, and capacitance. *Electron Devices, IEEE Transactions on*, 56(9):1873–1881, Sept 2009.
- [68] Ming-Chao Tsai and TingTing Hwang. A study on the trade-off among wirelength, number of tsv and placement with different size of tsv. In *International Symposium on VLSI Design, Automation and Test (VLSI-DAT)*, pages 1–4, April 2011.
- [69] Chun-Hua Cheng, Chih-Hsien Kuo, and Shih-Hsu Huang. Tsv number minimization using alternative paths. In *IC Design Technology (ICICDT), 2011 IEEE International Conference on*, pages 1–4, May 2011.
- [70] M. Mukherjee and R. Vemuri. Simultaneous scheduling, binding and layer assignment for synthesis of vertically integrated 3d systems. In *IEEE International Conference on Computer Design: VLSI in Computers and Processors, (ICCD)*, pages 222–227, Oct 2004.
- [71] Wei-Kai Cheng and Yi-Chun Yen. A resource binding technique for tsv number minimization in high-level synthesis of 3d ics. In *Integrated Circuits (ISIC), 2011 13th International Symposium on*, pages 285–288, Dec 2011.
- [72] V. Krishnan and S. Katkoori. A 3d-layout aware binding algorithm for high-level synthesis of three-dimensional integrated circuits. In *International Symposium on Quality Electronic Design (ISQED)*, pages 885–892, March 2007.
- [73] V. F. Pavlidis and E. G. Friedman. Timing driven via placement heuristics in 3-d ics. *Integration, the VLSI Journal*, 41(4):489–508, 2008.
- [74] Y. Ghidini, M. Moreira, L. Brahm, T. Webber, N. Calazans, and C. Marcon. Lasio 3d noc vertical links serialization: Evaluation of latency and buffer occupancy. In *Symposium* on Integrated Circuits and Systems Design (SBCCI), pages 1–6, Sept 2013.
- [75] S. Pasricha. Exploring serial vertical interconnects for 3d ics. In *ACM/IEEE Design Automation Conference (DAC)*, pages 581–586, July 2009.
- [76] M.B. Healy and Sung-Kyu Lim. Distributed tsv topology for 3-d power-supply networks. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 20(11):2066–2079, Nov 2012.

- [77] I. Tsioutsios, V.F. Pavlidis, and G. De Micheli. Physical design tradeoffs in power distribution networks for 3-d ics. In *IEEE International Conference on Electronics, Circuits, and Systems (ICECS)*, pages 430–433, Dec 2010.
- [78] N.H. Khan, S.M. Alam, and S. Hassoun. Power delivery design for 3-d ics using different through-silicon via (tsv) technologies. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 19(4):647–658, April 2011.
- [79] Gang Huang, M. Bakir, A. Naeemi, H. Chen, and J.D. Meindl. Power delivery for 3d chip stacks: Physical modeling and design implication. In *IEEE Conference on Electrical Performance of Electronic Packaging*, pages 205–208, Oct 2007.
- [80] J. Cong and Yan Zhang. Thermal via planning for 3-d ics. In *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, pages 745–752, Nov 2005.
- [81] Kihyun Yoon, Gawon Kim, Woojin Lee, Taigon Song, Junho Lee, Hyungdong Lee, Kunwoo Park, and Joungho Kim. Modeling and analysis of coupling between tsvs, metal, and rdl interconnects in tsv-based 3d ic with silicon interposer. In *Electronics Packaging Technology Conference, 2009. EPTC '09. 11th*, pages 702–706, Dec 2009.
- [82] Jonghyun Cho, Joohee Kim, Taigon Song, Jun So Pak, Joungho Kim, Hyungdong Lee, Junho Lee, and Kunwoo Park. Through silicon via (tsv) shielding structures. In *Electrical Performance of Electronic Packaging and Systems (EPEPS), 2010 IEEE 19th Conference on*, pages 269–272, Oct 2010.
- [83] Yuan-Ying Chang, Y.S.-C. Huang, V. Narayanan, and Chung-Ta King. Shieldus: A novel design of dynamic shielding for eliminating 3d tsv crosstalk coupling noise. In *Asia and South Pacific Design Automation Conference (ASP-DAC)*, pages 675–680, Jan 2013.
- [84] Wen-Sheng Zhao, Yong-Xin Guo, and Wen-Yan Yin. Transmission characteristics of a coaxial through-silicon via (c-tsv) interconnect. In *Electromagnetic Compatibility (EMC)*, 2011 IEEE International Symposium on, pages 373–378, Aug 2011.
- [85] N.H. Khan, S.M. Alam, and S. Hassoun. Through-silicon via (tsv)-induced noise characterization and noise mitigation using coaxial tsvs. In *3D System Integration, 2009. 3DIC* 2009. IEEE International Conference on, pages 1–7, Sept 2009.
- [86] E.G. Friedman. Clock distribution networks in synchronous digital integrated circuits. *Proceedings of the IEEE*, 89(5):665–692, May 2001.
- [87] T. Xanthopoulos, D. W. Bailey, A. K. Gangwar, M. K. Gowan, A. K. Jain, and B. K. Prewitt. The design and analysis of the clock distribution network for a 1.2 ghz alpha microprocessor. In *IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC*, pages 402–403, Feb 2001.
- [88] Xin Zhao, J. Minz, and Sung-Kyu Lim. Low-power and reliable clock network design for through-silicon via (tsv) based 3d ics. *Components, Packaging and Manufacturing Technology, IEEE Transactions on*, 1(2):247–259, Feb 2011.

- [89] Xu He, Sheqin Dong, Yuchun Ma, and Xianlong Hong. Simultaneous buffer and interlayer via planning for 3d floorplanning. In *Quality of Electronic Design (ISQED)*, pages 740–745, March 2009.
- [90] H. Xu, V. F. Pavlidis, and G. De Micheli. Repeater insertion for two-terminal nets in three-dimensional integrated circuits. In *Nano-Net*, volume 20 of *Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering,* pages 141–150. Springer Berlin Heidelberg, 2009.
- [91] V.S. Sathe, J.C. Kao, and M.C. Papaefthymiou. Rf2: A 1ghz fir filter with distributed resonant clock generator. In *IEEE Symposium on VLSI Circuits*, pages 44–45, June 2007.
- [92] V.L. Chi. Salphasic distribution of clock signals for synchronous systems. *IEEE Transac*tions on Computers, 43(5):597–602, May 1994.
- [93] B. Noia and K. Chakrabarty. Pre-bond probing of tsvs in 3d stacked ics. In *Test Conference (ITC), 2011 IEEE International,* pages 1–10, Sept 2011.
- [94] E.J. Marinissen and Y. Zorian. Testing 3d chips containing through-silicon vias. In *International Test Conference (ITC)*, pages 1–11, Nov 2009.
- [95] K. Chakrabarty, S. Deutsch, H. Thapliyal, and Y. Fangming. Tsv defects and tsv-induced circuit failures: The third dimension in test and design-for-test. In *Reliability Physics Symposium (IRPS), 2012 IEEE International*, pages 5F.1.1–5F.1.12, April 2012.
- [96] Jia Li and Dong Xiang. Dft optimization for pre-bond testing of 3d-sics containing tsvs. In *IEEE International Conference on Computer Design (ICCD)*, pages 474–479, Oct 2010.
- [97] Z. Xin, D. L. Lewis, H. H. Lee, and S. Lim. Low-power clock tree design for pre-bond testing of 3-d stacked ics. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 30(5):732–745, May 2011.
- [98] J. Verbree, E.J. Marinissen, P. Roussel, and D. Velenis. On the cost-effectiveness of matching repositories of pre-tested wafers for wafer-to-wafer 3d chip stacking. In *IEEE European Test Symposium (ETS)*, pages 36–41, May 2010.
- [99] C. Ferri, S. Reda, and R.I. Bahar. Strategies for improving the parametric yield and profits of 3d ics. In *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, pages 220–226, Nov 2007.
- [100] X. Wu, P. Falkenstern, and Y. Xie. Scan chain design for threedimentional integrated circuits (3d ics). In *International Conference on Computer Design (ICCD)*, pages 208–214, Nov 2007.
- [101] Tak-Yung Kim and Taewhan Kim. Clock tree synthesis with pre-bond testability for 3d stacked ic designs. In *ACM/IEEE Design Automation Conference (DAC)*, pages 723–728, June 2010.

- [102] M. Buttrick and S. Kundu. On testing prebond dies with incomplete clock networks in a 3d ic using dlls. In *Design, Automation Test in Europe Conference Exhibition (DATE),* 2011, pages 1–6, March 2011.
- [103] E.J. Marinissen, Dae Young Lee, J.P. Hayes, C. Sellathamby, B. Moore, S. Slupsky, and
   L. Pujol. Contactless testing: Possibility or pipe-dream? In *Design, Automation Test in Europe Conference Exhibition, 2009. DATE '09.*, pages 676–681, April 2009.
- [104] C.V. Sellathamby, M.M. Reja, Lin Fu, B. Bai, E. Reid, S.H. Slupsky, I.M. Filanovsky, and K. Iniewski. Noncontact wafer probe using wireless probe cards. In *Test Conference*, 2005. Proceedings. ITC 2005. IEEE International, pages 6 pp.–452, Nov 2005.
- [105] A. Radecki, Hayun Chung, Y. Yoshida, N. Miura, T. Shidei, H. Ishikuro, and T. Kuroda. 6w/25mm2 inductive power transfer for non-contact wafer-level testing. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, pages 230–232, Feb 2011.
- [106] V. Vorisek, T. Koch, and H. Fischer. At-speed testing of soc ics. In *Design, Automation and Test in Europe Conference and Exhibition (DATE)*, volume 3, pages 120–125 Vol.3, Feb 2004.
- [107] B. Keller, A. Uzzaman, L.Bibo, and T. Snethen. Using programmable on-product clock generation (opcg) for delay test. In *Asian Test Symposium (ATS)*, pages 69–72, Oct 2007.
- [108] Xijiang Lin, Ron Press, J. Rajski, P. Reuter, T. Rinderknecht, B. Swanson, and N. Tamarapalli. High-frequency, at-speed scan testing. *Design Test of Computers, IEEE*, 20(5):17–25, Sept 2003.
- [109] http://public.itrs.net].
- [110] Young-Joon Lee, Patrick Morrow, and Sung Kyu Lim. Ultra high density logic designs using transistor-level monolithic 3d integration. In *Proceedings of the International Conference on Computer-Aided Design*, ICCAD '12, pages 539–546, New York, NY, USA, 2012. ACM.
- [111] B. Rajendran. Sequential 3d ic fabrication: Challenges and prospects. *IEEE Transaction on Electron Devices*, 2010.
- [112] P. Batude *et al.* Advances in 3d cmos sequential integration. In *IEEE International Electron Devices Meeting (IEDM)*, pages 1–4, Dec 2009.
- [113] O. Thomas, M. Vinet, O. Rozeau, P. Batude, and A. Valentian. Compact 6t sram cell with robust read/write stabilizing design in 45nm monolithic 3d ic technology. In *IC Design* and Technology, 2009. ICICDT '09. IEEE International Conference on, pages 195–198, May 2009.

- [114] Shari N. Farrens, James R. Dekker, Jason K. Smith, and Brian E. Roberds. Chemical free room temperature wafer to wafer direct bonding. *Journal of The Electrochemical Society*, 142(11):3949–3955, 1995.
- [115] Meng-Fan Chang, Shyh-Shyuan Sheu, Ku-Feng Lin, Che-Wei Wu, Chia-Chen Kuo, Pi-Feng Chiu, Yih-Shan Yang, Yu-Sheng Chen, Heng-Yuan Lee, Chen-Hsin Lien, F.T. Chen, Keng-Li Su, Tzu-Kun Ku, Ming-Jer Kao, and Ming-Jinn Tsai. A high-speed 7.2-ns read-write random access 4-mb embedded resistive ram (reram) macro using processvariation-tolerant current-mode read schemes. *IEEE Journal of Solid-State Circuits*, 48(3):878–891, March 2013.
- [116] A. Sawa. Resistive switching in transition metal oxides. *Materials Today*, 1(6):28 36, 2008.
- [117] M. Liu and W. Wang. Application of nanojunction-based rram to reconfigurable ic. *Micro Nano Letters, IET*, 3(3):101–105, September 2008.
- [118] P.-E. Gaillardon, D. Sacchetto, S. Bobba, Y. Leblebici, and G. De Micheli. Gms: Generic memristive structure for non-volatile fpgas. In VLSI and System-on-Chip (VLSI-SoC), 2012 IEEE/IFIP 20th International Conference on, pages 94–98, Oct 2012.
- [119] Sergei Skorobogatov. Data remanence in flash memory devices. In Proceedings of the 7th International Conference on Cryptographic Hardware and Embedded Systems, CHES'05, pages 339–353, Berlin, Heidelberg, 2005. Springer-Verlag.
- [120] Young Yang Liauw, Zhiping Zhang, Wanki Kim, AE. Gamal, and S.S. Wong. Nonvolatile 3d-fpga with monolithically stacked rram-based configuration memory. In *Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International*, pages 406–408, Feb 2012.
- [121] Andrea L. Lacaita and Dirk J. Wouters. Phase-change memories. *physica status solidi* (a), 205(10):2281–2297, 2008.
- [122] E. Ahmed and J. Rose. The effect of lut and cluster size on deep-submicron fpga performance and density. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 12(3):288–298, March 2004.
- [123] Chen Dong, Deming Chen, S. Haruehanroengra, and Wei Wang. 3-d nfpga: A reconfigurable architecture for 3-d cmos/nanomaterial hybrid digital circuits. *Circuits and Systems I: Regular Papers, IEEE Transactions on*, 54(11):2489–2501, Nov 2007.
- [124] Xin Zhao, D.L. Lewis, H.-H.S. Lee, and Sung-Kyu Lim. Pre-bond testable low-power clock tree design for 3d stacked ics. In *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)- Digest of Technical Papers*, pages 184–190, Nov 2009.
- [125] Gil-Su Kim, M. Takamiya, and T. Sakurai. A capacitive coupling interface with high sensitivity for wireless wafer testing. In *3D System Integration*, 2009. *3DIC 2009. IEEE International Conference on*, pages 1–5, Sept 2009.

- [126] V.S. Sathe, J.C. Kao, and M.C. Papaefthymiou. Rf2: A 1ghz fir filter with distributed resonant clock generator. In VLSI Circuits, 2007 IEEE Symposium on, pages 44–45, June 2007.
- [127] V.F. Pavlidis, I Savidis, and E.G. Friedman. Clock distribution networks for 3-d ictegrated circuits. In *Custom Integrated Circuits Conference, 2008. CICC 2008. IEEE*, pages 651–654, Sept 2008.
- [128] S. Rahimian, V.F. Pavlidis, and G. De Micheli. Design of resonant clock distribution networks for 3-d integrated circuits. In *International Workshop on Integrated Circuit and System Design. Power and Timing Modeling, Optimization, and Simulation*, pages 267–277, 2011.
- [129] Xuchu Hu and M. Guthaus. Distributed resonant clock grid synthesis (rocks). In *Design Automation Conference (DAC), 2011 48th ACM/EDAC/IEEE*, pages 516–521, June 2011.
- [130] Xuchu Hu, W. Condley, and M.R. Guthaus. Library-aware resonant clock synthesis (larcs). In *Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE*, pages 145–150, June 2012.
- [131] S. Rahimian, V. F. Pavlidis, X. Tang, and G. De Micheli. An enhanced design methodology for resonant clock trees. *Journal of Low Power Electronics*, 9(2):198–206, 2013.
- [132] S.S. Mohan, M. del Mar Hershenson, S. P. Boyd, and T. H. Lee. Simple accurate expressions for planar spiral inductances. *IEEE Journal of Solid-State Circuits*, 34(10):1419–1424, Oct 1999.
- [133] T. H. Lee. *The Design of CMOS Radio-Frequency Integrated Circuits*. Cambridge University Press, 2004.
- [134] www.comsol.com.
- [135] http://rfic.eecs.berkeley.edu/~niknejad/asitic.html.
- [136] http://www.sonnetsoftware.com.
- [137] S.E. Esmaeili, A.J. Al-Khalili, and G.E.R. Cowan. Low-swing differential conditional capturing flip-flop for lc resonant clock distribution networks. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 20(8):1547–1551, Aug 2012.
- [138] Wei-Yu Tsai, Ching-Te Chiu, Jen-Ming Wu, S.S.H. Hsu, Yar-Sun Hsu, and Ying-Fang Tsao. A novel low gate-count serializer topology with multiplexer-flip-flops. In *Circuits and Systems (ISCAS), 2012 IEEE International Symposium on*, pages 245–248, May 2012.
- [139] J. Kim *et al.* High-frequency scalable electrical model and analysis of a through silicon via (tsv). *Components, Packaging and Manufacturing Technology, IEEE Transactions on*, 1(2):181–195, Feb 2011.

- [140] R. Weerasekera, M. Grange, D. Pamunuwa, and H. Tenhunen. On signalling over through-silicon via (tsv) interconnects in 3-d integrated circuits. In *Design, Automation Test in Europe Conference Exhibition (DATE), 2010*, pages 1325–1328, March 2010.
- [141] Krishna C. Chillara, Jinwook Jang, and Wayne P. Burleson. Robust signaling techniques for through silicon via bundles. In *ACM Great Lakes Symposium on VLSI*, pages 383–386, 2011.
- [142] Wen-Chih Huang, Chih-Hsing Lin, and Ching-Te Chiu. Embedded transition inversion coding for low power serial link. In *IEEE Workshop on Signal Processing Systems (SiPS)*, pages 102–105, Oct 2011.
- [143] A.R. Bharghava and M. B. Srinivas. Transition inversion based low power data coding scheme for synchronous serial communication. In *IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, pages 103–108, May 2009.
- [144] Xiangyu Dong and Yuan Xie. System-level cost analysis and design exploration for three-dimensional integrated circuits (3d ics). In *Asia and South Pacific Design Automation Conference (ASP-DAC)*, pages 234–241, Jan 2009.
- [145] Mingjie Lin and A. El Gamal. A low-power field-programmable gate array routing fabric. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 17(10):1481–1494, Oct 2009.
- [146] P. Gaillardon *et al.* Design and architectural assessment of 3-d resistive memory technologies in fpgas. *Nanotechnology, IEEE Transactions on*, 12(1):40–50, Jan 2013.
- [147] Vaughn Betz, Jonathan Rose, and Alexander Marquardt, editors. *Architecture and CAD for Deep-Submicron FPGAs*. Kluwer Academic Publishers, Norwell, MA, USA, 1999.
- [148] W. C. Elmore. The transient response of damped linear networks with particular regard to wideband amplifiers. *Journal of Applied Physics*, 19(1):55–63, Jan 1948.
- [149] H.Y. Lee, P.S. Chen, T.Y. Wu, Y.S. Chen, C.C. Wang, P.J. Tzeng, C.H. Lin, F. Chen, C.H. Lien, and M.J. Tsai. Low power and high speed bipolar switching with a thin reactive ti buffer layer in robust hfo2 based rram. In *IEEE International Electron Devices Meeting (IEDM)*, pages 1–4, Dec 2008.
- [150] J. Rose *et al.* The vtr project: Architecture and cad for fpgas from verilog to routing. In *ACM/SIGDA International Symposium on Field Programmable Gate Arrays*, FPGA '12, pages 77–86, New York, NY, USA, 2012. ACM.
- [151] Saeyang Yang. Logic synthesis and optimization benchmarks user guide version 3.0, 1991.
- [152] University of California in Berkeley. Abc: A system for sequential synthesis and verification. Available online: http://www.eecs.berkeley.edu/~alanmi/abc/.
#### CURRICULUM VITAE September 2014





**Mailing Address:** Chemin de Rionza 3, 1020 Renens VD, Switzerland

**Phone:** (+41) 78 6346573

**Email:** s.rahimian.omam@gmail.com

# **Personal Information**

- Date of Birth: June 26, 1980
- Place of Birth: Roodsar, Iran
- Nationality: Iranian
- Swiss Residency Permit: B
- Marital Status: Single

# **Objective**

Seeking a position as an experienced digital/mixed-signal electronics engineer and researcher in an environment where I can translate my skills, knowledge, and abilities into value for the organization.

# **Education**

| Period | Degree |
|---|---|
| 2010-present (*expected* 2014) | **PhD** in Electrical Engineering, École Polytechnique Fédérale de Lausanne<br>**Thesis Title**: "Exploiting on-chip inductors for signaling and testing in 3-D ICs" |
| 2003-2006 | **M.Sc.** in Electrical Engineering, University of Tehran, Tehran, Iran<br>**Thesis Title**: "Design and Implementation of Interpolation Filters for High-Precision Over-Sampling Digital to Analog Converters" |
| 1998-2003 | **B.Sc.** in Electrical Engineering, Electronics, Sharif University of Technology, Tehran, Iran<br>**Thesis Title**: "Design and Implementation of a Universal Controller Board with FPGA" |

# **Research Experience**

**2010-present**, Researcher as Doctoral Assistant, Integrated Systems Laboratory (LSI), EPFL

- Proposing a methodology for exploiting resonant clock networks in 3-D circuits
- Designing a contactless approach for pre-bond testing in 3-D circuits employing inductive links
- Exploiting RRAM switches to design 3-D FPGAs

**2003-2006**, Research Assistant, IC Lab, University of Tehran

- Designing and implementing the digital part of a high-precision over-sampling D/A converter: MATLAB design, Simulink fixed-point conversions, HDL design, implementation, simulation, synthesis, and verification
- Circuit and system design of a second-order sigma-delta analog modulator and the corresponding digital decimation filter (MATLAB/HDL/HSPICE)

# **Work Experience**

**2004-2007**, Digital Design Engineer with NikTek Semiconductor, Tehran, Iran

Design of a 192/96/48 kS/s digital decimation filter for a 24-bit audio sigma-delta ADC in 0.18 µm technology.

- System level filter design in MATLAB/Simulink
- Front-end design, HDL implementation (Verilog)
- Back-end design, Post place & route simulation both for ASIC & FPGA
- Lab test: FPGA prototype verification, ASIC board level verification
- Documentation

**2007-2010**, Digital Design Engineer with Rezvan Incorporation, Tehran, Iran

Design and implementation of the digital baseband of a mobile DVB-T receiver in 0.13 µm technology.

- Front-end design, HDL implementation (VHDL/Verilog)
- Back-end design, Post place & route simulation both for ASIC & FPGA
- Lab test: FPGA prototype verification, ASIC board level verification

# **Technical Skills**

- Hardware Description Language: Verilog, VHDL
- EDA Tools: Mentor ModelSim, Altera Quartus, Synopsys Design Compiler, Cadence SimVision
- System Design Tools: MATLAB/SIMULINK

# **Languages**

- Persian: mother tongue
- English: fluent (C1/C2)
- French: intermediate (A2/B1, continuing courses)
- German: beginner (A1, continuing courses)

# **Research Interests**

- 3-D integration
- Resonant clocking
- Digital system design and implementation for communication applications
- Mixed-signal design
- Data converter design and implementation

# **Honors and Awards**

**2003**, Ranked 70<sup>th</sup> among 12000 participants in the nationwide university entrance exam for graduate studies in electrical engineering, Tehran, Iran.

**1998**, Ranked 12<sup>th</sup> among 350000 participants in the nationwide university entrance exam for undergraduate studies in electrical engineering, Tehran, Iran.

# **Extracurricular Activities**

- Playing the "Tar", a traditional Persian musical instrument
- Hiking
- Social activities and organizing events, e.g.
  - President of the Iranian Students Association at EPFL (2012-2013)
  - Head of the communication group of Resana magazine (the magazine of the electrical engineering faculty at Sharif University of Technology)
  - Founder and committee member of the NavaConcerts association, which organizes traditional and classical music concerts from Iran and other countries in Switzerland (four concerts a year)
  - Helping the charity "Child Foundation" and organizing a charity dinner on its behalf at EPFL

# **Publications**

#### **Journal Papers:**

[1] **S. Rahimian**, V. F. Pavlidis, and G. De Micheli, "An Enhanced Design Methodology for Resonant Clock Trees," Journal of Low Power Electronics, Vol. 9, No. 2, pp. 198-206, August 2013.

[2] **S. Rahimian**, V. F. Pavlidis, and G. De Micheli, "Inter-Plane Communication Methods for 3-D ICs," Journal of Low Power Electronics, Vol. 8, No. 2, pp. 170-181, April 2012.

[3] **S. Rahimian**, S. M. Fakhraie, and O. Shoaei, "Minimizing the adder cost in multiple constant multipliers," IEICE Electronics Express, Vol. 3, No. 14, pp. 340-346, July 2006.

#### **Conference Papers:**

[1] **S. Rahimian**, Y. Leblebici, and G. De Micheli, "Parallel vs. Serial Inter-plane communication using TSVs," IEEE Latin American Symposium on Circuits and Systems (LASCAS), pp. 25-28, February 2014.

[2] **S. Rahimian**, V. F. Pavlidis, and G. De Micheli, "A Low-Overhead Method for Pre-bond Test of Resonant 3-D Clock Distribution Networks," 3-D Test Workshop in International Test Conference, August 2012.

[3] **S. Rahimian**, G. De Micheli and V. F. Pavlidis, "Low-power clock distribution networks for 3-D ICs," IEEE Convention of Electrical and Electronics Engineers in Israel, 2012.

[4] **S. Rahimian**, V. F. Pavlidis, and G. De Micheli, "Design of Resonant Clock Distribution Networks for 3-D Integrated Circuits," Proceedings of the International Workshop on Power and Timing Modeling, Optimization, and Simulation, pp. 267-277, August 2011.

[5] **S. Rahimian**, S. M. Mortazavi, S. M. Fakhraie, and O. Shoaei, "Implementation of Multiplier Block with Reduced Adder Cost," IEEE International Conference on Electronics, Circuits and Systems, December 2006.

[6] **S. Rahimian**, S. M. Mortazavi Zanjani, S. M. Fakhraie, and O. Shoaei, "Minimizing the adder cost in multiple constant multipliers," IEEE International Conference on Electronics, Circuits and Systems, December 2006.

[7] **S. Rahimian**, S. M. Mortazavi Zanjani, S. M. Fakhraie, and O. Shoaei, "Design of high-precision low-power interpolation modules with modified Sinc filters," IEEE International Midwest Symposium on Circuits and Systems, August 2006.

[8] S. M. Mortazavi Zanjani, **S. Rahimian**, S. M. Fakhraie, and O. Shoaei, "Algorithmic design of high-precision low-power multi-stage decimation filters," IEEE International Midwest Symposium on Circuits and Systems, August 2006.

[9] S. M. Mortazavi Zanjani, S. M. Fakhraie, **S. Rahimian**, and O. Shoaei, "Experimental evaluation of different realizations of recursive CIC filters," IEEE Canadian Conference on Electrical and Computer Engineering, May 2006.

[10] V. Majidzadeh, **S. Rahimian**, and O. Shoaei, "A Simplified First Order Mismatch Shaping DAC for Sigma-Delta," in Proc. of the 14th Iranian International Conference on Electrical Engineering, May 2006.