Abstract

The invention of the integrated circuit and continuing progress in its manufacturing process are the fundamental engines behind all technologies that support today's information society. The vast majority of today's microelectronic applications use the well-established CMOS process and fabrication technology, which exhibit high reliability. Most electronic systems fabricated in the past four decades have been developed under the assumption of reliable components. The steady downscaling of CMOS technology has led to devices with nanometer dimensions. For future nano-circuits, emerging nanodevices and their associated interconnects, the expected higher probabilities of failure, as well as the higher sensitivity to noise and variations, could make future chips prohibitively unreliable. The systems to be fabricated will be made of unreliable components, and achieving 100% correctness will be not only extremely costly but might be plainly impossible. Overall, reliability emerges as one of the most significant threats to the design of future integrated computing systems. Building reliable systems out of unreliable components will require closer cooperation between logic designers and architects, where high-level techniques rely on lower-level support based on novel modeling that includes component and system reliability as design parameters. An architecture is presented that is suitable for circuit-level and gate-level redundant modules and exhibits significant immunity to permanent and random failures, as well as to unwanted fluctuations of the fabrication parameters. It is based on a four-layer feed-forward topology and uses averaging and thresholding as the core voter mechanisms.
The architecture, with both fixed and adaptable thresholds, is compared to triple and R-fold modular redundancy techniques, and its superiority is demonstrated through numerical simulations as well as analytical developments. A chip implementation of the architecture is realized. Other applications of the architecture, such as the minimization of delay variations, are identified and explored. A novel general method enabling the introduction of fault tolerance and the evaluation of circuit and architecture reliability is proposed. The method is based on modeling the probability density functions (PDFs) of unreliable components and subsequently evaluating them for a given reliability architecture. PDF modeling, presented for the first time in the context of realistic technology and arbitrary circuit size, is based on a cutting-edge reliability evaluation algorithm and offers scalability, speed and accuracy. Fault modeling has also been developed to support PDF modeling. In the second part of the thesis, a new methodology that introduces reliability into existing design flows is proposed. The methodology consists of partitioning the whole system into reliability-optimal partitions and applying reliability evaluation and optimization at the local and system levels. The system-level reliability improvement of different fault-tolerant techniques is studied in depth. Optimal partition size analysis and redundancy optimization have been performed for the first time in the context of a large-scale system, showing that a target reliability can be achieved with low to moderate redundancy factors (R < 50) even for high defect densities (device failure rate up to 10⁻³). The optimal window of application of each fault-tolerant technique with respect to defect density is presented as a way to find the optimum design trade-off between reliability and power/area.
R-fold modular redundancy with distributed voting and an averaging voter is selected as the most promising candidate for implementation in trillion-transistor logic systems. Finally, a realistic circuit example of the methodology implementation is verified using simulations.
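
As a rough illustration of the R-fold modular redundancy baseline mentioned above, the following minimal sketch computes the probability that a strict majority of R redundant modules is correct. It is a textbook simplification and not the method of the thesis: it assumes independent module failures, an odd R, and a fault-free majority voter, whereas the thesis considers unreliable components and averaging/thresholding voters. The function name `majority_reliability` and the parameter `p_module` are hypothetical.

```python
from math import comb


def majority_reliability(p_module: float, r: int) -> float:
    """Probability that a strict majority of r modules is correct.

    Assumptions (illustrative only): modules fail independently, each is
    correct with probability p_module, and the majority voter never fails.
    """
    if r < 1 or r % 2 == 0:
        raise ValueError("r must be a positive odd integer")
    if not 0.0 <= p_module <= 1.0:
        raise ValueError("p_module must lie in [0, 1]")
    needed = r // 2 + 1  # smallest number of correct modules forming a majority
    return sum(
        comb(r, k) * p_module**k * (1.0 - p_module) ** (r - k)
        for k in range(needed, r + 1)
    )


if __name__ == "__main__":
    # Compare a single module (r = 1) with increasing redundancy factors.
    for r in (1, 3, 5, 7):
        print(r, majority_reliability(0.99, r))
```

Under these assumptions, redundancy helps only when each module is correct more often than not (p_module > 0.5). Once the voter itself can fail, the achievable gain is limited, which is one reason the voter mechanism and the partition size matter in the analysis summarized above.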
