Technology scaling has enabled integrated circuits of extremely high density, making it possible to build systems of tremendous complexity with manageable power consumption. Sustained by Moore's law for many years, electronic chips have kept pace with the growing performance and energy-efficiency demands of many applications. In the past decade, however, the most advanced process nodes have shown diminishing returns in performance and power consumption: process variations make it increasingly difficult to guarantee reliable operation without costly design margins. This diminishing benefit stems largely from the worst-case design paradigm, which mandates conservative margins. It has therefore been proposed to abandon the conservative, 100% error-free design paradigm and instead exploit the inherent fault tolerance of many applications through approximate computing, avoiding the need for such margins.

The contributions of this thesis focus on algorithmic and architectural techniques for the design of approximate and efficient hardware, and are summarized as follows. First, we propose a design methodology that drops the requirement of 100% reliable operation and accepts dies with unreliable memory components. We evaluate the methodology on multiple applications, including image-processing kernels and a channel decoder for communication systems, and we present the first measured example of an integrated circuit that delivers stable performance despite the presence of errors in its memories. Second, we propose a systematic statistical framework that dynamically adapts the output quality of a channel decoder, as an example of an iterative algorithm, at run-time, yielding a notable reduction in energy consumption across different approximation levels.
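The run-time trade-off behind the second contribution can be illustrated with a toy example. The sketch below is a generic hard-decision bit-flipping decoder with early termination, not the statistical framework developed in the thesis; the parity-check matrix `H` and all parameters are illustrative assumptions. The point it demonstrates is that an iterative decoder can stop as soon as its quality criterion is met, so the iteration count, and hence the energy spent, adapts to the channel conditions.

```python
import numpy as np

# Illustrative parity-check matrix of a tiny linear code (assumption:
# the thesis targets a real channel decoder, not this toy example).
H = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
], dtype=int)

def bit_flip_decode(received, max_iters):
    """Hard-decision bit-flipping decoding with early termination.

    Stops as soon as all parity checks are satisfied, so the number
    of iterations (and thus energy) adapts to how noisy the input is.
    Returns the decoded word and the number of iterations used.
    """
    word = received.copy()
    for it in range(max_iters):
        syndrome = H @ word % 2
        if not syndrome.any():      # all checks satisfied: stop early
            return word, it
        # Flip the bit involved in the most unsatisfied checks.
        votes = H.T @ syndrome
        word[np.argmax(votes)] ^= 1
    return word, max_iters

codeword = np.zeros(7, dtype=int)   # the all-zero word is a valid codeword
noisy = codeword.copy()
noisy[2] ^= 1                       # inject a single bit error
decoded, iters = bit_flip_decode(noisy, max_iters=10)
```

With one bit error the decoder converges after a single correcting iteration; a cleaner channel would terminate immediately, while a noisier one would spend more iterations, which is exactly the quality-versus-energy knob exposed at run-time.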
Finally, we propose design-time algorithmic and architectural techniques, inspired by the static approximate-computing paradigm, that reduce the complexity of the arithmetic units of a channel decoder, significantly reducing both design area and energy consumption while still achieving best-in-class throughput.
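As a generic illustration of static arithmetic approximation (the thesis's actual decoder-specific simplifications are not reproduced here), narrow saturating fixed-point arithmetic is a common way to shrink datapath units: the function name, bit-widths, and values below are illustrative assumptions.

```python
def sat_add(a, b, bits):
    """Saturating two's-complement addition at a reduced bit-width.

    Clamping to the representable range instead of carrying full
    precision lets the hardware use narrower adders and message
    memories, trading a small accuracy loss for area and energy.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))

result = sat_add(5, 6, bits=4)   # saturates at +7 instead of the exact 11
```

Fixing the approximation at design-time, as here, removes the run-time control overhead of dynamic schemes, at the cost of a quality trade-off chosen once for all operating conditions.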