High-speed serial links are a crucial application of semiconductor technology and have enabled the scaling of computing systems. The increasing data-rate requirements of these links have only been partially satisfied by advancements in process technologies. Implementing serial links operating at data rates above 25 Gb/s thus requires continual innovation in algorithms, architectures, power-management techniques, and standards. The Viterbi algorithm (VA), an attractive solution for symbol detection in the presence of intersymbol interference (ISI) and noise, minimizes the error probability in detecting the whole symbol sequence, rather than a single symbol as in decision-feedback equalization. The bit-error-rate (BER) performance of the VA is thus better than that of symbol-by-symbol detectors because the VA does not cancel the ISI, but rather uses the information embedded therein to maximize the reliability of its decisions. However, implementing a maximum-likelihood sequence detector (MLSD) realizing the VA may be prohibitive because design specifications regarding area, latency, power consumption, and speed may not be satisfied concurrently. Suboptimal solutions, such as feed-forward and decision-feedback equalizers (DFEs), may therefore be chosen instead of MLSDs due to their lower implementation complexity. A reduced-state sequence detector (RSSD) reduces the implementation complexity of the MLSD with negligible performance degradation by exploiting set-partitioning principles and embedded per-survivor decision feedback. A sliding-block or systolic-array Viterbi detector (VD) breaks the speed bottleneck in sequence-detector implementations by parallelizing the operation of the VA.
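The contrast between sequence detection and symbol-by-symbol detection can be illustrated with a minimal software sketch. It is not the detector implemented in this thesis: the two-tap channel `H`, the known initial symbol, and all function names are assumptions made for illustration only. The DFE subtracts the post-cursor ISI using its previous decision, whereas the VA keeps one survivor per trellis state and selects the 4-PAM sequence with the smallest accumulated squared Euclidean distance to the received samples.

```python
import random

LEVELS = (-3, -1, 1, 3)  # 4-PAM alphabet
H = (1.0, 0.5)           # assumed channel: main cursor and one post-cursor tap


def channel(symbols, sigma=0.0, seed=0):
    """Apply two-tap ISI plus Gaussian noise; the symbol before the
    first one is assumed to be LEVELS[0]."""
    rng = random.Random(seed)
    prev = LEVELS[0]
    out = []
    for a in symbols:
        out.append(H[0] * a + H[1] * prev + rng.gauss(0.0, sigma))
        prev = a
    return out


def dfe_detect(received):
    """Symbol-by-symbol detection: cancel the ISI with the previous decision."""
    prev = LEVELS[0]
    decisions = []
    for r in received:
        z = (r - H[1] * prev) / H[0]
        a_hat = min(LEVELS, key=lambda a: abs(z - a))
        decisions.append(a_hat)
        prev = a_hat
    return decisions


def viterbi_detect(received):
    """Sequence detection: the trellis state is the index of the previous symbol."""
    n = len(LEVELS)
    inf = float("inf")
    metrics = [0.0 if s == 0 else inf for s in range(n)]  # known start state
    history = []
    for r in received:
        new_metrics = [inf] * n
        back = [0] * n
        for nxt, a in enumerate(LEVELS):
            for prv, p in enumerate(LEVELS):
                # Add-compare-select over all predecessors of state nxt.
                m = metrics[prv] + (r - (H[0] * a + H[1] * p)) ** 2
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    back[nxt] = prv
        history.append(back)
        metrics = new_metrics
    # Trace back from the best final state to recover the survivor path.
    state = min(range(n), key=lambda s: metrics[s])
    decisions = []
    for back in reversed(history):
        decisions.append(LEVELS[state])
        state = back[state]
    decisions.reverse()
    return decisions


def sequence_metric(received, symbols):
    """Squared Euclidean distance between the received samples and the
    noise-free channel output for a candidate symbol sequence."""
    prev = LEVELS[0]
    total = 0.0
    for r, a in zip(received, symbols):
        total += (r - (H[0] * a + H[1] * prev)) ** 2
        prev = a
    return total
```

In this sketch, the sequence returned by `viterbi_detect` never has a larger `sequence_metric` than the one returned by `dfe_detect`, because the VA searches over all candidate sequences. The nested loop over predecessor states is the add-compare-select recursion whose serial dependency limits the speed of a hardware implementation.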
In this thesis, we implement a 56-Gb/s four-level pulse-amplitude-modulation (4-PAM) DFE to demonstrate the feasibility of a multi-level analog-to-digital-converter (ADC)-based symbol-by-symbol detector in terms of achievable area, BER, energy-efficiency, latency, and speed figures comparable to those of analog solutions. Furthermore, to improve the BER figures, we implement a 25.6-Gb/s 4-PAM reduced-state sliding-block VD to demonstrate the feasibility of a multi-level sequence detector in terms of achievable area, energy-efficiency, latency, and speed figures comparable to those of symbol-by-symbol detectors. Moreover, we develop a sliding-block VA with optimized unequal synchronization and survivor-path-memory lengths to reduce its implementation complexity, latency, and power consumption. An increase in speed is thereby achieved at the same implementation complexity, BER, latency, and power consumption. We then develop a novel VD with embedded per-survivor decision feedback, whose longest path contains only one adder, to increase its speed significantly. Next, we propose a concatenated-coding scheme using an outer Reed–Solomon code and a four-dimensional (4-D) 5-PAM inner trellis-coded-modulation (TCM) scheme to achieve signal-to-noise-ratio gains without bandwidth expansion. Finally, we implement a 70-Gb/s 4-D 5-PAM systolic-array TCM decoder with eight states, which includes an inverse Tomlinson–Harashima precoder, to demonstrate the feasibility of a multi-level ADC-based sequence decoder in terms of achievable speed figures comparable to those of sequence detectors.