The FPGA implementation of Viterbi decoders for multiple-input multiple-output (MIMO) wireless communication systems with bit-interleaved coded modulation (BICM) and per-antenna coding is considered. The paper describes how the recursive add-compare-select (ACS) unit, which constitutes the performance bottleneck of the circuit, can be pipelined to increase the throughput. Compared with employing multiple parallel decoders, this approach significantly reduces the silicon area (resource utilization on the FPGA). The proposed optimizations lead to an implementation that achieves a throughput of 216 Mbps in a 4 × 4 MIMO-WLAN system prototype based on IEEE 802.11a.
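To illustrate why the ACS unit forms a recursive bottleneck, the following minimal Python sketch shows one generic ACS update. It is a software illustration only, not the hardware design or the pipelining scheme of the paper; the function name `acs_step`, the 4-state trellis, and the metric values are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def acs_step(
    path_metrics: Sequence[int],
    branch_metrics: Sequence[Sequence[int]],
    predecessors: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int]]:
    """One add-compare-select update over all trellis states.

    path_metrics[s]      accumulated metric of the survivor ending in state s
    branch_metrics[s][k] metric of the k-th branch entering state s
    predecessors[s][k]   state from which the k-th branch entering s departs

    Returns the updated path metrics and, per state, the index k of the
    selected (minimum-metric) branch.
    """
    new_metrics: List[int] = []
    decisions: List[int] = []
    for s, preds in enumerate(predecessors):
        # Add: extend each candidate survivor by its branch metric.
        candidates = [path_metrics[p] + branch_metrics[s][k]
                      for k, p in enumerate(preds)]
        # Compare and select: keep the candidate with the smallest metric.
        best = min(range(len(candidates)), key=candidates.__getitem__)
        new_metrics.append(candidates[best])
        decisions.append(best)
    return new_metrics, decisions


# Hypothetical 4-state shift-register trellis (illustrative values only).
PREDECESSORS = [(0, 1), (2, 3), (0, 1), (2, 3)]
metrics = [0, 3, 4, 1]
branches = [[1, 0], [2, 0], [0, 1], [1, 1]]
metrics, decisions = acs_step(metrics, branches, PREDECESSORS)
```

Each call consumes the path metrics produced by the previous call, so the update for one trellis step cannot begin before the preceding one has finished. This feedback loop is what makes the ACS unit difficult to pipeline directly.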