From an error rate performance perspective, maximum likelihood (ML) detection is the preferred detection method for multiple-input multiple-output (MIMO) communication systems. However, for high transmission rates a straight forward exhaustive search implementation suffers from prohibitive complexity. The K-best algorithm provides close-to-ML bit error rate (BIER) performance, while its circuit complexity is reduced compared to an exhaustive search. In this paper, a new VLSI architecture for the implementation of the K-best algorithm is presented. Instead of the mostly sequential processing that has been applied in previous VLSI implementations of the algorithm, the presented solution takes a more parallel approach. Further-more, the application of a simplified norm is discussed. The implementation in an ASIC achieves up to 424 Mbps throughput with an area that is almost on par with current state-of-the-art implementations.