Lattice-reduction (LR)-aided successive interference cancellation (SIC) is able to achieve close-to optimum error-rate performance for data detection in multiple-input multiple-output (MIMO) wireless communication systems. In this work, we propose a hardware-efficient VLSI architecture of the Lenstra-Lenstra-Lovasz (LLL) LR algorithm for SIC-based data detection. For this purpose, we introduce various algorithmic modifications that enable an efficient hardware implementation. Comparisons with existing FPGA implementations show that our design outperforms state-of-the-art LR implementations in terms of hardware-efficiency and throughput. We finally provide reference ASIC implementation results for 130 nm CMOS technology.