Polar codes are capacity-achieving error-correcting codes with an explicit construction that can be decoded with low-complexity algorithms. In this work, we show how the state-of-the-art low-complexity decoding algorithm can be improved to better accommodate low-rate codes. More constituent codes are recognized in the updated algorithm and dedicated hardware is added to efficiently decode these new constituent codes. We also alter the polar code construction to further decrease the latency and increase the throughput with little to no noticeable effect on error-correction performance. Rate-flexible decoders for polar codes of length 1024 and 2048 are implemented on FPGA. Over the previous work, they are shown to have from 22% to 28% lower latency and 26% to 34% greater throughput when decoding low-rate codes. On 65 nm ASIC CMOS technology, the proposed decoder for a (1024, 512) polar code is shown to compare favorably against the state-of-the-art ASIC decoders. With a clock frequency of 400 MHz and a supply voltage of 0.8 V, it has a latency of 0.41 μs and an area efficiency of 1.8 Gbps/mm2 for an energy efficiency of 77 pJ/info. bit. At 600 MHz with a supply of 1 V, the latency is reduced to 0.27 μs and the area efficiency increased to 2.7 Gbps/mm2 at 115 pJ/info. bit.