Polar codes are widely considered one of the most exciting recent discoveries in channel coding. For short to moderate block lengths, their error-correction performance under list decoding can exceed that of other modern error-correcting codes. However, high-speed list-based decoders of moderate complexity are challenging to implement. Successive-cancellation (SC)-flip decoding has been shown to achieve error-correction performance competitive with that of list decoding with a small list size, at a fraction of the complexity, but it suffers from a variable execution time and a higher worst-case latency. In this work, we show how to modify the state-of-the-art high-speed SC decoding algorithm to incorporate the SC-flip ideas. We present the algorithmic improvements along with average execution-time results tailored to a hardware implementation. The results show that the proposed fast-SSC-flip algorithm decodes close to an order of magnitude faster than previous works while retaining comparable error-correction performance.