Polar codes represent one of the major recent breakthroughs in coding theory and, because of their attractive features, they have been selected for the upcoming 5G standard. As such, a lot of attention has been devoted to the development of decoding algorithms with good error performance and efficient hardware implementation. One of the leading candidates in this regard is successive-cancelation list (SCL) decoding. However, its hardware implementation requires a large amount of memory. Recently, a partitioned SCL (PSCL) decoder has been proposed to significantly reduce the memory consumption. In this paper, we consider the paradigm of PSCL decoding from a practical standpoint, and we provide several improvements. First, by changing the target signal-to-noise ratio and consequently modifying the construction of the code, we are able to improve the performance at no additional computational, latency, or memory cost. Second, we bridge the performance gap between SCL and PSCL decoding by introducing a generalized PSCL decoder and a layered PSCL decoder. In this way, we obtain almost the same performance as the SCL decoder with a significantly lower memory requirement, as confirmed by hardware implementation results. Third, we present an optimal scheme to allocate cyclic redundancy checks. Finally, we provide a lower bound on the list size that guarantees optimal maximum a posteriori performance for the binary erasure channel.