Polar codes, invented by Arikan in 2009, are known to achieve the capacity of any binary-input memoryless outputsymmetric channel. Further, both the encoding and the decoding can be accomplished in O(N log(N)) real operations, where N is the blocklength. One of the few drawbacks of the original polar code construction is that it is not universal. This means that the code has to be tailored to the channel if we want to transmit close to capacity. We present two "polar-like" schemes that are capable of achieving the compound capacity of the whole class of binary-input memoryless symmetric channels with low complexity. Roughly speaking, for the first scheme we stack up N polar blocks of length N on top of each other but shift them with respect to each other so that they form a "staircase." Then by coding across the columns of this staircase with a standard ReedSolomon code, we can achieve the compound capacity using a standard successive decoder to process the rows (the polar codes) and in addition a standard Reed-Solomon erasure decoder to process the columns. Compared to standard polar codes this scheme has essentially the same complexity per bit but a block length which is larger by a factor O(N log(2)(N)/epsilon). Here N is the required blocklength for a standard polar code to achieve an acceptable block error probability for a single channel at a distance of at most ! from capacity. For the second scheme we first show how to construct a true polar code which achieves the compound capacity for a finite number of channels. We achieve this by introducing special " polarization" steps which " align" the good indices for the various channels. We then show how to exploit the compactness of the space of binary-input memoryless output-symmetric channels to reduce the compound capacity problem for this class to a compound capacity problem for a finite set of channels. This scheme is similar in spirit to standard polar codes, but the price for universality is a considerably larger blocklength.