Spatial scalability of video signals can be achieved with critically sampled spatial wavelet schemes, but also with an overcomplete spatial representation. Critically sampled schemes suffer from the shift variance of their critically sampled high bands, which makes efficient motion compensation in those bands difficult. Overcomplete representations, on the other hand, can be shift-invariant and thus permit efficient motion compensation in the spatial subbands, but they must be designed carefully to achieve high compression efficiency. This paper discusses an orthonormal transform that decomposes two different spatial scales of the same image. The transform is designed to minimize the impact of quantization noise on the video signal reconstructed at the decoder. We also investigate the decorrelation properties of the transform. Finally, we compare its compression efficiency with that of a Laplacian pyramid, a conventional overcomplete image representation, and observe coding gains of up to 1 dB.
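The contrast between shift-variant critically sampled high bands and shift-invariant overcomplete ones can be illustrated in one dimension. The following is a minimal sketch, assuming the orthonormal Haar filter pair and circular boundary handling; the undecimated Haar high band is used only as a generic example of an overcomplete, shift-invariant representation and is not the orthonormal two-scale transform discussed in this paper. All function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_highband_decimated(x):
    """Critically sampled Haar high band: one coefficient per pair of
    samples, so len(x) // 2 coefficients in total."""
    return [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]


def haar_highband_undecimated(x):
    """Undecimated (overcomplete) Haar high band: one coefficient per
    sample, with circular boundary handling, so len(x) coefficients."""
    n = len(x)
    return [S * (x[i] - x[(i + 1) % n]) for i in range(n)]


def circular_shift(x, d):
    """Shift a list d samples to the right with wrap-around."""
    d %= len(x)
    if d == 0:
        return list(x)
    return x[-d:] + x[:-d]


# A short step edge and the same edge moved by one sample.
edge = [0, 0, 1, 1, 0, 0, 0, 0]
moved = circular_shift(edge, 1)  # [0, 0, 0, 1, 1, 0, 0, 0]

# Critically sampled: the edge falls inside a sample pair and vanishes
# from the high band; after a one-sample shift it reappears, and the new
# coefficients are not a shifted copy of the old ones.
print(haar_highband_decimated(edge))
print(haar_highband_decimated(moved))

# Undecimated: shifting the input simply shifts the high band, so a
# motion vector estimated on the image also applies to this subband.
print(haar_highband_undecimated(moved) ==
      circular_shift(haar_highband_undecimated(edge), 1))
```

The price of shift invariance is redundancy: the undecimated high band has as many coefficients as the input, twice as many as the critically sampled one, which is why an overcomplete representation has to be designed carefully to remain efficient for compression.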