This article explores the efficiency of motion-compensated three-dimensional transform coding, a compression scheme that applies a motion-compensated transform to a group of pictures. We investigate this coding scheme both experimentally and theoretically. The practical coding scheme applies a wavelet decomposition with motion-compensated lifting steps in the temporal direction. We compare its experimental results to those of a predictive video codec with single-hypothesis motion compensation and comparable computational complexity. The experiments show that the 5/3 wavelet kernel outperforms the Haar kernel and, in many cases, also the reference scheme based on single-hypothesis motion-compensated predictive coding. The theoretical investigation models this motion-compensated subband coding scheme for a group of K pictures with a signal model for K motion-compensated pictures that are decorrelated by a linear transform. We use the Karhunen-Loève Transform to obtain theoretical performance bounds at high bit rates and compare them to both optimum intra-frame coding of individual motion-compensated pictures and single-hypothesis motion-compensated predictive coding. The investigation shows that motion-compensated three-dimensional transform coding can outperform predictive coding with single-hypothesis motion compensation by up to 0.5 bits per sample.
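To make the idea of a temporal wavelet decomposition with motion-compensated lifting steps concrete, the following is a minimal sketch of one 5/3 lifting level over a group of pictures. It is an illustration under simplifying assumptions, not the codec evaluated in the article: frames are 1-D lists of samples, motion is a single integer displacement per adjacent frame pair, and warping is a circular shift. The names `shift`, `mc_lift_53`, and `mc_unlift_53` are hypothetical.

```python
def shift(frame, d):
    """Circularly shift a 1-D frame by d samples (assumed stand-in for motion warping)."""
    n = len(frame)
    return [frame[(i - d) % n] for i in range(n)]


def mc_lift_53(frames, motion):
    """One temporal 5/3 lifting level; motion[i] is the displacement from frame i to i+1."""
    K = len(frames)
    assert K % 2 == 0 and len(motion) == K - 1
    half = K // 2

    # Predict step: each odd frame minus the average of its two warped even neighbours.
    high = []
    for k in range(half):
        left = shift(frames[2 * k], motion[2 * k])
        if 2 * k + 2 < K:
            right = shift(frames[2 * k + 2], -motion[2 * k + 1])
        else:
            right = left  # symmetric extension at the end of the group
        high.append([o - 0.5 * (a + b)
                     for o, a, b in zip(frames[2 * k + 1], left, right)])

    # Update step: each even frame plus a quarter of the two warped high bands.
    low = []
    for k in range(half):
        nxt = shift(high[k], -motion[2 * k])
        prev = shift(high[k - 1], motion[2 * k - 1]) if k > 0 else nxt
        low.append([e + 0.25 * (a + b)
                    for e, a, b in zip(frames[2 * k], prev, nxt)])
    return low, high


def mc_unlift_53(low, high, motion):
    """Invert mc_lift_53 by undoing the update step, then the predict step."""
    half = len(low)
    even = []
    for k in range(half):
        nxt = shift(high[k], -motion[2 * k])
        prev = shift(high[k - 1], motion[2 * k - 1]) if k > 0 else nxt
        even.append([l - 0.25 * (a + b)
                     for l, a, b in zip(low[k], prev, nxt)])

    frames = []
    for k in range(half):
        left = shift(even[k], motion[2 * k])
        right = shift(even[k + 1], -motion[2 * k + 1]) if k + 1 < half else left
        odd = [h + 0.5 * (a + b) for h, a, b in zip(high[k], left, right)]
        frames += [even[k], odd]
    return frames


if __name__ == "__main__":
    base = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    frames = [shift(base, i) for i in range(4)]  # content moves one sample per frame
    low, high = mc_lift_53(frames, [1, 1, 1])
    print(max(abs(v) for h in high for v in h))  # high-band energy with matching motion
```

Because each lifting step only adds a function of the other polyphase component, the inverse reproduces the input for any choice of displacements. In this toy setting, accurate motion affects only how much energy is left in the high band, not invertibility.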