The rate-distortion behavior of sparse memoryless sources is studied. These sources serve as models of sparse signal representations and facilitate the performance analysis of "sparsifying" transforms, such as the wavelet transform, and of nonlinear approximation schemes. For strictly sparse binary sources with Hamming distortion, the rate-distortion function is shown to be almost linear. For nonstrictly sparse continuous-valued sources, termed compressible, two measures of compressibility are introduced: incomplete moments and the geometric mean. The former lead to low- and high-rate upper bounds on mean squared error, while the latter yields lower and upper bounds on source entropy, thereby characterizing asymptotic behavior. The notion of compressibility is thus quantitatively connected with actual lossy compression. These bounding techniques are applied to two source models: Gaussian mixtures and power laws matching the approximately scale-invariant decay of wavelet coefficients. The former are versatile models for sparse data; in particular, they make it possible to bound the high-rate compression performance of a scalar mixture relative to a corresponding unmixed transform-coding system. Such a comparison is of interest for transforms with known coefficient decay but unknown coefficient ordering, e.g., when the positions of the highest-variance coefficients are unknown. The use of these models and results in distributed coding and compressed sensing scenarios is also discussed.
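As a concrete point of reference for the binary case mentioned above, the classical rate-distortion function of a Bernoulli(p) source under Hamming distortion is R(D) = h(p) - h(D) for 0 <= D <= min(p, 1-p), and zero beyond, where h is the binary entropy function. This is a standard textbook result, not a formula taken from this paper; the helper names below (`binary_entropy`, `rd_bernoulli`) are illustrative choices. The sketch shows how, for a sparse source (small p), the curve starts near h(p) and stays comparatively flat, consistent with the "almost linear" behavior described.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rd_bernoulli(p: float, D: float) -> float:
    """Rate-distortion function of a Bernoulli(p) source under Hamming
    distortion: R(D) = h(p) - h(D) on 0 <= D <= min(p, 1-p), else 0.
    (Standard information-theoretic result; illustrative helper.)"""
    if D >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)


if __name__ == "__main__":
    # For a sparse source, e.g. p = 0.01, tabulate a few points on R(D).
    p = 0.01
    for D in (0.0, 0.001, 0.005, 0.009):
        print(f"D = {D:.3f}  R(D) = {rd_bernoulli(p, D):.4f} bits")
```

Note that R(0) = h(p), which for small p is itself small, so even lossless coding of a strictly sparse binary source is cheap; lossy coding trades rate against the fraction of flipped symbols.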