We show that complex animation models can be fitted effectively to noisy image data. Our approach is based on robust least-squares adjustment and takes advantage of three complementary sources of information: stereo data, silhouette edges, and 2D feature points. We take stereo to be our main information source and use the other two whenever they are available. In this way, complete head models, including ears and hair, can be acquired with a cheap and entirely passive sensor, such as an ordinary video camera. The motion parameters of limbs can be captured in the same way. They can then be fed to existing animation software to produce synthetic sequences.
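As an illustration of what robust least-squares adjustment over several information sources can look like, the following is a minimal sketch and not the implementation used in this work. It fits a toy two-parameter linear model by iteratively reweighted least squares with Huber weights. The function names, the per-source prior weights, the Huber threshold, and the synthetic "stereo" and "feature" observations are all assumptions made for the example.

```python
import numpy as np


def robust_least_squares(A, b, prior_weights, n_iter=20, k=1.345):
    """Solve A x ~ b by iteratively reweighted least squares (Huber weights).

    prior_weights holds one assumed confidence per observation, e.g. a
    different value for each information source.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    w_prior = np.asarray(prior_weights, dtype=float)
    w = w_prior.copy()
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least-squares step: scale each row by sqrt(weight).
        x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = b - A @ x
        # Robust residual scale from the median absolute deviation.
        scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
        u = np.abs(r) / (k * scale)
        # Huber weight: 1 for small residuals, 1/u for large ones.
        w = w_prior / np.maximum(u, 1.0)
    return x


def demo():
    """Toy problem: recover slope 2 and intercept 1 from two sources."""
    rng = np.random.default_rng(0)
    # Dense, noisy "stereo-like" observations with 10% gross outliers.
    t_stereo = rng.uniform(0.0, 1.0, 200)
    y_stereo = 2.0 * t_stereo + 1.0 + rng.normal(0.0, 0.05, 200)
    y_stereo[:20] += 5.0
    # A few accurate "feature-point-like" observations.
    t_feat = np.array([0.1, 0.5, 0.9])
    y_feat = 2.0 * t_feat + 1.0
    t = np.concatenate([t_stereo, t_feat])
    y = np.concatenate([y_stereo, y_feat])
    A = np.column_stack([t, np.ones_like(t)])
    # Assumed prior weights: 1 for stereo, 10 for feature points.
    w = np.concatenate([np.full(200, 1.0), np.full(3, 10.0)])
    return robust_least_squares(A, y, w)


if __name__ == "__main__":
    print(demo())
```

In this sketch the dense source drives the fit, while the sparse source contributes a few high-confidence constraints; the reweighting step reduces the influence of observations whose residuals are large relative to the robust scale estimate.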