In this paper we address the joint estimation of head pose and facial actions. We propose a method that robustly tracks both subtle and extreme movements by combining two types of features: structural features observed at characteristic points of the face, and intensity features sampled from the facial texture. To handle extreme poses, we propose two innovations. The first is to extend the deformable 3D face model Candide so that appearance information can be collected from the sides of the head as well as from the face. The second is to model the head appearance with a set of view-based templates learned online. This allows us to handle the appearance variation problem, which is inherent to intensity features and accentuated by the coarse geometry of our 3D head model. Experiments on the Boston University Face Tracking dataset show that the method can track common head movements with an accuracy of $3.2^\circ$, outperforming some state-of-the-art methods. More importantly, the ability of the system to robustly track natural and faked facial actions, as well as challenging head movements, is demonstrated on several long video sequences.