We seek a suitable model of complex gesture patterns that supports both recognition and reproduction. To this end, we propose a hybrid framework in which the observable characteristics of gestures are modeled explicitly, while pattern details are handled by data-driven (black-box) learning methods. The hybrid approach consists of multiple layers with inter-communicating mechanisms between them: attribute-level trackers (AT), a gesture-level tracker (GT), and a situation tracker (ST). The AT layer handles low-level pattern variations of human action and is primarily responsible for segmenting the continuous action stream. The GT layer captures the temporally coordinated characteristics of concurrent attributes, and the ST layer keeps track of application-specific context knowledge. The overall model for comprehensible gestures could also serve as a global framework for generating gesture animation.
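
The layered organization described above can be sketched in code. The following Python fragment is a minimal illustration only: all class names, method names, and the toy boundary-detection rule are assumptions made for this sketch, not part of the framework's actual specification.

```python
# Illustrative sketch of the AT / GT / ST layering; all names and the
# simple thresholding rule are assumptions, not the framework's real design.

class AttributeTracker:
    """AT: tracks low-level variation of one gesture attribute and
    proposes segment boundaries in the continuous action stream."""
    def __init__(self, name):
        self.name = name
        self.prev = None

    def observe(self, value, threshold=0.5):
        # Toy rule: flag a boundary when the attribute changes abruptly.
        boundary = self.prev is not None and abs(value - self.prev) > threshold
        self.prev = value
        return {"attribute": self.name, "value": value, "boundary": boundary}


class GestureTracker:
    """GT: fuses temporally coordinated reports from concurrent ATs."""
    def fuse(self, reports):
        # Declare a gesture-level boundary only if all attributes agree.
        return all(r["boundary"] for r in reports)


class SituationTracker:
    """ST: filters gesture hypotheses using application-specific context."""
    def __init__(self, allowed):
        self.allowed = set(allowed)

    def accept(self, gesture):
        return gesture in self.allowed


# Wiring the layers together on a toy stream of two concurrent attributes.
ats = [AttributeTracker("hand_x"), AttributeTracker("hand_y")]
gt = GestureTracker()
st = SituationTracker(allowed={"wave"})

stream = [(0.0, 0.0), (0.1, 0.1), (0.9, 0.9)]  # abrupt change at the end
segments = []
for x, y in stream:
    reports = [ats[0].observe(x), ats[1].observe(y)]
    if gt.fuse(reports):
        hypothesis = "wave"  # hypothetical gesture label for illustration
        if st.accept(hypothesis):
            segments.append(hypothesis)

print(segments)  # a gesture segment is detected at the abrupt change
```

The point of the sketch is the division of labor: ATs report low-level variation, GT requires temporal agreement across attributes before committing to a segment, and ST vetoes hypotheses that the application context rules out.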