We propose a dynamic facial expression recognition framework based on discrete choice models (DCM). We model the choice of a person who has to label a video sequence representing a facial expression. The originality lies in the explicit modeling of the causal effects between the facial features and the recognition of the expression. Three models are proposed. The first assumes that only the last frame of the video triggers the choice of the expression. The second model is composed of two parts: the first part captures the evaluation of the facial expression within each frame of the sequence, while the second part determines which frame triggers the choice. The third model extends the second by assuming that the choice of the expression results from averaging the expression perceptions over a group of frames. The models are estimated using videos from the Facial Expressions and Emotions Database (FEED). Labeling data on the videos has been obtained through an internet survey available at http://transp-or2.epfl.ch/videosurvey/. The prediction capability of the models is studied in order to check their validity. Finally, the models are cross-validated using the estimation data.
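As an illustration of the discrete choice machinery underlying such models, the sketch below computes multinomial logit choice probabilities over expression labels. The specific labels and utility values are hypothetical examples, not the specification estimated in the paper; in the actual models, each label's systematic utility would be a function of measured facial features.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j).

    Utilities are shifted by their maximum for numerical stability;
    this shift does not change the resulting probabilities.
    """
    m = max(utilities.values())
    exp_v = {label: math.exp(v - m) for label, v in utilities.items()}
    total = sum(exp_v.values())
    return {label: e / total for label, e in exp_v.items()}

# Hypothetical systematic utilities for three expression labels,
# e.g. linear functions of facial-feature measurements in a frame.
V = {"happiness": 1.2, "surprise": 0.3, "neutral": -0.5}
probs = mnl_probabilities(V)
```

In the frame-based models described above, such probabilities would be evaluated per frame (or averaged over a group of frames) before the final label choice is made.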