This paper focuses on the interaction patterns of learners who studied in pairs with multimedia learning material. In a previous article, we reported that learning scores were higher for dyads in an ‘animations’ condition than for dyads in a ‘static pictures’ condition. Results also showed that offering a persistent display of one snapshot of each animated sequence hindered collaborative learning. In the present paper, we further analyse verbal interactions within learning dyads to better understand both the beneficial effect of animations and the detrimental effect of persistent snapshots of critical steps on collaborative learning. Results did not show any differences in verbal categories between the two versions of the instructional material, that is, static versus animated pictures. Pairs who were provided with persistent snapshots of the multimedia sequences produced fewer utterances than pairs without the snapshots. In addition, persistent snapshots reduced both utterances that provided information about the learning content and utterances produced solely to manage the interaction. These two verbal categories were also positively related to learning performance in this study. Finally, mediation analyses revealed that the negative effect of persistent snapshots on learning was mediated by the lower number of information-providing and interaction-management utterances produced by peers in the snapshots condition. Results are interpreted using a psycholinguistic framework applied to the computer-supported collaborative learning (CSCL) literature, and general guidelines are derived for the use of dynamic material and persistency tools in the design of CSCL environments.