Computational Methods for Audio-Visual Analysis of Emergent Leadership in Teams
Face-to-face interactions are part of everyday life, ranging from family life to working in teams and to global communities. Social psychologists have long studied these interactions with the aim of understanding behavior, motivations, and the emergence of interaction patterns. An organization is an environment rich in daily interactions, including structured periodic meetings, planning, brainstorming, negotiations, decision-making, and informal gatherings, and leaders play a key role in many of them. Leaders face problems, propose solutions, make decisions, and are often the main source of inspiration for employees. Identifying emergent leaders at early stages in organizations is a key issue in organizational behavior research, and a new problem in social computing. The study of this phenomenon requires the sensing of natural face-to-face interactions, the automatic extraction of behavioral cues, and reliable machine learning algorithms to identify emergent leaders. In this thesis we present a computational approach to analyze the emergence of leadership in small groups using multimodal audio and visual features. Within this computational framework, we first present an analysis of how an emergent leader is perceived in newly formed, small groups. We present the ELEA (Emergent LEadership Analysis) corpus, collected with the aim of analyzing the emergence of leaders. We propose to analyze emergent leaders using a variety of nonverbal cues studied in social psychology and automatically extracted from audio and video streams. Our analysis addresses how the emergent leader is perceived by his/her peers in terms of speaking and visual activity, and its relation to the most dominant person (including external observers' perception). We then investigate which individual nonverbal channel, or combination of features from different channels, provides better inferences of the emergent leader and related concepts, using unsupervised and supervised methods.
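As a minimal illustration of this kind of pipeline, the sketch below computes two speaking-activity cues per participant (total speaking time and number of turns) from per-frame voice-activity sequences, and applies a simple unsupervised rule that predicts the most-speaking participant as the emergent leader. All names, data, and the specific rule are hypothetical and only illustrate the flavor of the approach, not the thesis's actual features or models.

```python
# Illustrative sketch (not the thesis's implementation): infer the emergent
# leader from a single nonverbal audio cue using an unsupervised rule.

def speaking_stats(speaking):
    """speaking: dict mapping participant -> list of 0/1 voice-activity frames.
    Returns dict mapping participant -> (total speaking frames, number of turns)."""
    stats = {}
    for person, frames in speaking.items():
        total = sum(frames)
        # A speaking turn starts at every 0 -> 1 transition.
        turns = sum(1 for prev, cur in zip([0] + frames[:-1], frames)
                    if cur == 1 and prev == 0)
        stats[person] = (total, turns)
    return stats

def predict_leader(speaking):
    """Unsupervised rule: the participant with the most speaking time."""
    stats = speaking_stats(speaking)
    return max(stats, key=lambda p: stats[p][0])

# Toy 10-frame meeting with three participants.
speaking = {
    "A": [1, 1, 0, 0, 1, 1, 1, 0, 0, 0],  # 5 frames, 2 turns
    "B": [0, 0, 1, 1, 0, 0, 0, 1, 0, 0],  # 3 frames, 2 turns
    "C": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # 2 frames, 1 turn
}
print(predict_leader(speaking))  # -> A
```

In the supervised variants described in the thesis, cues like these would instead be fed as features to a trained classifier rather than ranked by a fixed rule.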
We use a supervised collective approach that adds relational information to the nonverbal cues, and compare its performance with that of supervised (non-collective) and unsupervised methods. We also propose to capture social visual attention patterns from automatically extracted video features, in order to analyze who receives or gives the largest amount of visual attention in the group. Then, with the aim of understanding who receives the largest amount of visual attention while speaking and who has the highest dominance ratio (i.e., many occurrences of looking at others while speaking and few occurrences of looking at others while not speaking), we synchronize the audio and video streams to capture speaking and attention activity patterns. We end our analysis by exploring the impact of the verbal content (language style) on the interactions and its influence on the perception of emergent leaders. For the language style analysis, we compute word categories extracted from manual transcriptions of the discussions as well as from automatically detected keywords. We use a supervised method to obtain the relevant features, and use only the top word categories to predict the emergent leader and related concepts in each group. We then differentiate word categories between highly context-related and context-free, to explore the feasibility of inferring the emergent leader fully automatically from the context-free language style. This dissertation thus addresses an audio-visual analysis of the ubiquitous phenomenon of emergent leadership through a fully automatic computational approach applied to face-to-face interactions. The nonverbal behavioral analysis is inspired by previous work in social psychology on emergent leadership and related concepts. The automatically extracted nonverbal features are modeled to feed state-of-the-art machine learning techniques in order to infer emergent leaders.
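The dominance ratio defined above admits a direct computation once the audio and video streams are synchronized frame by frame. The sketch below, with assumed per-frame binary streams and a hypothetical function name, counts occurrences of looking at others while speaking versus while not speaking and takes their ratio; it is only an illustration of the definition, not the thesis's implementation.

```python
# Illustrative sketch of the dominance ratio: occurrences of looking at
# others while speaking divided by occurrences of looking at others while
# not speaking, over synchronized per-frame streams. Names and data are
# assumptions made for this example.

def visual_dominance_ratio(speaking, looking_at_others):
    """Both arguments: lists of 0/1 values, one per synchronized frame."""
    look_while_speaking = sum(1 for s, l in zip(speaking, looking_at_others)
                              if s and l)
    look_while_silent = sum(1 for s, l in zip(speaking, looking_at_others)
                            if not s and l)
    if look_while_silent == 0:
        return float("inf")  # participant never looks at others while silent
    return look_while_speaking / look_while_silent

# Toy 8-frame example for one participant.
speaking          = [1, 1, 1, 0, 0, 1, 1, 0]
looking_at_others = [1, 1, 0, 1, 0, 1, 1, 0]
print(visual_dominance_ratio(speaking, looking_at_others))  # -> 4.0
```

A higher ratio means the participant looks at others mostly while holding the floor, a pattern associated in the social psychology literature with dominance.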