Combining dynamic head pose-gaze mapping with the robot conversational state for attention recognition in human-robot interactions
The ability to recognize people's visual focus of attention (VFOA, i.e. what or whom a person is looking at) is important for robots and conversational agents interacting with multiple people, since it plays a key role in turn-taking, engagement, and intention monitoring. As eye gaze estimation is often impossible to achieve, most systems currently rely on head pose as an approximation, which creates ambiguities since the same head pose can be used to look at different VFOA targets. To address this challenge, we propose a dynamic Bayesian model for VFOA recognition from head pose, with two main contributions. First, taking inspiration from behavioral models describing the relationships between the body, head and gaze orientations involved in gaze shifts, we propose novel gaze models that dynamically and more accurately predict the head orientation expected when looking in a given gaze target direction. This aspect has been neglected in previous work but is essential for recognition. Second, we propose to exploit the robot conversational state (when it speaks, the objects to which it refers) as context to set appropriate priors on candidate VFOA targets and reduce the inherent VFOA ambiguities. Experiments on a public dataset in which the humanoid robot NAO plays the role of an art guide and quiz master demonstrate the benefit of the two contributions.