In this paper, we introduce a probabilistic framework for robust identification of user goals in speech-based human-robot interaction. Bayesian networks are used to interpret multimodal signals in the spoken dialogue between a tour-guide robot and visitors in mass exhibition conditions. In particular, we report on experiments interpreting speech and laser scanner signals in the dialogue management system of the autonomous tour-guide robot RoboX, successfully deployed at the Swiss National Exhibition (Expo.02). Correct interpretation of a user’s (visitor’s) goal or intention at each dialogue state is a key issue for successful voice-enabled communication between tour-guide robots and visitors. To infer the visitors’ goal under the uncertainty intrinsic to these two modalities, we introduce Bayesian networks for combining noisy speech recognition with data from a laser scanner, which is independent of acoustic noise. Experiments with real-world data, collected during the operation of RoboX at Expo.02, demonstrate the effectiveness of the approach in adverse environments. The proposed framework makes it possible to apply complex signal fusion techniques that can compensate for the lack of dedicated dialogue-based techniques for handling speech recognition errors.
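
To make the fusion idea concrete, the following is a minimal sketch of how a small Bayesian network might combine a noisy speech recognition result with a laser scanner reading to infer a user goal. It is an illustration under stated assumptions, not the network used on RoboX: the goal states ("yes", "no", "absent"), the binary laser observation, the conditional independence of the two observations given the goal, and every probability value are hypothetical.

```python
# Hypothetical three-node Bayesian network: Goal -> Speech, Goal -> Laser.
# All states and probabilities below are illustrative assumptions,
# not values taken from the RoboX experiments.

# Prior P(G): the visitor answers "yes", answers "no", or nobody is there.
PRIOR = {"yes": 0.4, "no": 0.4, "absent": 0.2}

# P(S | G): recognizer output given the goal. With no visitor present,
# the recognizer is assumed to react to background noise at random.
P_SPEECH = {
    "yes": {"yes": 0.8, "no": 0.2},
    "no": {"yes": 0.2, "no": 0.8},
    "absent": {"yes": 0.5, "no": 0.5},
}

# P(L | G): laser scanner detects a person in front of the robot (True/False).
P_LASER = {
    "yes": {True: 0.9, False: 0.1},
    "no": {True: 0.9, False: 0.1},
    "absent": {True: 0.1, False: 0.9},
}


def posterior(speech, laser):
    """Return P(G | S=speech, L=laser) by enumeration and normalization."""
    unnormalized = {
        goal: PRIOR[goal] * P_SPEECH[goal][speech] * P_LASER[goal][laser]
        for goal in PRIOR
    }
    evidence = sum(unnormalized.values())
    return {goal: p / evidence for goal, p in unnormalized.items()}


def most_likely_goal(speech, laser):
    """Return the goal state with the highest posterior probability."""
    post = posterior(speech, laser)
    return max(post, key=post.get)


if __name__ == "__main__":
    # Same recognizer output, different laser evidence.
    print(most_likely_goal("yes", True))
    print(most_likely_goal("yes", False))
```

With these assumed numbers, a recognized "yes" is accepted as the visitor's goal when the laser also reports a person, but is attributed to background noise when the laser reports nobody in front of the robot. This is the sense in which a modality unaffected by acoustic noise can compensate for recognition errors.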