In this paper, we introduce a Bayesian network architecture that combines speech-based information with information from another modality for error handling in a human-robot dialogue system. In particular, we report on experiments that interpret speech and laser scanner signals in the dialogue management system of the autonomous tour-guide robot RoboX, which was successfully deployed at the Swiss National Exhibition (Expo.02). Correctly interpreting the user’s (visitor’s) goal or intention at each dialogue state, despite the uncertainty inherent in speech recognition, is key to successful voice-enabled communication between tour-guide robots and visitors. Bayesian networks are used to infer the user's goal in the presence of recognition errors by fusing speech recognition results with information about the acoustic conditions and with data from a laser scanner, which is independent of acoustic noise. Experiments with real-world data collected during the operation of RoboX at Expo.02 demonstrate the effectiveness of the approach in an adverse environment. The proposed architecture makes it possible to model error-handling processes in spoken dialogue systems that involve complex combinations of different multimodal information sources, wherever such information is available.
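To make the fusion idea concrete, the sketch below shows a toy Bayesian network with exact inference by enumeration. It is an illustration under stated assumptions, not the network used on RoboX. The user's goal G is a yes/no answer. The speech recognizer output A depends on G and on an observed acoustic condition N (quiet or noisy). A laser reading L (visitor present or absent) depends only on G. The node names, the function `posterior_goal`, and all probability values are hypothetical. The example shows how noise-independent laser evidence can outweigh an unreliable recognition result in noisy conditions.

```python
"""Toy Bayesian-network fusion of speech and laser evidence.

Hypothetical structure and numbers, for illustration only:
    G (user goal) -> A (ASR result) <- N (acoustic condition)
    G (user goal) -> L (laser: visitor present/absent)
"""

GOALS = ("yes", "no")

# Hypothetical prior P(G) over the user's goal.
P_GOAL = {"yes": 0.5, "no": 0.5}

# Hypothetical P(A | G, N): the recognizer is less reliable when it is noisy,
# so its output carries less information about G.
P_ASR = {
    ("quiet", "yes"): {"yes": 0.9, "no": 0.1},
    ("quiet", "no"): {"yes": 0.1, "no": 0.9},
    ("noisy", "yes"): {"yes": 0.6, "no": 0.4},
    ("noisy", "no"): {"yes": 0.4, "no": 0.6},
}

# Hypothetical P(L | G): a visitor who wants to continue usually stays in
# front of the robot. This evidence does not depend on acoustic noise.
P_LASER = {
    "yes": {"present": 0.85, "absent": 0.15},
    "no": {"present": 0.3, "absent": 0.7},
}


def posterior_goal(asr: str, noise: str, laser: str) -> dict:
    """Return P(G | A=asr, N=noise, L=laser) by exact enumeration.

    N is observed and assumed independent of G, so its prior is the same
    factor for every goal and cancels during normalisation.
    """
    joint = {
        g: P_GOAL[g] * P_ASR[(noise, g)][asr] * P_LASER[g][laser]
        for g in GOALS
    }
    z = sum(joint.values())
    return {g: p / z for g, p in joint.items()}


if __name__ == "__main__":
    # Recognizer hears "yes", but the visitor has walked away.
    print("noisy:", posterior_goal("yes", "noisy", "absent"))
    print("quiet:", posterior_goal("yes", "quiet", "absent"))
```

With these made-up numbers, the recognized "yes" in noisy conditions is overridden by the laser evidence: the posterior favours "no" at about 0.76. In quiet conditions the same recognition result still favours "yes" at about 0.66. This is the qualitative behaviour that conditioning recognizer reliability on the acoustic conditions is meant to produce.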