Text embedded in images and videos represents a rich source of information for content-based indexing and retrieval applications. In this paper, we present a new method for localizing and recognizing text in complex images and videos. Text localization is performed in a two step approach that combines the speed of a focusing step with the strength of a machine learning based text verification step. The experiments conducted show that the support vector machine is more appropriate for the verification task than the more commonly used neural networks. To perform text recognition on the localized regions, we propose a new multi-hypotheses method. Assuming different models of the text image, several segmentation hypotheses are produced. They are processed by an optical character recognition (OCR) system, and the result is selected from the generated strings according to a confidence value computed using language modeling and OCR statistics. Experiments show that this approach leads to much better results than the conventional method that tries to improve the individual segmentation algorithm. The whole system has been tested on several hours of videos and showed good performance when integrated in a sports video annotation system and a video indexing system within the framework of two European projects.