In this paper, we propose a method to segment and recognize text embedded in video and images. We model the gray-level distribution of a text image as a mixture of Gaussians and assign each pixel to one of the Gaussian layers. The assignment relies on a prior over contextual information, modeled by a Markov random field (MRF) whose coefficients are estimated online. Each layer is then processed by a connected-component analysis module and forwarded to the OCR system as one segmentation hypothesis. By varying the number of Gaussians, we provide multiple hypotheses to the OCR system and select the final result from the set of outputs, which improves the system's performance.
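
As a rough illustration of the layer-generation step, the sketch below fits a one-dimensional Gaussian mixture to the gray levels with plain EM, assigns each pixel to its most likely component, and returns one binary mask per layer for several mixture sizes. It is a simplified sketch under stated assumptions, not the method itself: the MRF contextual prior, the connected-component analysis, and the OCR selection step are all omitted, and the function names, the EM initialization, the variance floor, and the range of mixture sizes (2 to 4) are hypothetical choices made for the example.

```python
import math


def fit_gmm_1d(values, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture to gray levels with plain EM."""
    lo, hi = min(values), max(values)
    # Assumption: spread the initial means evenly over the observed range.
    means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    variances = [((hi - lo) / k) ** 2 + 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each gray value.
        resp = []
        for v in values:
            p = [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                 for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against total underflow
            resp.append([x / total for x in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var, 1.0)  # assumed floor to avoid collapse
    return weights, means, variances


def assign_layers(image, k):
    """Return k binary masks, one per Gaussian layer (no MRF prior here)."""
    values = [v for row in image for v in row]
    weights, means, variances = fit_gmm_1d(values, k)

    def label(v):
        # Log-likelihood plus log-weight of each component; pick the largest.
        scores = [math.log(w + 1e-300) - 0.5 * math.log(s) - (v - m) ** 2 / (2 * s)
                  for w, m, s in zip(weights, means, variances)]
        return max(range(k), key=lambda j: scores[j])

    labels = [[label(v) for v in row] for row in image]
    return [[[1 if l == j else 0 for l in row] for row in labels]
            for j in range(k)]


def segmentation_hypotheses(image, ks=(2, 3, 4)):
    """Collect every layer of every mixture size as one hypothesis."""
    return [(k, layer) for k in ks for layer in assign_layers(image, k)]


# Toy image: dark "text" pixels (20) on a bright background (200).
image = [
    [200, 200, 200, 200],
    [200, 20, 20, 200],
    [200, 20, 20, 200],
    [200, 200, 200, 200],
]
hypotheses = segmentation_hypotheses(image)
```

In a fuller pipeline, each mask in `hypotheses` would be cleaned by connected-component analysis and passed to the OCR system, and the final text would be chosen among the resulting outputs.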