Abstract

This thesis presents a PhD work on offline cursive handwriting recognition, the automatic transcription of cursive handwriting when only its image is available. Two main approaches have been used in the literature to address the problem. The first attempts to segment the words into letters and then applies Dynamic Programming techniques to perform the recognition. The second converts the words into sequences of observation vectors and then recognizes them with Hidden Markov Models (HMMs). Some efforts were made to apply models inspired by human reading, but their success was limited to applications involving small lexica (20-30 words). The HMM-based approach eventually became dominant, and few works still attempt to segment the words into characters. Sayre's paradox (a word cannot be segmented before being recognized and cannot be recognized before being segmented \cite{say73}) was shown to be a fundamental limitation of any segmentation-based approach.

The research focused essentially on two applications: postal address recognition and bankcheck reading. In both cases, the application environment provides information helpful for recognition (the zip code and the courtesy amount, respectively). This simplified the problem and shaped the state of the art, in which several issues appear to be neglected:

1) The steps preceding recognition (preprocessing, normalization, segmentation and feature extraction) are based on heuristics, often involving data-dependent algorithms and parameters that must be set empirically. This leads to systems that are not robust to a change of data and require a heavy experimental effort to find an optimal configuration of the parameter set.

2) Although HMMs offer a solid theoretical framework for the recognition problem, most systems presented in the literature are still far from taking full advantage of it.

3) The research focused on single-word recognition, neglecting, with very few exceptions, the transcription of texts or word sequences (e.g. dates) where language modeling can be used.

4) The experimental setup is not always rigorous. Works frequently fail to make a clear distinction between training and test sets, and systems are often fitted to the test set by tuning hyperparameters on it. This leads to an overestimation of the performance.

In this thesis, the problem of cursive handwriting recognition is addressed taking the above issues into account. The system we developed is based on a sliding window approach. This avoids segmentation, a step that typically involves ad hoc algorithms and heuristics that risk being data dependent and not robust to a change of data. The normalization (slant and slope removal) is based on a statistical approach that makes only very general assumptions about the words and is completely adaptive. No hyperparameters are involved, so when the data change, no effort is needed to reconfigure this step for the new problem. Several modeling aspects have been investigated in order to take fuller advantage of the HMM framework. Transforms (linear and nonlinear Principal Component Analysis, Independent Component Analysis) that make the feature vectors more suitable for recognition are shown to significantly improve the recognition rate.
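
The following Python sketch illustrates the kind of pipeline described above: a narrow window slides over the word image to produce a sequence of observation vectors, and a linear PCA learned on training frames decorrelates the features before HMM training. It is not the thesis' implementation; the window width, step, per-frame features and the use of NumPy/scikit-learn are illustrative assumptions.

    # Minimal sketch (illustrative assumptions, not the thesis' implementation).
    import numpy as np
    from sklearn.decomposition import PCA

    def sliding_window_features(image, width=4, step=2):
        """Turn a binarized word image (foreground = 1) into a sequence of frames."""
        h, w = image.shape
        frames = []
        for x in range(0, max(w - width, 1), step):
            win = image[:, x:x + width]
            row_density = win.mean(axis=1)                  # ink density per row
            total_ink = win.sum() / (h * width)             # overall ink density
            rows = np.nonzero(win)[0]
            center = rows.mean() / h if rows.size else 0.5  # vertical center of mass
            frames.append(np.concatenate([row_density, [total_ink, center]]))
        return np.asarray(frames)

    # Placeholder "images": in practice these would be normalized word images.
    word_images = [(np.random.rand(60, 120) > 0.8).astype(float) for _ in range(20)]
    training_frames = np.vstack([sliding_window_features(img) for img in word_images])

    pca = PCA(n_components=12).fit(training_frames)         # learned on training data only
    observations = pca.transform(sliding_window_features(word_images[0]))
    print(observations.shape)  # (number of frames, 12): the input sequence for the HMMs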
HMM adaptation techniques (Maximum Likelihood Linear Regression and Maximum A Posteriori adaptation) have been used to improve the performance of a system trained on multiple-writer data when recognizing the words written by a specific person. Recognition has been extended from single words to text lines. In this case, the only knowledge available about the data to be recognized is that it is written in English and should, on average, reproduce the statistics of English text. This hypothesis is exploited by applying Statistical Language Models (unigrams, bigrams and trigrams), whose use results in a significant improvement of the system. All experiments were designed to be statistically sound, to avoid any fitting of the system to the test set, and to give realistic measures of the performance.
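
As a reminder of what the two adaptation techniques mentioned above do (these are the standard textbook formulations, not details specific to this thesis): MLLR applies a shared affine transform to the Gaussian means of the writer-independent models, while MAP interpolates each mean toward the writer-specific data.

    % MLLR: an affine transform (A, b), estimated by maximizing the likelihood
    % of the adaptation data, is applied to every Gaussian mean.
    \hat{\mu} = A\mu + b
    % MAP: each mean is shifted toward the adaptation data, weighted by the
    % occupation counts \gamma_t of that Gaussian and a prior weight \tau.
    \hat{\mu} = \frac{\tau\,\mu + \sum_t \gamma_t\, o_t}{\tau + \sum_t \gamma_t}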
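
The sketch below shows how the n-gram language models mentioned above assign probabilities to word sequences during decoding; the add-one smoothing and the toy corpus are illustrative assumptions, not the estimation procedure actually used in the thesis.

    # Minimal bigram language model sketch (illustrative assumptions).
    from collections import Counter

    def train_bigram_lm(sentences):
        """Estimate P(w_i | w_{i-1}) with add-one smoothing from tokenized sentences."""
        unigrams, bigrams = Counter(), Counter()
        for words in sentences:
            words = ["<s>"] + words + ["</s>"]
            unigrams.update(words)
            bigrams.update(zip(words[:-1], words[1:]))
        vocab_size = len(unigrams)

        def prob(prev, word):
            return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        return prob

    # During decoding, the HMM likelihood of a word hypothesis is combined with
    # the language-model probability of that word given its predecessor(s).
    prob = train_bigram_lm([["the", "quick", "brown", "fox"],
                            ["the", "lazy", "dog"]])
    print(prob("the", "quick"))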
