Semi-Automatic Transcription Tool for Ancient Manuscripts

In this work, we investigate techniques from shape analysis and image processing in order to construct a semi-automatic transcription tool for ancient manuscripts. First, we design a shape matching procedure using shape contexts, introduced in [1], and use this procedure to compute several distances between two arbitrary shapes/words. Then, we use Fisher discriminant analysis to combine these distances into a single similarity measure, which we use to represent the words on a similarity graph. Finally, we investigate an unsupervised clustering analysis on this graph to create groups of semantically similar words and propose an uncertainty measure associated with the attribution of a word to a group. The clusters, together with the uncertainty measure, form the core of the semi-automatic transcription tool, which we test on a dataset of 42 words. The average classification accuracy achieved with this technique on this dataset is 86%, which is quite satisfactory. The tool reduces by 70% the number of words that must be typed to transcribe a document.
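As an illustration of the step that combines several shape distances into one measure, the sketch below computes a standard two-class Fisher discriminant direction with NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fisher_direction` and `combined_distance`, the small ridge term, and the idea of training on pairs labelled "same word" versus "different word" are all hypothetical choices made for this example.

```python
import numpy as np


def fisher_direction(same_pairs, diff_pairs):
    """Return a unit Fisher discriminant direction for two classes.

    same_pairs, diff_pairs: arrays of shape (n_pairs, n_distances). Each row
    holds the different shape distances computed for one pair of words
    (hypothetical layout, assumed for this sketch).
    """
    same_pairs = np.asarray(same_pairs, dtype=float)
    diff_pairs = np.asarray(diff_pairs, dtype=float)

    mean_same = same_pairs.mean(axis=0)
    mean_diff = diff_pairs.mean(axis=0)

    # Within-class scatter: sum of the two per-class scatter matrices.
    centered_same = same_pairs - mean_same
    centered_diff = diff_pairs - mean_diff
    scatter = centered_same.T @ centered_same + centered_diff.T @ centered_diff

    # Small ridge term (an assumption) keeps the solve stable if the
    # scatter matrix is close to singular.
    scatter += 1e-9 * np.eye(scatter.shape[0])

    # Fisher direction: w is proportional to S_w^{-1} (m_diff - m_same).
    w = np.linalg.solve(scatter, mean_diff - mean_same)
    return w / np.linalg.norm(w)


def combined_distance(w, distances):
    """Project a vector of distances onto w to get a single score."""
    return float(np.dot(w, np.asarray(distances, dtype=float)))
```

With this orientation, a larger combined score suggests that two words are less alike, so the score could serve as an edge weight on a similarity graph after a suitable monotone transformation.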

Mazzei, Andrea
Kaplan, Frédéric
Presented at:
IC Research Day 2014: Challenges in Big Data, SwissTech Convention Center, Lausanne, Switzerland, June 12, 2014
Main reference: [1] S. Belongie, J. Malik, J. Puzicha, Shape Matching and Object Recognition Using Shape Contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, April 2002.

 Record created 2014-06-14, last modified 2018-03-17
