Semi-Automatic Transcription Tool for Ancient Manuscripts

Simeoni, Matthieu Martin Jean-Andre

2014

Abstract

In this work, we investigate techniques from shape analysis and image processing to build a semi-automatic transcription tool for ancient manuscripts. First, we design a shape matching procedure based on shape contexts, introduced in [1], and use it to compute several distances between two arbitrary shapes or words. Then, we use Fisher discrimination to combine these distances into a single similarity measure, which lets us represent the words naturally on a similarity graph. Finally, we run an unsupervised clustering analysis on this graph to form groups of semantically similar words, and we propose an uncertainty measure for the attribution of a word to a group. The clusters, together with the uncertainty measure, form the core of the semi-automatic transcription tool, which we test on a dataset of 42 words. The average classification accuracy on this dataset is 86%, which is quite satisfactory. The tool reduces by 70% the number of words that must be typed to transcribe a document.
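
The step that combines several shape distances into one measure can be illustrated with a minimal sketch of Fisher's linear discriminant. This is not the poster's implementation: the function names, the ridge term, and the synthetic distance vectors are all assumptions made for illustration. Each row is a hypothetical vector of distances for one word pair, labelled as "same word" or "different word".

```python
import numpy as np

def fisher_direction(X_same, X_diff):
    """Fisher linear discriminant direction separating two sets of distance vectors."""
    mu_s = X_same.mean(axis=0)
    mu_d = X_diff.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of the two classes.
    S_w = (X_same - mu_s).T @ (X_same - mu_s) + (X_diff - mu_d).T @ (X_diff - mu_d)
    # Small ridge term (an assumption) keeps the solve stable if S_w is near-singular.
    S_w = S_w + 1e-9 * np.eye(S_w.shape[0])
    w = np.linalg.solve(S_w, mu_d - mu_s)
    return w / np.linalg.norm(w)

def combined_distance(w, d):
    """Project a vector of individual distances onto the Fisher direction."""
    return float(np.dot(w, d))

# Synthetic toy data (not from the poster): three distances per word pair.
rng = np.random.default_rng(0)
X_same = rng.normal([0.2, 0.3, 0.1], 0.05, size=(50, 3))  # pairs of the same word
X_diff = rng.normal([0.6, 0.7, 0.5], 0.05, size=(50, 3))  # pairs of different words

w = fisher_direction(X_same, X_diff)
score_same = combined_distance(w, X_same.mean(axis=0))
score_diff = combined_distance(w, X_diff.mean(axis=0))
```

In this sketch a larger combined score indicates a less similar pair, so the scores could serve as edge weights when building a similarity graph over the words.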

Details

Title Semi-Automatic Transcription Tool for Ancient Manuscripts

Author(s) Simeoni, Matthieu Martin Jean-Andre

Advisor(s)

Mazzei, Andrea
Kaplan, Frédéric

Conference IC Research Day 2014: Challenges in Big Data, SwissTech Convention Center, Lausanne, Switzerland, June 12, 2014

Date 2014

Keywords

k-nearest neighbors; Pattern Recognition; Shape Distance; Shape Matching; Similarity graphs; Similarity Measure; Venice Time Machine; Venice Atlas; Digital Humanities

Note Main reference: [1] S. Belongie, J. Malik, J. Puzicha, Shape Matching and Object Recognition Using Shape Contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, April 2002.

Laboratories DHLAB

Record Appears in Scientific production and competences > CDH - College of Humanities and social sciences > Digital Humanities Institute > DHLAB - Digital Humanities Laboratory
Work produced at EPFL
Posters

Record creation date 2014-06-14
