A Handwritten French Dataset for Word Spotting - CFRAMUZ

Arvanitopoulos, Nikolaos; Chevassus, Gaspard; Maggetti, Daniele; Süsstrunk, Sabine

doi:10.1145/3151509.3151523

conference paper

2017

HIP2017: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing

The 4th International Workshop on Historical Document Imaging and Processing (HIP 2017)

We present CFRAMUZ, a new and freely available dataset for segmentation-free word spotting research. The dataset consists of seven novels, totaling 64 pages and 18,000 words, written in French by the Swiss writer C.F. Ramuz. The novels span the writer’s entire life and therefore show changes in the handwriting style. Together with the complete ground truth of the dataset, we provide an annotation tool. We also evaluate state-of-the-art word spotting approaches on this dataset. For completeness, we compare all the approaches on other commonly used datasets to demonstrate the new difficulties and challenges that our dataset introduces.