# Lausanne Historical Censuses Dataset HTR 35k

Authors: Rappo Lucas [[ORCID](https://orcid.org/0000-0002-7172-2495)], Petitpierre Rémi [[ORCID](https://orcid.org/0000-0001-9138-6727)], and Kramer Marion.

Affiliation: Digital Humanities Institute, EPFL, Lausanne, Switzerland.

Date of first publication: 2023

## Description

This training dataset includes a total of 34,913 manually transcribed text segments. It is dedicated to the handwritten text recognition (HTR) of historical sources, typically tabular records such as censuses. The dataset is based on a sample of 83 pages from the 19th-century (1805-1898) censuses of Lausanne, Switzerland. The primary language of the documents is French, although many Germanic names and toponyms also appear.

## Format

The training data are formatted on the model of the Bentham dataset. The format thus consists of a list of JPEG images, one per text segment, each with its corresponding transcription stored in a .txt file.

The file naming convention is `yyyy-ppp-n`, where `y` stands for the year of publication of the census and `p` for the page number.

## Related publications

The annotation and extraction methodology, as well as the complete performance evaluation, including the HTR benchmark and post-correction performance, are published in:

* Petitpierre R., Rappo L., Kramer M. (2023). An end-to-end pipeline for historical censuses processing. International Journal on Document Analysis and Recognition (IJDAR). doi: [10.1007/s10032-023-00428-9](https://link.springer.com/article/10.1007/s10032-023-00428-9)

The tabular dataset resulting from the automatic extraction is also available on Zenodo:

* Petitpierre R., Rappo L., Kramer M., di Lenardo I. (2023). 1805-1898 Census Records of Lausanne : a Long Digital Dataset for Demographic History. Zenodo.
doi: [10.5281/zenodo.7711640](https://doi.org/10.5281/zenodo.7711640)

## License

The original digitized documents are provided by the Archives of the City of Lausanne. The present dataset is published under the [CC BY 4.0 – Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license.

## Funding

This work was supported by the public fund for Collaborative Research on Science and Society (CROSS 2021) “Names of Lausanne: the evolution of family names in administration records 1803–1900”, and by the College of Humanities at EPFL.
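
## Example: loading the segments

To illustrate the format described under Format, here is a minimal Python sketch that pairs each segment image with its transcription and splits the `yyyy-ppp-n` file name into its parts. It is not part of the published dataset and relies on several assumptions. First, images use the `.jpg` extension. Second, each transcription is a UTF-8 `.txt` file sharing the image's file stem, stored either next to the image or in a separate folder. Third, the trailing `n` is an integer, read here as a segment index. The names `Segment`, `parse_stem` and `load_segments` are hypothetical.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Assumed reading of the 'yyyy-ppp-n' stem: a 4-digit year, a page number and
# a trailing integer 'n' (treated here, as an assumption, as a segment index).
SEGMENT_STEM = re.compile(r"^(?P<year>\d{4})-(?P<page>\d+)-(?P<n>\d+)$")


class Segment(NamedTuple):
    """One text segment: parsed file-name fields, image path, transcription."""
    year: int
    page: int
    n: int
    image: Path
    text: str


def parse_stem(stem: str) -> tuple[int, int, int]:
    """Split a 'yyyy-ppp-n' file stem into integers (year, page, n)."""
    match = SEGMENT_STEM.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {stem!r}")
    return int(match["year"]), int(match["page"]), int(match["n"])


def load_segments(image_dir: Path, text_dir: Optional[Path] = None) -> list[Segment]:
    """Pair each .jpg in image_dir with the same-stem .txt in text_dir.

    text_dir defaults to image_dir; images without a transcription are skipped.
    """
    if text_dir is None:
        text_dir = image_dir
    segments = []
    for image in image_dir.glob("*.jpg"):
        transcription = text_dir / f"{image.stem}.txt"
        if not transcription.is_file():
            continue
        year, page, n = parse_stem(image.stem)
        # Strip surrounding whitespace such as a trailing newline.
        text = transcription.read_text(encoding="utf-8").strip()
        segments.append(Segment(year, page, n, image, text))
    # Sort numerically so that e.g. segment 10 follows segment 9 on a page.
    segments.sort(key=lambda s: (s.year, s.page, s.n))
    return segments
```

For instance, `Counter(s.year for s in load_segments(Path("data")))` (with `Counter` from `collections` and a hypothetical `data` folder) would count the segments per census year. The optional `text_dir` argument covers layouts where images and transcriptions are kept in different folders.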