# 1805-1898 Census Records of Lausanne: a Long Digital Dataset for Demographic History

Authors: Petitpierre Rémi [[ORCID](https://orcid.org/0000-0001-9138-6727)], Kramer Marion, Rappo Lucas [[ORCID](https://orcid.org/0000-0002-7172-2495)], and di Lenardo Isabella [[ORCID](https://orcid.org/0000-0002-1747-9164)]

Affiliation: Digital Humanities Institute, EPFL, Lausanne, Switzerland.

## Context

This historical dataset stems from a project on the automatic extraction of 72 census records of Lausanne, Switzerland. The complete dataset covers a century of historical demography in Lausanne (1805-1898), corresponding to 18,831 pages and nearly 6 million cells.

## Content

The data published in this repository correspond to a first release, i.e. a diachronic slice of one register every 8 to 9 years. The remaining data are currently under embargo. Their publication will take place as soon as possible, and at the latest by the end of 2023. In the meantime, the data presented here form a large subset of 2,844 pages, which is already sufficient to investigate most research hypotheses.

## Description

The population censuses, digitized by the Archives of the City of Lausanne, cover the evolution of the population in Lausanne throughout the 19th century, starting in 1805, with only one long interruption from 1814 to 1831. Highly detailed, they are an invaluable source for studying migration, economic and social history, and traces of cultural exchanges not only with Bern, but also with France and Italy. Indeed, the system of tracing family origin, specific to Switzerland, makes it possible to follow the migratory movements of families long before the censuses appeared. The bourgeoisie is also an essential economic tracer. In addition, the censuses extensively describe the organization of the social fabric into family nuclei, around which gravitate various boarders, workers, servants or apprentices, often living in the same apartment as the family.
## Production

The structure and richness of the censuses have also provided an opportunity to develop automatic methods for processing structured documents. The processing of the censuses includes several steps, from the identification of text segments to the restructuring of information as digital tabular data, through Handwritten Text Recognition and the automatic segmentation of the structure using neural networks. Please note that the detailed extraction methodology, as well as the complete evaluation of performance and reliability, are published in:

* Petitpierre R., Rappo L., Kramer M. (2023). An end-to-end pipeline for historical censuses processing. International Journal on Document Analysis and Recognition (IJDAR). doi: [10.1007/s10032-023-00428-9](https://link.springer.com/article/10.1007/s10032-023-00428-9)

## Data structure

The data are structured in rows and columns, with each row corresponding to a household. Multiple entries in the same column for a single household are separated by vertical bars ⟨|⟩. The center point ⟨·⟩ indicates an empty entry. For some columns (e.g., street name, house number, owner name), an empty entry indicates that the last non-empty value should be carried over. The page number is in the last column.

## Format

The data are provided as CSV files. To open them with MS Excel, go to File > Import and select 'CSV file'. In the Text Import Wizard, tick 'Delimited' and select 'Unicode (UTF-8)' as file origin in the dropdown menu. In the next menu, tick 'Comma' as the sole delimiter.

## Liability

The data presented here have not been manually checked. They are the raw results of the extraction, the reliability of which was thoroughly assessed in the above-mentioned publication. We stress that any reuse of these data for research purposes requires an appropriate methodology to deal with noise and uncertainty. This may typically include string distance heuristics or statistical approaches.
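
The conventions described under Data structure (vertical bars, center point, carried-over values) can be applied when the CSV files are loaded. The following is a minimal sketch in Python, not the authors' own tooling. The column names `street`, `house_number`, `names` and `page` are hypothetical, and the three sample rows are invented for illustration; the actual headers should be taken from the files themselves.

```python
import csv
import io

EMPTY = "·"      # center point (U+00B7) marks an empty entry
SEPARATOR = "|"  # vertical bar separates multiple entries in one cell

# Hypothetical column names, chosen for illustration only.
CARRY_OVER_COLUMNS = ("street", "house_number")
MULTI_VALUE_COLUMNS = ("names",)


def parse_census(text, carry_over=CARRY_OVER_COLUMNS,
                 multi_value=MULTI_VALUE_COLUMNS):
    """Return one dict per household, applying the dataset conventions."""
    last_seen = {}
    households = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for column, raw in row.items():
            if column is None:
                continue  # ignore cells beyond the header width
            value = (raw or "").strip()
            if column in multi_value:
                # Split on vertical bars and drop empty markers.
                parts = [part.strip() for part in value.split(SEPARATOR)]
                record[column] = [p for p in parts if p and p != EMPTY]
            elif value in ("", EMPTY):
                # Empty entry: reuse the last non-empty value where the
                # column is declared as carried over, otherwise None.
                record[column] = (last_seen.get(column)
                                  if column in carry_over else None)
            else:
                record[column] = value
                if column in carry_over:
                    last_seen[column] = value
        households.append(record)
    return households


# Invented sample rows, not taken from the dataset.
SAMPLE = """street,house_number,names,page
Rue de Bourg,12,Jean|Marie,1
·,·,Louis,1
·,14,·,2
"""

households = parse_census(SAMPLE)
for household in households:
    print(household)
```

To use this on a real register, read the file with `encoding="utf-8"` and pass its content to `parse_census`, adjusting the two column tuples to the headers actually present.
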
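
As one possible illustration of the string distance heuristics mentioned under Liability, the sketch below maps a noisy transcription to the closest entry of a reference list using the Levenshtein edit distance. It is an assumption-laden example rather than the method used in the publication: the reference list, the noisy spellings and the 0.25 relative threshold are all hypothetical and would need to be chosen and validated for each column.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character edits."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # drop char_a
                               current[j - 1] + 1,       # drop char_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def closest(value, reference, max_relative_distance=0.25):
    """Return the nearest reference spelling, or None if none is close."""
    best, best_distance = None, None
    for candidate in reference:
        distance = levenshtein(value.lower(), candidate.lower())
        if best_distance is None or distance < best_distance:
            best, best_distance = candidate, distance
    if best is None:
        return None
    # Reject matches that differ by too large a share of their length.
    limit = max_relative_distance * max(len(value), len(best))
    return best if best_distance <= limit else None


# Hypothetical reference list and noisy values, invented for illustration.
OCCUPATIONS = ["charpentier", "cordonnier", "vigneron"]
NOISY = ["charpantier", "cordonier", "xyz"]

matches = {noisy: closest(noisy, OCCUPATIONS) for noisy in NOISY}
print(matches)
```

Returning `None` for values with no sufficiently close candidate keeps uncertain cells visible instead of silently forcing them onto a reference spelling.
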
## License

The original digitized documents are provided by the Archives of the City of Lausanne. The present dataset is published under the license [CC BY 4.0 – Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/).