FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

Jaume, Guillaume; Ekenel, Hazim Kemal; Thiran, Jean-Philippe

doi:10.1109/ICDARW.2019.10029

2019

Abstract

We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms. The dataset comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking. To the best of our knowledge, this is the first publicly available dataset with comprehensive annotations to address the FoUn task.

Details

Title FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

Author(s) Jaume, Guillaume ; Ekenel, Hazim Kemal ; Thiran, Jean-Philippe

Published in 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) and 2nd International Workshop on Open Services and Tools for Document Analysis (OST), Vol. 2

Series Proceedings of the International Conference on Document Analysis and Recognition

Pages 1-6

Conference 15th IAPR International Conference on Document Analysis and Recognition (ICDAR) / 2nd International Workshop on Open Services and Tools for Document Analysis (OST), Sep 21-25, 2019, Sydney, Australia

Date 2019-01-01

Publisher New York, IEEE

ISSN 1520-5363

ISBN 978-1-7281-5054-3

Keywords

text detection; optical character recognition; form understanding; spatial layout analysis; algorithms

DOI https://doi.org/10.1109/ICDARW.2019.10029

Laboratories LTS5

Record creation date 2020-03-26
