conference paper

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

Jaume, Guillaume
•
Ekenel, Hazim Kemal
•
Thiran, Jean-Philippe  
January 1, 2019
2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) and 2nd International Workshop on Open Services and Tools for Document Analysis (OST), Vol. 2
15th IAPR International Conference on Document Analysis and Recognition (ICDAR) / 2nd International Workshop on Open Services and Tools for Document Analysis (OST)

We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms. The dataset comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking. To the best of our knowledge, this is the first publicly available dataset with comprehensive annotations to address the FoUn task.

Type: conference paper
DOI: 10.1109/ICDARW.2019.10029
Web of Science ID: WOS:000518781600001
Author(s): Jaume, Guillaume; Ekenel, Hazim Kemal; Thiran, Jean-Philippe
Date Issued: 2019-01-01
Publisher: IEEE
Publisher place: New York
Published in: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) and 2nd International Workshop on Open Services and Tools for Document Analysis (OST), Vol. 2
ISBN of the book: 978-1-7281-5054-3
Series title/Series vol.: Proceedings of the International Conference on Document Analysis and Recognition
Start page: 1
End page: 6
Subjects: text detection • optical character recognition • form understanding • spatial layout analysis • algorithms
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LTS5
Event name: 15th IAPR International Conference on Document Analysis and Recognition (ICDAR) / 2nd International Workshop on Open Services and Tools for Document Analysis (OST)
Event place: Sydney, Australia
Event date: Sep 21-25, 2019

Available on Infoscience
March 26, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/167667