Information Extraction on the Web with Credibility Guarantee

Nguyen, Thanh Tam

2015

Abstract

The Web has become a central source of valuable data for information extraction applications. However, such user-generated resources are often plagued by inaccuracies and misinformation due to the inherent openness and uncertainty of the Web. In this work, we study the problem of extracting structured information from Web data with a credibility guarantee. The ultimate goal is not only to extract as much structured information as possible, but also to ensure that its credibility is high. To achieve this goal, we propose a learning process that optimizes the parameters of a probabilistic model capturing the relationships between users, their unstructured content, and the underlying structured information. Our evaluations on real-world datasets show that our approach outperforms the baseline by up to a factor of 6.

Details

Title Information Extraction on the Web with Credibility Guarantee

Author(s) Nguyen, Thanh Tam

Pages 8

Date 2015

Keywords (free)

trust management; credibility; information extraction; web data; quality guarantee

Laboratories LSIR

Work produced at EPFL
Technical reports

Record creation date 2016-03-25
