Quality-aware similarity assessment for entity matching in Web data

Yerva, Surender Reddy; Miklós, Zoltán; Aberer, Karl

doi:10.1016/j.is.2011.09.007

2012

Abstract

One of the key challenges in realizing automated processing of information on the Web, which is the central goal of the Semantic Web, is the entity matching problem. A number of tools reliably recognize named entities, such as persons, companies, and geographic locations, in Web documents. The names of these extracted entities are, however, not unique; the same name on different Web pages might or might not refer to the same entity. The entity matching problem is to identify which of the extracted entities refer to the same real-world entity. This problem is very similar to the entity resolution problem studied in relational databases, but there are also several differences. Most importantly, Web pages often contain only partial or incomplete information about the entities. Similarity functions try to capture the degree of belief in the equivalence of two entities, and thus play a crucial role in entity matching. The accuracy of a similarity function depends heavily on the assessment technique applied, but also on specific features of the entities. We propose systematic design strategies for combined similarity functions in this context. Our method combines multiple pieces of evidence, using the estimated quality of the individual similarity values and paying particular attention to missing information, which is common in the Web context. We study the effectiveness of our method on two specific instances of the general entity matching problem, namely person name disambiguation and Twitter message classification. In both cases, using our techniques in a very simple algorithmic framework, we obtained better results than the state-of-the-art methods.
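
The idea of combining several similarity values according to their estimated quality, while tolerating missing values, can be illustrated with a minimal sketch. This is not the formulation used in the paper, which this record does not give; it assumes a plain quality-weighted average. The function name `combine_similarities`, the use of `None` for a missing value, and the `default` fallback are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def combine_similarities(
    values: Sequence[Optional[float]],
    qualities: Sequence[float],
    default: float = 0.5,
) -> float:
    """Quality-weighted average of similarity values in [0, 1].

    Assumption (not from the paper): a missing similarity value is
    represented as None and is skipped, so it neither supports nor
    contradicts a match. If no usable evidence remains, `default`
    is returned.
    """
    if len(values) != len(qualities):
        raise ValueError("values and qualities must have the same length")

    weighted_sum = 0.0
    total_quality = 0.0
    for value, quality in zip(values, qualities):
        if value is None:
            # Missing information: ignore this piece of evidence.
            continue
        weighted_sum += quality * value
        total_quality += quality

    if total_quality == 0.0:
        # Nothing usable was observed for this pair of entities.
        return default
    return weighted_sum / total_quality
```

For example, `combine_similarities([1.0, None, 0.0], [0.75, 0.5, 0.25])` ignores the second, missing value and weights the remaining two by their estimated quality.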

Details

Title Quality-aware similarity assessment for entity matching in Web data

Author(s) Yerva, Surender Reddy ; Miklós, Zoltán ; Aberer, Karl

Published in Information Systems

Volume 37

Issue 4

Pages 336-351

Date 2012

Publisher Elsevier

ISSN 0306-4379

Keywords

Entity matching; Web; Similarity functions; Person name disambiguation; Twitter message classification

DOI https://doi.org/10.1016/j.is.2011.09.007

Laboratories LSIR

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LSIR - Distributed Information Systems Laboratory
Peer-reviewed publications
Work produced at EPFL
Journal Articles
Published

Record creation date 2011-10-06
