Computing text semantic relatedness using the contents and links of a hypertext encyclopedia

Yazdani, Majid; Popescu-Belis, Andrei

doi:10.1016/j.artint.2012.06.004

2013

Abstract

We propose a method for computing semantic relatedness between words or texts by using knowledge from hypertext encyclopedias such as Wikipedia. A network of concepts is built by filtering the encyclopedia's articles, each concept corresponding to an article. Two types of weighted links between concepts are considered: one based on hyperlinks between the texts of the articles, and another one based on the lexical similarity between them. We propose and implement an efficient random walk algorithm that computes the distance between nodes, and then between sets of nodes, using the visiting probability from one (set of) node(s) to another. Moreover, to make the algorithm tractable, we propose and validate empirically two truncation methods, and then use an embedding space to learn an approximation of visiting probability. To evaluate the proposed distance, we apply our method to four important tasks in natural language processing: word similarity, document similarity, document clustering and classification, and ranking in information retrieval. The performance of the method is state-of-the-art or close to it for each task, thus demonstrating the generality of the knowledge resource. Moreover, using both hyperlinks and lexical similarity links improves the scores with respect to a method using only one of them, because hyperlinks bring additional real-world knowledge not captured by lexical similarity. © 2012 Elsevier B.V. All rights reserved.
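
The visiting probability mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `normalize` and `visiting_probability`, the toy graph, and the choice of a simple cap on walk length as the truncation are all assumptions made here for illustration, since the abstract does not describe the two truncation methods. The link weights could stand for either hyperlink or lexical-similarity links.

```python
# Illustrative sketch only; names and the truncation rule are assumptions,
# not taken from the paper.

def normalize(weights):
    """Convert non-negative link weights (dict of dicts) into
    per-node transition probabilities."""
    probs = {}
    for node, nbrs in weights.items():
        total = sum(nbrs.values())
        probs[node] = {n: w / total for n, w in nbrs.items()} if total > 0 else {}
    return probs


def visiting_probability(probs, start, target, max_steps):
    """Probability that a random walk starting at `start` reaches `target`
    at least once within `max_steps` steps (walks are stopped on arrival)."""
    if start == target:
        return 1.0
    mass = {start: 1.0}   # probability mass of walks that have not hit target yet
    visited = 0.0         # accumulated probability of having reached target
    for _ in range(max_steps):
        next_mass = {}
        for node, p in mass.items():
            for nbr, q in probs.get(node, {}).items():
                if nbr == target:
                    visited += p * q      # walk is absorbed at the target
                else:
                    next_mass[nbr] = next_mass.get(nbr, 0.0) + p * q
        mass = next_mass
    return visited


# Hypothetical three-concept network: A links to B and C, B links back to A,
# and C only links to itself.
toy_weights = {"A": {"B": 1.0, "C": 1.0}, "B": {"A": 1.0}, "C": {"C": 1.0}}
toy_probs = normalize(toy_weights)

print(visiting_probability(toy_probs, "A", "B", max_steps=2))  # 0.5
print(visiting_probability(toy_probs, "A", "C", max_steps=3))  # 0.75
```

In this toy network a longer walk raises the probability of reaching C from A, because walks that first step to B return to A and get another chance. Capping the number of steps keeps the computation bounded, which is the general motivation for truncation stated in the abstract.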

Details

Title Computing text semantic relatedness using the contents and links of a hypertext encyclopedia

Author(s) Yazdani, Majid ; Popescu-Belis, Andrei

Published in Artificial Intelligence

Pagination 27

Volume 194

Pages 176-202

Date 2013

Publisher Amsterdam, Elsevier Science BV

ISSN 0004-3702

Keywords (free)

Text semantic relatedness; Distance metric learning; Learning to rank; Random walk; Text classification; Text similarity; Document clustering; Information retrieval; Word similarity

DOI https://doi.org/10.1016/j.artint.2012.06.004

Laboratories LIDIAP

The document appears in Scientific production and competences > STI - Faculté des sciences et techniques de l'ingénieur > IEM - Institute of Electrical and Micro Engineering > LIDIAP - Laboratoire de l'IDIAP
Scientific production and competences > Euler Center for Signal Processing
Peer-reviewed publications
Work produced at EPFL
Journal articles
Published

Record creation date 2013-12-19
