Evolution of Topics and Novelty in Science

Ballester, Omar; Penner, Orion

2019

Abstract

Estimating the similarity between individual publications is an area of long-standing interest in the scientometrics community. Traditional methods have generally relied on references and other metadata, while text mining approaches based on title and abstract text have appeared more frequently in recent years. In principle, topic models have great potential in this domain. In practice, however, they are often difficult to employ successfully and, in particular, are notoriously inconsistent as the dimension of the latent space grows. That is, running the same model, with the same parameters, on the same data, but with a different random seed, produces radically different similarity estimates as the number of topics increases. In this manuscript we develop a simple but novel methodology for evaluating the robustness of topic models. Employing that methodology, we find that the neural-network-based Doc2Vec approach seems capable of providing (statistically) robust estimates of document-document similarities, even for topic spaces far larger than is prudent for the most common topic model approach, Latent Dirichlet Allocation. As this is a work in progress, we do not venture deeply into the question of whether these estimates also reflect reality, but we do provide some preliminary evidence and future directions for those efforts.
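The seed-sensitivity problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' methodology, which the abstract does not spell out; it is one plausible way to quantify agreement between two runs, assuming each run yields one vector per document. The names `seed_agreement` and `toy_model` are hypothetical, and `toy_model` is only a stand-in for a real topic model (LDA, Doc2Vec) trained with a given random seed.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarities(doc_vectors):
    """Cosine similarity of every unordered document pair, in a fixed order."""
    pairs = combinations(range(len(doc_vectors)), 2)
    return [cosine(doc_vectors[i], doc_vectors[j]) for i, j in pairs]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def seed_agreement(run_a, run_b):
    """Correlation between the document-document similarities of two runs.

    Values near 1 mean both runs rank document pairs alike; values near 0
    mean the similarity estimates depend mostly on the random seed.
    """
    return pearson(pairwise_similarities(run_a), pairwise_similarities(run_b))


def toy_model(seed, noise, n_docs=20, dim=8):
    """Hypothetical stand-in for a trained model: fixed document structure
    plus seed-dependent Gaussian noise of the given scale (assumption)."""
    base_rng = random.Random(0)   # shared "true" document structure
    noise_rng = random.Random(seed)  # run-specific randomness
    return [[base_rng.gauss(0, 1) + noise_rng.gauss(0, noise) for _ in range(dim)]
            for _ in range(n_docs)]


# Two seeds with little seed-induced noise versus two seeds with a lot.
stable = seed_agreement(toy_model(1, 0.01), toy_model(2, 0.01))
unstable = seed_agreement(toy_model(1, 5.0), toy_model(2, 5.0))
print(f"low-noise agreement:  {stable:.3f}")
print(f"high-noise agreement: {unstable:.3f}")
```

With a real model, `run_a` and `run_b` would be the document vectors from two trainings that differ only in the random seed, and the comparison would be repeated across increasing numbers of topics.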

Details

Title Evolution of Topics and Novelty in Science

Author(s) Ballester, Omar; Penner, Orion

Published in 17th International Conference on Scientometrics & Informetrics (ISSI2019), Vol II

Series Proceedings of the International Conference on Scientometrics and Informetrics

Pages 1606-1611

Presented at 17th International Conference of the International Society for Scientometrics and Informetrics (ISSI) on Scientometrics and Informetrics, Sep 02-05, 2019, Rome, Italy

Date 2019-01-01

Publisher Leuven, INT SOC SCIENTOMETRICS & INFORMETRICS-ISSI

ISSN 2175-1935

ISBN 978-88-3381-118-5

Other identifier(s) View publication in Web of Science

Laboratories IIPP

The document appears in Scientific production and competences > CDM - Collège du Management de la Technologie > CDM Archives > IIPP - Chaire en politiques d'innovation et de propriété intellectuelle
Peer-reviewed publications
Conference papers
Work produced at EPFL
Published

Record created 2020-03-12