research article

Robustness, replicability and scalability in topic modelling

Ballester, Omar • Penner, Orion
February 1, 2022
Journal of Informetrics

Approaches for estimating the similarity between individual publications are an area of long-standing interest in the scientometrics and informetrics communities. Traditional techniques have generally relied on references and other metadata, while text mining approaches based on title and abstract text have appeared more frequently in recent years. In principle, topic models have great potential in this domain. In practice, however, they are often difficult to employ successfully and are notoriously inconsistent as the dimension of the latent space grows. In this manuscript we identify the three properties all usable topic models should have: robustness, descriptive power and reflection of reality. We develop a novel method for evaluating the robustness of topic models and suggest a metric to assess and benchmark descriptive power as the number of topics scales. Employing that procedure, we find that the neural-network-based paragraph embedding approach seems capable of providing statistically robust estimates of the document-document similarities, even for topic spaces far larger than what is usually considered prudent for the most common topic model approaches.

Type: research article
DOI: 10.1016/j.joi.2021.101224
Web of Science ID: WOS:000730082000001
Author(s): Ballester, Omar • Penner, Orion
Date Issued: 2022-02-01
Publisher: ELSEVIER
Published in: Journal of Informetrics
Volume: 16
Issue: 1
Article Number: 101224
Subjects: Computer Science, Interdisciplinary Applications • Information Science & Library Science • Computer Science • scientometrics • topic modelling • stability • robustness • similarity • informetrics • word analysis • information • cocitation • science • field
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: STIP
Available on Infoscience: January 31, 2022
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/184866
