conference paper

Crosslingual Topic Modeling with WikiPDA

Piccardi, Tiziano; West, Robert
January 1, 2021
Proceedings of the World Wide Web Conference 2021 (WWW 2021)
30th World Wide Web Conference (WWW)

We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics. It leverages the fact that Wikipedia articles link to each other and are mapped to concepts in the Wikidata knowledge base, such that, when represented as bags of links, articles are inherently language-independent. WikiPDA works in two steps, by first densifying bags of links using matrix completion and then training a standard monolingual topic model. A human evaluation shows that WikiPDA produces more coherent topics than monolingual text-based latent Dirichlet allocation (LDA), thus offering crosslinguality at no cost. We demonstrate WikiPDA's utility in two applications: a study of topical biases in 28 Wikipedia language editions, and crosslingual supervised document classification. Finally, we highlight WikiPDA's capacity for zero-shot language transfer, where a model is reused for new languages without any fine-tuning. Researchers can benefit from WikiPDA as a practical tool for studying Wikipedia's content across its 299 language editions in interpretable ways, via an easy-to-use library publicly available at https://github.com/epfl-dlab/WikiPDA.
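
The abstract's key observation, that bags of links become language-independent once each link is mapped to its Wikidata concept, can be illustrated with a minimal sketch. This is not the WikiPDA library's API: the names `SITELINKS` and `bag_of_links` are hypothetical, and the small lookup table is hand-written toy data rather than anything loaded from Wikidata. The densification and topic-model training steps are not shown.

```python
from collections import Counter

# Toy sitelink table (hand-written for illustration, not loaded from Wikidata):
# (language code, article title) -> Wikidata concept ID.
SITELINKS = {
    ("en", "Germany"): "Q183",
    ("de", "Deutschland"): "Q183",
    ("it", "Germania"): "Q183",
    ("en", "Berlin"): "Q64",
    ("de", "Berlin"): "Q64",
    ("it", "Berlino"): "Q64",
}


def bag_of_links(lang, linked_titles):
    """Count the concepts an article links to, keyed by concept ID.

    Links whose target has no entry in SITELINKS are skipped.
    """
    ids = (SITELINKS.get((lang, title)) for title in linked_titles)
    return Counter(qid for qid in ids if qid is not None)


# Two hypothetical articles in different languages with the same link structure.
en_bag = bag_of_links("en", ["Germany", "Berlin", "Berlin"])
it_bag = bag_of_links("it", ["Berlino", "Germania", "Berlino", "Pagina sconosciuta"])

# Once titles are replaced by concept IDs, the two bags can be compared directly.
print(en_bag == it_bag)
```

Under these assumptions, articles from any language edition end up in one shared vocabulary of concept IDs, which is what would allow a single monolingual topic model to be trained over all of them.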

Type
conference paper
DOI
10.1145/3442381.3449805
Web of Science ID

WOS:000733621803006

Author(s)
Piccardi, Tiziano  
West, Robert  
Date Issued

2021-01-01

Publisher

Association for Computing Machinery

Publisher place

New York

Published in
Proceedings of the World Wide Web Conference 2021 (WWW 2021)
ISBN of the book

978-1-4503-8312-7

Start page

3032

End page

3041

Subjects

  • Computer Science, Artificial Intelligence
  • Computer Science, Information Systems
  • Computer Science, Interdisciplinary Applications
  • Computer Science, Theory & Methods
  • Computer Science

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
DLAB  
Event name

30th World Wide Web Conference (WWW)

Event place

ELECTR NETWORK

Event date

Apr 12-23, 2021

Available on Infoscience
March 28, 2022
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/186580