Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Books and Book parts
  4. Rehabilitation of Count-based Models for Word Vector Representations
 
book part or chapter

Rehabilitation of Count-based Models for Word Vector Representations

Lebret, Rémi; Collobert, Ronan
2015
Computational Linguistics And Intelligent Text Processing (Cicling 2015), Pt I

Recent work on word representations mostly relies on predictive models. Distributed word representations (also known as word embeddings) are trained to optimally predict the contexts in which the corresponding words tend to appear. Such models have succeeded in capturing word similarities as well as semantic and syntactic regularities. We aim instead to revive interest in a model based on counts. We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora. We show that this distance performs well on word similarity and analogy tasks, given a proper type and size of context and a dimensionality reduction based on a stochastic low-rank approximation. Besides being simple and intuitive, this method also provides an encoding function that can be used to infer representations for unseen words or phrases. This is a clear advantage over predictive models, which must be trained on these new words.
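
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a toy word-by-context count matrix, row-normalizes it into co-occurrence probabilities, and reduces the square-rooted probabilities with a plain truncated SVD from NumPy. The plain SVD stands in for the stochastic low-rank approximation mentioned above, and the sketch does no centering or other preprocessing. The function names (`hellinger_distance`, `fit_hellinger_embeddings`, `encode`) and the toy counts are hypothetical.

```python
import numpy as np


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)


def cooccurrence_probabilities(counts):
    """Row-normalize a word-by-context count matrix into P(context | word)."""
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave all-zero rows as zeros instead of dividing by zero.
    return counts / np.where(row_sums == 0.0, 1.0, row_sums)


def fit_hellinger_embeddings(counts, dim):
    """Return (embeddings, components) from square-rooted probabilities.

    Assumption: a deterministic truncated SVD replaces the stochastic
    low-rank approximation described in the abstract.
    """
    roots = np.sqrt(cooccurrence_probabilities(counts))
    _, _, vt = np.linalg.svd(roots, full_matrices=False)
    components = vt[:dim]  # top `dim` right singular vectors
    return roots @ components.T, components


def encode(context_counts, components):
    """Project the context counts of a new word onto the learned components."""
    roots = np.sqrt(cooccurrence_probabilities(np.atleast_2d(context_counts)))
    return (roots @ components.T)[0]


# Hypothetical toy data: 4 words (rows) observed with 3 context words (columns).
toy_counts = [
    [8, 1, 1],
    [7, 2, 1],
    [1, 1, 8],
    [0, 2, 8],
]
embeddings, components = fit_hellinger_embeddings(toy_counts, dim=2)
new_word_vector = encode([6, 3, 1], components)
```

Euclidean distance between rows of the square-rooted matrix equals the Hellinger distance up to a constant factor, which is why a low-rank approximation of that matrix is used here. The `encode` function reuses the learned components, so a word outside the original matrix gets a vector without refitting.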

Type: book part or chapter
DOI: 10.1007/978-3-319-18111-0_31
Web of Science ID: WOS:000362441400031
Author(s): Lebret, Rémi; Collobert, Ronan
Date Issued: 2015
Publisher: Springer International Publishing
Publisher place: Berlin
Published in: Computational Linguistics And Intelligent Text Processing (Cicling 2015), Pt I
ISBN of the book: 978-3-319-18111-0; 978-3-319-18110-3
Total of pages: 13
Start page: 417
End page: 429
Series title/Series vol.: Lecture Notes in Computer Science
Volume: 9041
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LIDIAP
Available on Infoscience: July 19, 2015
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/116359
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved