Infoscience

Text Similarity in Vector Space Models: A Comparative Study

Shahmirzadi, Omid • Lugowski, Adam • Younge, Kenneth
September 24, 2018
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)
IEEE-ICMLA (18th International Conference on Machine Learning and Applications 2019)

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate how well different vector space models perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. In other cases, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
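
To make the TFIDF baseline named in the abstract concrete, the following is a minimal sketch of TFIDF weighting followed by cosine similarity. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the three toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline
    # would likely handle punctuation and stop words as well.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TFIDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "patent" texts, invented for this example.
docs = [
    "rotor blade assembly for a wind turbine",
    "wind turbine blade with a reinforced rotor hub",
    "pharmaceutical composition for treating hypertension",
]

vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # two wind-turbine texts
sim_unrelated = cosine(vecs[0], vecs[2])  # turbine text vs. drug text
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

In this toy setting the two wind-turbine texts score higher than the unrelated pair because they share more weighted terms. The topic and neural models compared in the paper would replace the sparse vectors with dense ones while keeping the same similarity step.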

Type
conference paper
DOI
10.1109/ICMLA.2019.00120
ArXiv ID

1810.00664

Author(s)
Shahmirzadi, Omid • Lugowski, Adam • Younge, Kenneth
Date Issued

2018-09-24

Journal
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)
Start page

659

End page

666

Subjects

text similarity • vector space model • text embedding • patent • big data

Note

Comments: 17 pages

Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
TIS  
Event name
IEEE-ICMLA (18th International Conference on Machine Learning and Applications 2019)

Event place
Boca Raton, Florida, USA

Event date
December 16-19, 2019

Available on Infoscience
January 31, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/165067

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved