Infoscience

conference paper

Faster Parallel Training of Word Embeddings

Wszola, Eliza • Jaggi, Martin • Puschel, Markus
January 1, 2021
2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC 2021)
28th Annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

Word embeddings have gained popularity in recent years due to the Word2vec library and its extension fastText, which uses subword information. In this paper, we aim to improve the execution speed of fastText training on homogeneous multi- and manycore CPUs while maintaining accuracy. We present a novel open-source implementation that flexibly incorporates various algorithmic variants, including negative sample sharing, batched updates, and a byte-pair encoding-based alternative for subword units. We build these novel variants over a fastText implementation that we carefully optimized for the architecture, memory hierarchy, and parallelism of current manycore CPUs. Our experiments on three languages demonstrate a 3-20x speed-up in training time at competitive semantic and syntactic accuracy.

Type: conference paper
DOI: 10.1109/HiPC53243.2021.00017
Web of Science ID: WOS:000782316500004
Author(s): Wszola, Eliza; Jaggi, Martin; Puschel, Markus
Date Issued: 2021-01-01
Publisher: IEEE COMPUTER SOC
Publisher place: Los Alamitos
Published in: 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC 2021)
ISBN of the book: 978-1-6654-1016-8
Series title/Series vol.: International Conference on High Performance Computing
Start page: 31
End page: 41

Subjects
  • Computer Science, Hardware & Architecture
  • Computer Science, Software Engineering
  • Computer Science, Theory & Methods
  • Mathematics, Applied
  • Computer Science
  • Mathematics
  • machine learning
  • natural language processing
  • parallel computing
  • performance
  • word2vec
  • fasttext

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: MLO
Event name: 28th Annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)
Event place: ELECTR NETWORK
Event date: Dec 17-18, 2021
Available on Infoscience: May 9, 2022
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/187637