Faster Parallel Training of Word Embeddings
Word embeddings have gained increasing popularity in recent years due to the Word2vec library and its extension fastText, which uses subword information. In this paper, we aim at improving the execution speed of fastText training on homogeneous multi- and manycore CPUs while maintaining accuracy. We present a novel open-source implementation that flexibly incorporates various algorithmic variants, including negative sample sharing, batched updates, and a byte-pair encoding-based alternative for subword units. We build these novel variants over a fastText implementation that we carefully optimized for the architecture, memory hierarchy, and parallelism of current manycore CPUs. Our experiments on three languages demonstrate a 3-20x speed-up in training time at competitive semantic and syntactic accuracy.
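As a rough illustration of one of the variants named in the abstract, the following minimal Python sketch shows the idea behind negative sample sharing in skip-gram training with negative sampling: one set of negatives is drawn per batch and reused for every (target, context) pair, rather than drawing fresh negatives per pair. All names, sizes, and the update loop here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of negative sample sharing (not the authors' code).
# Normally each (target, context) pair draws its own k negatives; sharing
# one draw across the batch reduces sampling work and memory traffic.

rng = np.random.default_rng(0)
vocab_size, dim, batch, k_neg, lr = 10_000, 100, 64, 5, 0.025

W_in = rng.normal(scale=0.01, size=(vocab_size, dim))  # input embeddings
W_out = np.zeros((vocab_size, dim))                    # output embeddings

targets = rng.integers(0, vocab_size, size=batch)
contexts = rng.integers(0, vocab_size, size=batch)

# One shared draw of negatives for the whole batch (batch * k_neg draws
# collapse into k_neg). A real implementation would skip collisions with
# the true context word.
shared_negatives = rng.integers(0, vocab_size, size=k_neg)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for t, c in zip(targets, contexts):
    h = W_in[t]
    # Positive pair: gradient of -log sigma(h . v_c) w.r.t. the score.
    g = (sigmoid(h @ W_out[c]) - 1.0) * lr
    grad_h = g * W_out[c]
    W_out[c] -= g * h
    # Negative pairs reuse the single shared sample set.
    for n in shared_negatives:
        g = sigmoid(h @ W_out[n]) * lr
        grad_h += g * W_out[n]
        W_out[n] -= g * h
    W_in[t] -= grad_h
```

Because every pair in the batch touches the same few negative output rows, those rows stay cache-resident, which is one plausible reason such a variant helps on manycore CPUs.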
WOS: 000782316500004
Publication date: 2021-01-01
ISBN: 978-1-6654-1016-8
Place of publication: Los Alamitos
Published in: International Conference on High Performance Computing
Pages: 31-41
Peer review status: REVIEWED
Event name | Event place | Event date
International Conference on High Performance Computing | ELECTR NETWORK | Dec 17-18, 2021