000253313 001__ 253313
000253313 005__ 20190812210014.0
000253313 037__ $$aCONF
000253313 245__ $$aLearning Word Vectors for 157 Languages
000253313 260__ $$c2018-02-19
000253313 269__ $$a2018-02-19
000253313 336__ $$aConference Papers
000253313 520__ $$aDistributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient to the successful application of these representations is to train them on very large corpora, and use these pre-trained models in downstream tasks. In this paper, we describe how we trained such high-quality word representations for 157 languages. We used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the Common Crawl project. We also introduce three new word analogy datasets to evaluate these word vectors, for French, Hindi and Polish. Finally, we evaluate our pre-trained word vectors on 10 languages for which evaluation datasets exist, showing very strong performance compared to previous models.
000253313 700__ $$aGrave, Edouard
000253313 700__ $$aBojanowski, Piotr
000253313 700__ $$g266850$$aGupta, Prakhar$$0250736
000253313 700__ $$aJoulin, Armand
000253313 700__ $$aMikolov, Tomas
000253313 710__ $$aGrave, Edouard
000253313 710__ $$aBojanowski, Piotr
000253313 710__ $$aJoulin, Armand
000253313 710__ $$aMikolov, Tomas
000253313 7112_ $$dMay 7-12, 2018$$cMiyazaki, Japan$$aLanguage Resources and Evaluation Conference
000253313 8560_ $$fprakhar.gupta@epfl.ch
000253313 8564_ $$uhttps://infoscience.epfl.ch/record/253313/files/1802.06893.pdf$$s118906
000253313 8564_ $$xpdfa$$uhttps://infoscience.epfl.ch/record/253313/files/1802.06893.pdf?subformat=pdfa$$s1466445
000253313 909C0 $$xU13319$$pMLO$$mjennifer.bachmann-ona@epfl.ch$$0252581
000253313 909CO $$qGLOBAL_SET$$pconf$$pIC$$ooai:infoscience.epfl.ch:253313
000253313 960__ $$aprakhar.gupta@epfl.ch
000253313 961__ $$amanon.velasco@epfl.ch
000253313 973__ $$rREVIEWED$$aOTHER
000253313 980__ $$aCONF
000253313 981__ $$aoverwrite