Small Languages, Big Data: Multilingual Computational Tools and Techniques for the Lexicography of Endangered Languages

The Kamusi Project, a multilingual online dictionary website, has as one of its goals to document the lexicons of endangered and less-resourced languages (LRLs). Kamusi.org provides a unified platform and repository for this kind of data that is both simple to use and free to researchers and the public. Since Kamusi has a separate entry for each homophone or polyseme, it can be used to produce sophisticated multilingual dictionaries. We have recently been confronting issues inherent in contact language-based lexicography, especially the elicitation of culturally-specific semantic terms, which cannot be obtained through fieldwork purely reliant on a contact language. To address this, we have designed a system of “balloons.” Based on a variety of factors, balloons raise the likelihood of revealing terms and fields that have particular relevance within a culture, rather than perpetuating linguistic bias toward the concerns and artifacts of more powerful groups. Kamusi has also developed a smartphone application which can be used for crowdsourcing contributions and validation. It will also be invaluable in gathering oral data from speakers of endangered languages for the production of monolingual talking dictionaries. The first of these projects is planned for the Arrernte language in central Australia.


Editor(s):
Good, Jeff
Hirschberg, Julia
Rambow, Owen
Published in:
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, 15-23
Presented at:
52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, Maryland, USA, June 22-27, 2014
Year:
2014
Publisher:
Stroudsburg, PA, USA, Association for Computational Linguistics

Record created 2014-07-25, last modified 2018-09-13
