000200377 245__ $$aSmall Languages, Big Data: Multilingual Computational Tools and Techniques for the Lexicography of Endangered Languages
000200377 520__ $$aThe Kamusi Project, a multilingual online dictionary website, has as one of its goals to document the lexicons of endangered and less-resourced languages (LRLs). Kamusi.org provides a unified platform and repository for this kind of data that is both simple to use and free to researchers and the public. Since Kamusi has a separate entry for each homophone or polyseme, it can be used to produce sophisticated multilingual dictionaries. We have recently been confronting issues inherent in contact language-based lexicography, especially the elicitation of culturally-specific semantic terms, which cannot be obtained through fieldwork purely reliant on a contact language. To address this, we have designed a system of “balloons.” Based on a variety of factors, balloons raise the likelihood of revealing terms and fields that have particular relevance within a culture, rather than perpetuating linguistic bias toward the concerns and artifacts of more powerful groups. Kamusi has also developed a smartphone application which can be used for crowdsourcing contributions and validation. It will also be invaluable in gathering oral data from speakers of endangered languages for the production of monolingual talking dictionaries. The first of these projects is planned for the Arrernte language in central Australia.
000200377 6531_ $$aendangered languages
000200377 6531_ $$amultilingual lexicography
000200377 6531_ $$acrowdsourcing
000200377 6531_ $$atalking dictionaries
52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, Maryland, USA, June 22-27, 2014
000200377 773__ $$q15-23$$tProceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages
