Collaboration in the Production of a Massively Multilingual Lexicon

Benjamin, Martin

conference paper

2014

LREC 2014 Proceedings

9th edition of the Language Resources and Evaluation Conference

This paper discusses the multiple approaches to collaboration that the Kamusi Project is employing in the creation of a massively multilingual lexical resource. The project’s data structure enables the inclusion of large amounts of rich data within each sense-specific entry, with transitive concept-based links across languages. Data collection involves mining existing data sets, language experts using an online editing system, crowdsourcing, and games with a purpose. The paper discusses the benefits and drawbacks of each of these elements, and the steps the project is taking to account for them. Special attention is paid to guiding crowd members with targeted questions that produce results in a specific format. Collaboration is seen as an essential method for generating large amounts of linguistic data, as well as for validating the data so that it can be considered trustworthy.

Name

Collaboration in Lexicography - LREC 2014 - final.pdf

Type

Publisher's Version

Version

http://purl.org/coar/version/c_970fb48d4fbd8a85

Access type

openaccess

Size

353.39 KB

Format

Adobe PDF

Checksum (MD5)

c31613f4ef6d433c32b558f5963931a3