Infoscience

Conference paper

Excluded Linguistic Communities and the Production of an Inclusive Multilingual Digital Language Infrastructure

The consequence of linguistic digital exclusion is that billions of people cannot access vital knowledge and economic resources that contribute to prosperity in an era of globalization. However, rectifying linguistic inequity is largely absent from development discourse and from the agendas of governments and agencies that undertake development activities. Most efforts to produce content for excluded languages depend on the haphazard occurrence of a commercial, academic, or programmatic purpose for an activity in a given language at a particular moment. The Kamusi Project seeks to address the digital linguistic divide by engaging communities in the systematic collection of codified data for any language: linguistic information that can be used in many kinds of advanced knowledge and technology resources. This paper explores the assumptions about participants’ motivations and behaviors that underlie the project’s methods, including participation in online games and interactive mobile apps intended to elicit speakers’ knowledge of their own languages in ways that others can share. While the Kamusi system aims to welcome all, disparities may continue to exclude those without substantial time, network access, equipment, digital experience, or literacy, leaving members of a language’s international diaspora as its most active contributors. Further, smaller and more remote languages have, by definition, fewer potential participants and less access to participation, thus perpetuating their inability to cross the digital divide. Without external support for the time and effort needed to gather linguistic knowledge, even the most carefully constructed tools will fail for thousands of languages spoken by millions of people, including many languages near extinction.
This paper raises, without definitively resolving, the social challenges of a multilingual digital infrastructure platform that has the technical capacity to document every word in every language, but can approach this objective only through the involvement of those who have the least access to participation.
