Building the Moroccan Darija WordNet (MDW) Using Bilingual Resources
Moroccan Darija is one of the Arabic dialects, which form a continuum of under-resourced vernaculars. We develop a Moroccan Darija WordNet (MDW) from a bilingual Moroccan-English dictionary, from which we collect nearly 13,000 definitions and over 15,000 lemmas. We define a Moroccan alphabet to make the MDW user-friendly. We link the Moroccan-English definitions to the Princeton WordNet using a method that finds matches for about 77% of them, and we estimate the accuracy of these links using confidence scores. As a first step of manual validation, over 2,300 Moroccan synsets have been verified. These synsets are now included in the MDW, which is released as part of the Open Multilingual WordNet.
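The abstract does not describe how the links and confidence scores are computed. As a rough illustration only, here is one simple way it could work: a gloss-overlap heuristic that we chose for this sketch, which is not necessarily the MDW method. The synset table (abbreviated, paraphrased glosses), the stopword list, the informal transliterations, and the helpers `tokens` and `link` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy inventory standing in for Princeton WordNet synsets:
# synset id -> (lemmas, abbreviated gloss). Real data would come from a PWN release.
TOY_SYNSETS = {
    "dog.n.01": ({"dog", "domestic_dog"},
                 "a member of the genus Canis that has been domesticated by man"),
    "house.n.01": ({"house"},
                   "a dwelling that serves as living quarters for one or more families"),
    "bread.n.01": ({"bread", "breadstuff"},
                   "food made from dough of flour or meal and usually raised with yeast"),
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "by", "for",
             "with", "as", "to", "has", "been", "one", "more", "from"}


def tokens(text: str) -> set[str]:
    """Lower-case word tokens minus stopwords; underscores split words."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


@dataclass
class Entry:
    darija: str   # informal Latin transliteration (illustrative only)
    english: str  # English definition from the bilingual dictionary


def link(entry: Entry, synsets=TOY_SYNSETS) -> Optional[tuple[str, float]]:
    """Return the best (synset_id, confidence) for an entry, or None.

    A synset is a candidate only if one of its lemma words appears in the
    English definition. Confidence is the fraction of definition tokens
    that also occur in the synset's lemmas or gloss.
    """
    def_toks = tokens(entry.english)
    best = None
    for sid, (lemmas, gloss) in synsets.items():
        lemma_toks = set().union(*(tokens(lemma) for lemma in lemmas))
        if not def_toks & lemma_toks:
            continue  # no shared lemma word, so not a candidate
        sense_toks = lemma_toks | tokens(gloss)
        conf = len(def_toks & sense_toks) / len(def_toks)
        if best is None or conf > best[1]:
            best = (sid, conf)
    return best


entries = [Entry("kelb", "dog"), Entry("dar", "house, home"),
           Entry("khobz", "bread"), Entry("atay", "tea")]
links = {e.darija: link(e) for e in entries}
coverage = sum(v is not None for v in links.values()) / len(entries)
print(links)
print(f"matched {coverage:.0%} of entries")  # matched 75% of entries
```

Entries with no candidate synset stay unlinked, which is how a coverage figure like the one in the abstract would arise. A low confidence (here, "house, home" scores 0.5) marks a link as a good candidate for manual validation.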
Preprint