Automatic Accentedness Evaluation of Non-Native Speech Using Phonetic and Sub-Phonetic Posterior Probabilities
Automatic evaluation of non-native speech accentedness has potential implications for not only language learning and accent identification systems but also for speaker and speech recognition systems. From the perspective of speech production, the two primary factors influencing the accentedness are the phonetic and prosodic structure. In this paper, we propose an approach for automatic accentedness evaluation based on comparison of instances of native and non-native speakers at the acoustic-phonetic level. Specifically, the proposed approach measures accentedness by comparing phone class conditional probability sequences corresponding to the instances of native and non-native speakers, respectively. We evaluate the proposed approach on the EMIME bilingual and EMIME Mandarin bilingual corpora, which contains English speech from native English speakers and various non-native English speakers, namely Finnish, German and Mandarin. We also investigate the influence of the granularity of the phonetic unit representation on the performance of the proposed accentedness measure. Our results indicate that the accentedness ratings by the proposed approach correlate consistently with the human ratings of accentedness. In addition, our studies show that the granularity of the phonetic unit representation that yields the best correlation with the human accentedness ratings varies with respect to the native language of the non-native speakers.