Abstract

In the literature, the task of dysarthric speech intelligibility assessment has been approached through development of different low-level feature representations, subspace modeling, phone confidence estimation or measurement of automatic speech recognition system accuracy. This paper proposes a novel approach where the intelligibility is estimated as the percentage of correct words uttered by a speaker with dysarthria by matching and verifying utterances of the speaker with dysarthria against control speakers' utterances in phone posterior feature space and broad phonetic posterior feature space. Experimental validation of the proposed approach on the UA-Speech database, with posterior feature estimators trained on the data from auxiliary domain and language, obtained a best Pearson's correlation coefficient (r) of 0.950 and Spearman's correlation coefficient (rho) of 0.957. Furthermore, replacing control speakers' speech with speech synthesized by a neural text-to-speech system obtained a best r of 0.931 and rho of 0.961.

Details