Correcting Confusion Matrices for Phone Recognizers

Modern speech recognition has many ways of quantifying the misrecognitions a speech recognizer makes. The errors in modern speech recognition makes extensive use of the Levenshtein algorithm to find the distance between the labeled target and the recognized hypothesis. This algorithm has problems when properly aligning substitution confusions due to the lack of knowledge about the system. This work addresses a shortcoming of the alignment provided by speech recognition analysis systems (HTK specifically) and provides a more applicable algorithm for aligning the hypothesis with the target. This new procedure takes into account the systematic errors the recognizer will make and uses that knowledge to produce correct alignments.

Related material