Towards ASR Based on Hierarchical Posterior-Based Keyword Recognition

The paper presents an alternative approach to automatic recognition of speech in which each targeted word is classified by a separate binary classifier against all other sounds. No time alignment is done. To build a recognizer for N words, N parallel binary classifiers are applied. The system first estimates uniformly sampled posterior probabilities of phoneme classes, followed by a second step in which a rather long sliding time window is applied to the phoneme posterior estimates and its content is classified by an artificial neural network to yield posterior probability of the keyword. On small vocabulary ASR task, the system still does not reach the performance of the state-of-the-art system but its conceptual simplicity, the ease of adding new target words, and its inherent resistance to out-of-vocabulary sounds may prove significant advantage in many applications.

Related material