TRAP-TANDEM: Data-driven extraction of temporal features from speech

Conventional features in automatic speech recognition describe the instantaneous shape of the short-time spectrum of speech. The TRAP-TANDEM features instead describe likelihoods of sub-word classes at a given time instant. These likelihoods are derived from temporal trajectories of band-limited spectral densities in the vicinity of that instant. The paper presents some of the rationale behind the data-driven TRAP-TANDEM approach, briefly describes the technique, points to relevant publications, and summarizes the results achieved so far.
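The input side of the approach can be illustrated with a short sketch. It assumes the band-limited spectral densities are already available as a frames-by-bands table of log energies, and it collects the temporal trajectory of each band around a given frame. The function names `band_trajectory` and `trap_inputs`, the `half_width` parameter, and the edge replication at utterance boundaries are hypothetical choices made for this sketch, not details taken from the paper. In TRAP-TANDEM, trajectories of this kind are the raw material from which sub-word class likelihoods are estimated. That estimation step is not shown here.

```python
from typing import List


def band_trajectory(energies: List[List[float]], t: int, band: int,
                    half_width: int) -> List[float]:
    """Return the temporal trajectory of one frequency band centred on frame t.

    energies[frame][band] holds the (log) spectral energy of a band at a frame.
    The trajectory spans frames t - half_width .. t + half_width inclusive.
    Frames outside the utterance are replaced by the nearest edge frame
    (edge replication is an assumption of this sketch).
    """
    n_frames = len(energies)
    trajectory = []
    for offset in range(-half_width, half_width + 1):
        # Clamp the frame index to the valid range [0, n_frames - 1].
        idx = min(max(t + offset, 0), n_frames - 1)
        trajectory.append(energies[idx][band])
    return trajectory


def trap_inputs(energies: List[List[float]], t: int,
                half_width: int) -> List[List[float]]:
    """Collect one temporal trajectory per band, all centred on frame t."""
    n_bands = len(energies[0])
    return [band_trajectory(energies, t, b, half_width) for b in range(n_bands)]


if __name__ == "__main__":
    # Toy data (hypothetical): 5 frames x 3 bands, value = 10 * frame + band.
    energies = [[float(10 * f + b) for b in range(3)] for f in range(5)]
    for band, traj in enumerate(trap_inputs(energies, t=0, half_width=2)):
        print(band, traj)
```

The example keeps the frequency axis and the time axis separate: each band yields its own trajectory, rather than a single spectral slice per frame as in conventional features. A `half_width` of 50 at a 10 ms frame rate would give trajectories of roughly one second. Treat that value as illustrative only.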
