A Large Margin Algorithm for Forced Alignment

Keshet, Joseph; Shalev-Shwartz, Shai; Singer, Yoram; Chazan, Dan

doi:10.1002/9780470742044.ch4

book part or chapter

A Large Margin Algorithm for Forced Alignment

Keshet, Joseph

•

Shalev-Shwartz, Shai

•

Singer, Yoram

more

Keshet, Joseph

•

Bengio, Samy

We describe and analyze a discriminative algorithm for learning to align a phoneme sequence of a speech utterance with its acoustical signal counterpart by predicting a timing sequence representing the phoneme start times. In contrast to common HMM-based approaches, our method employs a discriminative learning procedure in which the learning phase is tightly coupled with the forced alignment task. The alignment function we devise is based on mapping the input acoustic-symbolic representations of the speech utterance along with the target timing sequence into an abstract vector space. We suggest a specific mapping into the abstract vector-space which utilizes standard speech features (e.g. spectral distances) as well as confidence outputs of a frame-based phoneme classifier. Generalizing the notion of separation with a margin used in support vector machines (SVM) for binary classification, we cast the learning task as the problem of finding a vector in an abstract inner-product space. We set the prediction vector to be the solution of a minimization problem with a large set of constraints. Each constraint enforces a gap between the projection of the correct target timing sequence and the projection of an alternative, incorrect, timing sequence onto the vector. Though the number of constraints is very large, we describe a simple iterative algorithm for efficiently learning the vector and analyze the formal properties of the resulting learning algorithm. We report experimental results comparing the proposed algorithm to previous studies on forced alignment, which use hidden Markov models (HMM). The results obtained in our experiments using the discriminative alignment algorithm outperform the state-of-the-art systems on the TIMIT corpus.

Type

book part or chapter

DOI

10.1002/9780470742044.ch4

Authors

Keshet, Joseph

•

Shalev-Shwartz, Shai

•

Singer, Yoram

•

Chazan, Dan

Editors

Keshet, Joseph

•

Bengio, Samy

Publication date

2009

Publisher

John Wiley and Sons

Published in

Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods

ISBN of the book

9780470696835

Start page

51

End page

68

EPFL units

LIDIAP

Available on Infoscience

February 11, 2010

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/46864