Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

Dines, John; Liang, Hui; Saheer, Lakshmi; Gibson, Matthew; Byrne, William; Oura, Keiichiro; Tokuda, Keiichi; Yamagishi, Junichi; King, Simon; Wester, Mirjam; Hirsimäki, Teemu; Karhila, Reima; Kurimo, Mikko

doi:10.1016/j.csl.2011.08.003

research article

2013

Computer Speech and Language

In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches as well as an end-to-end speaker adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios and our proposed algorithms seem to generalise well for several language pairs. We also discuss important future directions including the need for better evaluation metrics.

Name: Dines_CSL_2011.pdf
Access type: openaccess
Size: 541.33 KB
Format: Adobe PDF
Checksum (MD5): d187755db478b832e4ba73d951d28368