Enhancing State Mapping-Based Cross-Lingual Speaker Adaptation using Phonological Knowledge in a Data-Driven Manner
HMM state mapping, with the Kullback-Leibler divergence as the distribution similarity measure, is a simple and effective technique for cross-lingual speaker adaptation in speech synthesis. However, this technique uses no other potentially useful information when constructing the mapping. An approach that incorporates phonological knowledge in a data-driven manner is therefore proposed to produce better state mapping rules: state distributions from the input and output languages are clustered according to broad phonetic categories using a decision tree, and mapping rules are constructed only within each resulting leaf node. In addition, previous research shows that a regression class tree that follows the decision tree structure used for state tying is detrimental to cross-lingual speaker adaptation. It is therefore also proposed to apply the new approach to regression class tree growth: state distributions from the output language are clustered according to broad phonetic categories using a decision tree, which is then used directly as the regression class tree for transform estimation. Experimental results show that the proposed approach consistently reduces mel-cepstral distortion and produces state mapping rules and regression class trees that generalize to unseen test speakers. The impact of the phonological/acoustic similarity between the input and output languages on the reliability of state mapping rules and on the structure of regression class trees is also demonstrated and analyzed.
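The idea of restricting state mapping to one leaf node can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each state is a single diagonal-covariance Gaussian, the decision tree has already assigned every state a leaf label (here a hypothetical broad phonetic category string), the symmetric form of the Kullback-Leibler divergence is used, and each output-language state is mapped to its nearest input-language state. The names `kl_diag_gauss`, `symmetric_kl`, and `build_state_mapping` are illustrative only.

```python
import math
from collections import defaultdict


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetric KL divergence; p and q are (mean, variance) pairs.

    Assumption: the symmetric form is used as the similarity measure.
    """
    return (kl_diag_gauss(p[0], p[1], q[0], q[1])
            + kl_diag_gauss(q[0], q[1], p[0], p[1]))


def build_state_mapping(out_states, in_states):
    """Map each output-language state to the closest input-language state
    that shares its leaf (broad phonetic category).

    Both arguments are dicts: state name -> (category, mean, variance).
    The category labels stand in for decision-tree leaves (assumption).
    """
    by_category = defaultdict(list)
    for name, (category, mean, var) in in_states.items():
        by_category[category].append((name, (mean, var)))

    mapping = {}
    for name, (category, mean, var) in out_states.items():
        candidates = by_category.get(category)
        if not candidates:
            continue  # no input-language state in this leaf; left unmapped here
        nearest = min(candidates, key=lambda c: symmetric_kl((mean, var), c[1]))
        mapping[name] = nearest[0]
    return mapping


# Toy one-dimensional states (made-up values, for illustration only).
in_states = {
    "in_vowel": ("vowel", [0.0], [1.0]),
    "in_fric": ("fricative", [1.0], [1.0]),
}
out_states = {
    "out_vowel": ("vowel", [0.9], [1.0]),
    "out_fric": ("fricative", [0.2], [1.0]),
}
mapping = build_state_mapping(out_states, in_states)
```

In this toy data, `out_vowel` is acoustically closer to `in_fric` than to `in_vowel`, so an unconstrained nearest-neighbour search would pair a vowel with a fricative. The category constraint rules that pairing out.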