Syllabic Pitch Tuning for Neutral-to-Emotional Voice Conversion

Saheer, Lakshmi; Na, Xingyu; Cernak, Milos

report

2015

Prosody plays an important role in both the identification and the synthesis of emotionalized speech. Prosodic features such as pitch are usually estimated and altered at the segmental level, based on short windows of speech in which the signal is expected to be quasi-stationary. This results in a frame-wise change of acoustic parameters when synthesizing emotionalized speech. To convert neutral speech to emotional speech from the same speaker, it may be better to alter the pitch parameters at the suprasegmental level, for example at the syllable level, since the changes in the signal are then more subtle and smooth. In this paper we aim to show that pitch transformation in a neutral-to-emotional voice conversion system may yield better output speech quality if it is performed at the suprasegmental (syllable) level rather than at the frame level. Subjective evaluation results are presented to show whether naturalness, speaker similarity, and emotion recognition differ between the two approaches.
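The contrast between frame-level and syllable-level pitch modification can be illustrated with a minimal sketch. This is not the method of the report, whose actual transformation is not described in this record. It assumes a pitch (F0) contour given as one value in Hz per frame, with 0 marking unvoiced frames, a hypothetical per-frame scaling factor standing in for whatever a conversion model would predict, and syllable boundaries given as frame index ranges. The function names and all numbers are invented for illustration.

```python
def frame_level_shift(f0, frame_factors):
    """Scale every voiced frame by its own factor; unvoiced frames (f0 == 0) stay 0."""
    return [f * k if f > 0 else 0.0 for f, k in zip(f0, frame_factors)]


def syllable_level_shift(f0, frame_factors, syllables):
    """Scale each syllable by one factor: the mean of the per-frame factors
    over that syllable's voiced frames. Each (start, end) range is end-exclusive."""
    out = list(f0)
    for start, end in syllables:
        voiced = [i for i in range(start, end) if f0[i] > 0]
        if not voiced:
            continue  # nothing to modify in a fully unvoiced syllable
        k = sum(frame_factors[i] for i in voiced) / len(voiced)
        for i in voiced:
            out[i] = f0[i] * k
    return out


# Hypothetical data: six frames, two syllables, frame 2 unvoiced.
f0 = [100.0, 110.0, 0.0, 120.0, 130.0, 125.0]
frame_factors = [1.1, 1.3, 1.0, 1.2, 1.0, 1.4]
syllables = [(0, 3), (3, 6)]

per_frame = frame_level_shift(f0, frame_factors)
per_syllable = syllable_level_shift(f0, frame_factors, syllables)
```

In this sketch the syllable-level variant applies a single factor inside each syllable, so the relative shape of the contour within a syllable is unchanged, whereas the frame-level variant can distort it from one frame to the next.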

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/119954

Name: Saheer_Idiap-RR-31-2015.pdf

Access type: openaccess

Size: 660.57 KB

Format: Adobe PDF

Checksum (MD5): 7896c667ae95cf669ff894ea592eb52a