Abstract

The prosody of the speech signal carries both linguistic and paralinguistic information, so it must be modelled before it can be integrated into speech technology systems. Many models have been proposed to date; most focus on intonation, and only a few also address energy and duration. This paper proposes the Unified Prosody Model (UPM), an integrated approach to modelling the three dimensions of prosody (intonation, energy, and duration) using atom decomposition techniques. The advantages of this integrated approach are illustrated on the task of emphasis detection, for which simple features are constructed from the output of the UPM. A logistic regression classifier trained and tested on these features reaches an accuracy of 91%. This proof-of-concept algorithm illustrates the potential of the proposed UPM for improving prosody-related speech research.
