An End-to-End Network to Synthesize Intonation using a Generalized Command Response Model

Marelli, François; Schnell, Bastian; Bourlard, Hervé; Dutoit, T.; Garner, Philip N.

doi:10.1109/ICASSP.2019.8683815

2019

Abstract

The generalized command response (GCR) model represents intonation as a superposition of muscle responses to spike command signals. We have previously shown that the spikes can be predicted by a two-stage system, consisting of a recurrent neural network and a post-processing procedure, but the responses themselves were fixed dictionary atoms. We propose an end-to-end neural architecture that replaces the dictionary atoms with trainable second-order recurrent elements analogous to recursive filters. We demonstrate gradient stability under modest conditions, and show that the system can be trained by imposing temporal sparsity constraints. Subjective listening tests demonstrate that the system can synthesize intonation with high naturalness, comparable to state-of-the-art acoustic models, and retains the physiological plausibility of the GCR model.
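The idea of a response element "analogous to recursive filters" can be illustrated with a minimal sketch. This is not the architecture from the paper: it assumes a critically damped second-order recursion (a double real pole), fixed rather than trained parameters, and hypothetical function names (`second_order_response`, `superpose`). It only shows how spike commands passed through second-order recursive elements can be superposed into a contour.

```python
def second_order_response(commands, pole, gain=1.0):
    """Filter a spike train through a second-order recursion with a double pole.

    Assumed form (not taken from the paper):
        y[n] = 2*pole*y[n-1] - pole**2 * y[n-2] + gain*x[n]
    A unit spike at n = 0 yields gain * (n + 1) * pole**n.
    """
    if not 0.0 <= pole < 1.0:
        raise ValueError("pole must lie in [0, 1) for a stable response")
    out = []
    y1 = y2 = 0.0  # previous two outputs, zero initial state
    for x in commands:
        y = 2.0 * pole * y1 - pole * pole * y2 + gain * x
        out.append(y)
        y2, y1 = y1, y
    return out


def superpose(spike_trains, poles, gains):
    """Sum the responses of several elements, one per spike train."""
    contour = [0.0] * len(spike_trains[0])
    for spikes, pole, gain in zip(spike_trains, poles, gains):
        response = second_order_response(spikes, pole, gain)
        contour = [c + r for c, r in zip(contour, response)]
    return contour


# Illustrative values only: one slow element and one fast element.
slow_spikes = [1.0] + [0.0] * 9
fast_spikes = [0.0] * 4 + [-0.5] + [0.0] * 5
contour = superpose([slow_spikes, fast_spikes], poles=[0.9, 0.5], gains=[1.0, 1.0])
```

In this sketch a pole closer to 1 gives a slower, longer-lasting response, and sparse spike trains keep the contour a sum of a few isolated responses. In a trainable version the pole and gain would be learned parameters rather than constants.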

Details

Title An End-to-End Network to Synthesize Intonation using a Generalized Command Response Model

Author(s) Marelli, François ; Schnell, Bastian ; Bourlard, Hervé ; Dutoit, T. ; Garner, Philip N.

Published in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing

Pages 7040-7044

Date 2019

Publisher Idiap

Keywords

Digital IIR Filters; Fujisaki Model; neural networks; Prosody Modelling; speech synthesis

DOI https://doi.org/10.1109/ICASSP.2019.8683815

Laboratories LIDIAP

Record appears in LIDIAP - L'IDIAP Laboratory; Euler Center for Signal Processing; Peer-reviewed publications; Conference Papers; Work produced at EPFL

Record creation date 2019-05-27
