Audio Feature Extraction with Convolutional Neural Autoencoders with Application to Voice Conversion

Feature extraction is a key step in many machine learning and signal processing applications. For speech signals in particular, it is important to derive features that capture both the vocal characteristics of the speaker and the content of the speech. In this paper, we introduce a convolutional auto-encoder (CAE) to extract features from speech represented via a proposed short-time discrete cosine transform (STDCT). We then introduce a deep neural mapping at the encoding bottleneck to enable the conversion of a source speaker's speech into a target speaker's speech while preserving the content of the source speech. We further compare this approach to clustering-based and linear mappings.
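
For concreteness, below is a minimal sketch of how such an STDCT representation might be computed, assuming a conventional frame-and-window scheme with a type-II DCT applied per frame. The frame length, hop size, and Hann window are illustrative assumptions, not parameters taken from the paper.

```python
# Hypothetical STDCT sketch: split the signal into overlapping
# windowed frames and apply a DCT-II to each frame. Frame length,
# hop size, and window choice are illustrative assumptions.
import numpy as np
from scipy.fft import dct

def stdct(signal, frame_len=512, hop=256):
    """Return a (frame_len, n_frames) real-valued time-frequency image."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # DCT-II yields one real coefficient vector per frame; unlike the
    # STFT there is no complex phase, so the resulting 2-D image can
    # be fed directly to a convolutional auto-encoder.
    return dct(frames, type=2, norm="ortho", axis=1).T
```

Because the DCT is real-valued and invertible, a converted STDCT image can in principle be mapped back to a waveform by an inverse DCT with overlap-add, avoiding the phase reconstruction that an STFT magnitude representation would require.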


Presented at:
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, May 12-17, 2019
Year:
2019