A Domain Adaptation Approach to Improve Speaker Turn Embedding Using Face Representation

Le, Nam; Odobez, Jean-Marc

doi:10.1145/3136755.3136800

2017

Abstract

This paper proposes a novel approach to improve speaker modeling using knowledge transferred from face representation. In particular, we are interested in learning a discriminative metric that allows speaker turns to be compared directly, which is beneficial for tasks such as diarization and dialogue analysis. Our method improves the embedding space of speaker turns by applying a maximum mean discrepancy (MMD) loss to minimize the disparity between the distributions of facial and acoustic embedded features. This approach aims to discover the shared underlying structure of the two embedded spaces, thus enabling the transfer of knowledge from the richer face representation to its counterpart in speech. Experiments are conducted on two broadcast TV news datasets, REPERE and ETAPE, to demonstrate the validity of our method. Quantitative results in verification and clustering tasks show promising improvement, especially in cases where speaker turns are short or the training data size is limited.
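
The maximum mean discrepancy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the biased estimator, and the synthetic "face" and "speech" samples are all assumptions made here for illustration, and this record does not state which kernel or estimator the paper uses.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of embeddings."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample(rng, n, dim, mean):
    """Draw n hypothetical embeddings from an isotropic Gaussian."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


# Synthetic stand-ins for embedded features (not real REPERE or ETAPE data).
rng = random.Random(0)
face = sample(rng, 100, 4, 0.0)
speech_shifted = sample(rng, 100, 4, 1.0)  # distribution far from the face samples
speech_aligned = sample(rng, 100, 4, 0.0)  # distribution matching the face samples

mmd_shifted = mmd_squared(face, speech_shifted)
mmd_aligned = mmd_squared(face, speech_aligned)
print(mmd_shifted, mmd_aligned)
```

In this sketch the estimate is larger for the shifted samples than for the aligned ones, which is the property a training loss of this kind would exploit: minimizing it pulls the two embedded distributions toward each other.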

Details

Title A Domain Adaptation Approach to Improve Speaker Turn Embedding Using Face Representation

Author(s) Le, Nam ; Odobez, Jean-Marc

Published in ICMI '17: Proceedings of the 19th ACM International Conference on Multimodal Interaction

Pages 411–415

Conference ACM International Conference on Multimodal Interaction, Glasgow, Scotland

Date 2017

Publisher ACM

Keywords

Domain adaptation; Metric learning; Multimodal person diarization

DOI https://doi.org/10.1145/3136755.3136800

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Conference Papers
Work produced at EPFL

Record creation date 2017-09-19
