Infoscience

conference paper

Improving speaker turn embedding by crossmodal transfer learning from face embedding

Le, Nam; Odobez, Jean-Marc
2017
2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
ICCV Workshop on Computer Vision for Audio-Visual Media

Learning speaker turn embeddings has shown considerable improvement in situations where conventional speaker modeling approaches fail. However, this improvement is relatively limited when compared to the gain observed in face embedding learning, which has proven very successful for face verification and clustering tasks. Assuming that faces and voices from the same identities share some latent properties (such as age, gender, and ethnicity), we propose two transfer learning approaches that leverage knowledge from the face domain, learned from thousands of identities, for tasks in the speaker domain. These approaches, namely target embedding transfer and clustering structure transfer, use the structure of the source face embedding space at different granularities to regularize the target speaker turn embedding space through additional optimization terms. Our methods are evaluated on two public broadcast corpora and yield promising advances over competitive baselines in verification and audio clustering tasks, especially when dealing with short speaker utterances. The analysis gives insight into characteristics of the embedding spaces and shows their potential applications.
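
As a rough illustration of the idea of regularizing a speaker embedding space with a face embedding space, the sketch below adds a cross-modal penalty to a standard triplet loss. It is not the authors' formulation: the abstract does not give the loss functions, so the choice of a triplet loss, the squared Euclidean penalty, the weight `lam`, the margin, and all function names are assumptions made for illustration only.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge triplet loss on squared Euclidean distances.

    Each argument is an array of shape (batch, dim). The loss is zero for a
    triplet once the negative is farther from the anchor than the positive
    by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def cross_modal_penalty(speaker_emb, face_emb):
    """Mean squared distance between a speaker turn embedding and the face
    embedding of the same identity (hypothetical regularizer)."""
    return float(np.mean(np.sum((speaker_emb - face_emb) ** 2, axis=-1)))


def regularized_loss(anchor, positive, negative, face_anchor,
                     lam=0.1, margin=0.2):
    """Speaker-domain triplet loss plus a weighted face-domain penalty.

    `face_anchor` holds face embeddings for the identities of `anchor`;
    `lam` (assumed value) trades off the two terms.
    """
    speaker_term = triplet_loss(anchor, positive, negative, margin)
    transfer_term = cross_modal_penalty(anchor, face_anchor)
    return speaker_term + lam * transfer_term
```

In this sketch the face embeddings are treated as fixed targets, so only the speaker embeddings would receive gradients in a real training loop; a coarser variant could instead constrain only cluster-level relations between identities rather than individual embedding positions.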

Type: conference paper
DOI: 10.1109/ICCVW.2017.58
Author(s): Le, Nam; Odobez, Jean-Marc
Date Issued: 2017
Published in: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
Start page: 428
End page: 437
Written at: EPFL
EPFL units: LIDIAP
Event name: ICCV Workshop on Computer Vision for Audio-Visual Media
Available on Infoscience: August 19, 2017
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/139729

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved