conference paper

Low-latency speaker spotting with online diarization and detection

Patino, Jose
Yin, Ruiqing
Delgado, Hector
2018
The Speaker and Language Recognition Workshop (Odyssey 2018)

This paper introduces a new task termed low-latency speaker spotting (LLSS). Related to security and intelligence applications, the task involves the detection, as soon as possible, of known speakers within multi-speaker audio streams. The paper describes differences from the established fields of speaker diarization and automatic speaker verification and proposes a new protocol and metrics to support exploration of LLSS. These can be used together with an existing, publicly available database to assess the performance of the LLSS solutions also proposed in the paper. These solutions combine online diarization and speaker detection systems. The diarization systems include a naive over-segmentation approach and fully-fledged online diarization using segmental i-vectors. Speaker detection is performed using Gaussian mixture models, i-vectors or neural speaker embeddings. The metrics reflect different approaches to characterising latency in addition to detection performance. The relative performance of each solution depends on latency: when higher latency is admissible, i-vector solutions perform well, whereas embeddings excel when latency must be kept to a minimum. With a need to improve the reliability of online diarization and detection, the proposed LLSS framework provides a vehicle to fuel future research in both areas. In this respect, we embrace a reproducible research policy; results can be readily reproduced using publicly available resources and open-source code.
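
To make the idea of combining online diarization with speaker detection more concrete, the following is a minimal illustrative sketch and not the paper's implementation. It assumes the stream has already been cut into short chunks and that each chunk has been mapped to a fixed-length embedding by some unspecified extractor. The function name `spot_speaker`, the greedy clustering rule, cosine scoring, and both thresholds are hypothetical choices made only for this example.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def spot_speaker(
    chunks: Iterable[Tuple[float, Sequence[float]]],
    target: Sequence[float],
    threshold: float = 0.8,       # hypothetical detection threshold
    join_threshold: float = 0.7,  # hypothetical clustering threshold
) -> Optional[Tuple[float, float]]:
    """Process (end_time_seconds, embedding) pairs in arrival order.

    Returns (decision_time, score) as soon as one online cluster matches
    the target model, or None if the stream ends without a detection.
    """
    clusters: List[List[float]] = []  # running sum of embeddings per cluster
    for end_time, emb in chunks:
        # Online diarization step (greedy, assumed): join the most similar
        # existing cluster, or open a new one if none is similar enough.
        best_idx, best_sim = None, join_threshold
        for i, total in enumerate(clusters):
            sim = cosine(total, emb)
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None:
            clusters.append(list(emb))
            best_idx = len(clusters) - 1
        else:
            clusters[best_idx] = [t + e for t, e in zip(clusters[best_idx], emb)]

        # Detection step: score the updated cluster against the target model
        # and stop at the first score that reaches the threshold.
        score = cosine(clusters[best_idx], target)
        if score >= threshold:
            return end_time, score
    return None


# Toy stream: two chunks from a non-target speaker, then the target speaker.
target_model = [1.0, 0.0]
stream = [
    (1.0, [0.0, 1.0]),
    (2.0, [0.1, 1.0]),
    (3.0, [1.0, 0.1]),
    (4.0, [1.0, 0.0]),
]
result = spot_speaker(stream, target_model)
print(result)
```

In this toy stream the decision is taken at the end of the first chunk attributed to the target speaker, so the gap between the moment the target starts speaking and the decision time is one chunk. Raising `threshold` or requiring more accumulated speech per cluster would trade lower latency for more reliable detection, which is the kind of trade-off the abstract refers to.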

Type
conference paper
DOI
10.21437/Odyssey.2018-20
Author(s)
Patino, Jose
Yin, Ruiqing
Delgado, Hector
Bredin, Herve
Komaty, Alain
Wisniewski, Guillaume
Barras, Claude
Evans, Nicholas
Marcel, Sébastien
Date Issued

2018

Published in
The Speaker and Language Recognition Workshop (Odyssey 2018)
Start page

140

End page

146

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LIDIAP  
Event name

The Speaker and Language Recognition Workshop (Odyssey)

Event place

Les Sables d’Olonne, France

Event date

26-29 June 2018

Available on Infoscience
March 18, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/167407
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved