pyannote.audio: neural building blocks for speaker diarization

Bredin, Herve; Yin, Ruiqing; Coria, Juan Manuel; Korshunov, Pavel; Lavechin, Marvin; Fustes, Diego; Titeux, Hadrien; Bouaziz, Wassim; Gill, Marie-Philippe

doi:10.1109/ICASSP40776.2020.9052974

conference paper not in proceedings

2020

IEEE International Conference on Acoustics, Speech, and Signal Processing

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding – reaching state-of-the-art performance for most of them.

Type

conference paper not in proceedings

DOI

10.1109/ICASSP40776.2020.9052974

ArXiv ID

1911.01255

Authors

Bredin, Herve; Yin, Ruiqing; Coria, Juan Manuel; Korshunov, Pavel; Lavechin, Marvin; Fustes, Diego; Titeux, Hadrien; Bouaziz, Wassim; Gill, Marie-Philippe

Publication date

2020

Peer reviewed

REVIEWED

EPFL units

LIDIAP

Event name

IEEE International Conference on Acoustics, Speech, and Signal Processing

Available on Infoscience

May 27, 2020

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/168970