conference paper

How Does Pre-Trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

Zuluaga-Gomez, Juan • Prasad, Amrutha • Nigmatulina, Iuliia • et al.
January 1, 2022
2022 IEEE Spoken Language Technology Workshop (SLT)
IEEE Spoken Language Technology Workshop (SLT)

Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AMs) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few works have investigated the impact on performance when the data properties substantially differ between the pre-training and fine-tuning phases, a mismatch termed domain shift. We target this scenario by analyzing the robustness of Wav2Vec 2.0 and XLS-R models on downstream ASR for a completely unseen domain: air traffic control (ATC) communications. We benchmark these two models on several challenging open-source ATC databases with signal-to-noise ratios between 5 and 20 dB. Relative word error rate (WER) reductions of 20% to 40% are obtained over hybrid-based ASR baselines by fine-tuning E2E acoustic models with only a small fraction of labeled data. We also analyze WERs in the low-resource scenario and the gender bias carried by one ATC dataset.
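For readers unfamiliar with the workflow the abstract summarizes, the sketch below shows one common way to fine-tune a pre-trained Wav2Vec 2.0 model with a CTC head using the Hugging Face transformers library. It is a minimal illustration under stated assumptions, not the authors' code: the checkpoint ID, the frozen-encoder choice, and the ATC-style sample transcript are all assumptions for the sake of the example.

```python
# Minimal sketch (not the authors' released code) of the setup the abstract
# describes: take a pre-trained Wav2Vec 2.0 checkpoint and fine-tune it with
# a CTC objective on out-of-domain (ATC) speech. Checkpoint ID, sample
# utterance, and all hyperparameters are illustrative assumptions.
import numpy as np
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Public LibriSpeech-fine-tuned checkpoint; the paper also benchmarks
# multilingual XLS-R models (e.g., "facebook/wav2vec2-xls-r-300m").
checkpoint = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

# Common practice for domain-shift fine-tuning: keep the convolutional
# feature encoder frozen and update only the transformer and CTC head.
model.freeze_feature_encoder()

# One training step on a single (waveform, transcript) pair at 16 kHz.
waveform = np.random.randn(4 * 16000).astype(np.float32)  # placeholder 4 s clip
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
labels = processor(text="CONTACT TOWER ONE ONE EIGHT DECIMAL ONE",
                   return_tensors="pt").input_ids  # hypothetical ATC transcript

loss = model(inputs.input_values, labels=labels).loss
loss.backward()  # an optimizer step would follow in a full training loop
print(f"CTC loss: {loss.item():.2f}")
```

As a point of reference for the reported gains, a 30% relative WER reduction over a hybrid baseline running at 10% WER corresponds to an absolute WER of 7%.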

Details
Type
conference paper
DOI
10.1109/SLT54892.2023.10022724
Web of Science ID

WOS:000968851900028

Author(s)
Zuluaga-Gomez, Juan
Prasad, Amrutha
Nigmatulina, Iuliia
Sarfjoo, Seyyed Saeed
Motlicek, Petr  
Kleinert, Matthias
Helmke, Hartmut
Ohneiser, Oliver
Zhan, Qingran
Date Issued

2022-01-01

Publisher

IEEE

Publisher place

New York

Published in
2022 IEEE Spoken Language Technology Workshop (SLT)
ISBN of the book

979-8-3503-9690-4

Series title/Series vol.

IEEE Workshop on Spoken Language Technology

Start page

205

End page

212

Subjects

Computer Science, Artificial Intelligence • Linguistics • Computer Science • automatic speech recognition • wav2vec 2.0 • self-supervised pre-training • air traffic control communications

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LIDIAP  
Event name
IEEE Spoken Language Technology Workshop (SLT)
Event place
Doha, Qatar
Event date
Jan 09-12, 2023

Available on Infoscience
May 22, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/197761