conference paper

Non-Intrusive Speech Quality Assessment with Transfer Learning and Subject-specific Scaling

Nessler, Natalia • Cernak, Milos • Prandoni, Paolo • Mainar, Pablo
January 1, 2021
Interspeech 2021
Interspeech Conference

In communication systems, it is crucial to estimate the perceived quality of audio and speech. For many years the industry standards have been PESQ, 3QUEST, and POLQA, which are intrusive methods: they require the clean reference signal alongside the degraded one. This restricts their use in real-world conditions, where the clean reference signal may not be available. In this work, we develop a new non-intrusive metric based on crowd-sourced data. We build a new speech dataset by combining publicly available speech, noises, and reverberations. We then follow the ITU P.808 recommendation to label the dataset with mean opinion scores (MOS). Finally, we train a deep neural network to estimate the MOS from the speech data in a non-intrusive way. Our work makes two novel contributions. First, we explore transfer learning by pre-training a model on a larger set of POLQA scores and fine-tuning it on the smaller (and thus cheaper) human-labeled set. Second, we apply subject-specific scaling to the MOS scores to adjust for the raters' different subjective scales. Our model yields better accuracy than PESQ, POLQA, and other non-intrusive methods when evaluated on the independent VCTK test set. We also report misleading POLQA scores for reverberant speech.
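
The abstract does not say how the subject-specific scaling is computed. As a rough illustration of the general idea only, the sketch below assumes one common choice: z-score each rater's scores against that rater's own mean and spread, map them back onto the global scale, and then average per clip. The function name, the (rater, clip, score) input layout, and the clipping to the 1 to 5 range are all assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def subject_scaled_mos(ratings, lo=1.0, hi=5.0):
    """Per-clip MOS after per-rater z-score normalization (illustrative only).

    ratings: iterable of (rater_id, clip_id, score) triples.
    Returns a dict {clip_id: scaled MOS}.
    """
    ratings = list(ratings)

    # Collect each rater's scores to estimate their personal scale.
    by_rater = defaultdict(list)
    for rater, _, score in ratings:
        by_rater[rater].append(score)

    # Global statistics define the common scale all raters are mapped onto.
    all_scores = [score for _, _, score in ratings]
    g_mean, g_std = mean(all_scores), pstdev(all_scores)

    by_clip = defaultdict(list)
    for rater, clip, score in ratings:
        r_mean = mean(by_rater[rater])
        r_std = pstdev(by_rater[rater])
        # A rater who always gives the same score carries no ranking
        # information, so their votes are mapped to the global mean.
        z = (score - r_mean) / r_std if r_std > 0 else 0.0
        rescaled = g_mean + g_std * z
        # Assumption: keep rescaled votes inside the 1-5 opinion scale.
        by_clip[clip].append(min(hi, max(lo, rescaled)))

    return {clip: mean(votes) for clip, votes in by_clip.items()}


# A harsh rater and a lenient rater who rank three clips identically.
example = [
    ("harsh", "a", 1), ("harsh", "b", 2), ("harsh", "c", 3),
    ("lenient", "a", 3), ("lenient", "b", 4), ("lenient", "c", 5),
]
scaled = subject_scaled_mos(example)
```

In this toy example the two raters disagree by two points on every clip before scaling, yet they rank the clips identically; after scaling, their votes for each clip coincide, which is the kind of rater offset such a step is meant to remove.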

Type: conference paper
DOI: 10.21437/Interspeech.2021-1685
Web of Science ID: WOS:000841879502104
Author(s): Nessler, Natalia • Cernak, Milos • Prandoni, Paolo • Mainar, Pablo
Date Issued: 2021-01-01
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC
Publisher place: Baixas
Journal: Interspeech 2021
Series title/Series vol.: Interspeech
Start page: 2406
End page: 2410

Subjects: speech quality assessment • polqa • neural networks • transfer learning • band

Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LCAV
Event name: Interspeech Conference
Event place: Brno, Czech Republic
Event date: Aug 30-Sep 03, 2021
Available on Infoscience: September 26, 2022
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/190962