conference paper

Efficient Speech Quality Assessment Using Self-supervised Framewise Embeddings

El Hajar, Karl • Wu, Zihan • Scheidwasser-Clow, Neil • Elbanna, Gasser • Cernak, Milos
January 1, 2023
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic speech quality assessment is essential for audio researchers, developers, speech and language pathologists, and system quality engineers. Current state-of-the-art systems are based on framewise speech features (hand-engineered or learnable) combined with time dependency modeling. This paper proposes an efficient system with results comparable to the best-performing model in the ConferencingSpeech 2022 challenge. The proposed system has 40-60x fewer parameters, 100x fewer FLOPS, 10-15x lower memory consumption, and 30x lower latency. Speech quality practitioners can therefore iterate much faster and deploy the system on resource-limited hardware; overall, the proposed system contributes to sustainable machine learning. The paper also concludes that framewise embeddings outperform utterance-level embeddings, and that multi-task training with acoustic conditions modeling does not degrade speech quality prediction while providing better interpretability.
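
The distinction the abstract draws between framewise and utterance-level embeddings can be illustrated with a minimal sketch. This is not the paper's model: the shapes, the random data, and the helper `recurrent_summary` are hypothetical and chosen only for illustration. The sketch shows that an utterance-level embedding obtained by averaging over time discards frame order, whereas a simple recurrence over the framewise embeddings remains sensitive to it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical framewise embeddings: T frames, each a D-dimensional vector.
T, D = 50, 8
frames = rng.normal(size=(T, D))

def utterance_embedding(frames):
    """Collapse the time axis by averaging: one vector per utterance."""
    return frames.mean(axis=0)

def recurrent_summary(frames, decay=0.9):
    """Toy time-dependency model (an assumption, not the paper's):
    an exponential moving recurrence over the frame sequence."""
    h = np.zeros(frames.shape[1])
    for x in frames:
        h = decay * h + (1.0 - decay) * x
    return h

# Reverse the utterance in time and compare both representations.
reversed_frames = frames[::-1]

same_mean = np.allclose(utterance_embedding(frames),
                        utterance_embedding(reversed_frames))
same_recurrent = np.allclose(recurrent_summary(frames),
                             recurrent_summary(reversed_frames))

print("mean-pooled embedding unchanged by time reversal:", same_mean)
print("recurrent summary unchanged by time reversal:", same_recurrent)
```

Under these assumptions, the averaged embedding is identical for the original and the time-reversed sequence, while the recurrent summary differs, which is one intuition for why time dependency modeling is applied to framewise rather than utterance-level features.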

Type
conference paper
DOI
10.1109/ICASSP49357.2023.10095132
Web of Science ID

WOS:001549214003040

Author(s)
El Hajar, Karl

École Polytechnique Fédérale de Lausanne

Wu, Zihan  

École Polytechnique Fédérale de Lausanne

Scheidwasser-Clow, Neil

University of Copenhagen

Elbanna, Gasser  

École Polytechnique Fédérale de Lausanne

Cernak, Milos

Logitech Europe

Date Issued

2023-01-01

Publisher

IEEE

Publisher place

New York

Published in
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI of the book
https://doi.org/10.1109/ICASSP49357.2023
ISBN of the book

978-1-7281-6327-7

Series title/Series vol.

International Conference on Acoustics Speech and Signal Processing ICASSP

ISSN (of the series)

1520-6149

Subjects

speech quality assessment • audio embeddings • self-supervised learning • deep neural networks • transformers

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LCN1  
Event name

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Event acronym

ICASSP 2023

Event place

Rhodes Island (Greece)

Event date

2023-06-04 - 2023-06-10

Available on Infoscience
February 24, 2026
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/260690