Infoscience

master thesis

Objective perception metrics for audio quality

Coldenhoff, Jozef
September 13, 2024

The ubiquity of modern telecommunication systems has led to steadily growing interest in audio quality assessment. Because human listening tests are cumbersome and expensive to conduct, automated objective measures of audio quality have been developed. Many of these systems focus on the assessment of speech, leading to a relative lack of work on non-speech signals. In this thesis, we explore methods for audio quality assessment that generalize well across signal types, including speech and non-speech audio. We focus on the development of non-intrusive methods, which do not have access to the clean signal. This work investigates two main approaches: impairment representation learning and a novel semi-intrusive method. Impairment representation learning involves adapting deep learning models for contrastive learning to focus on the distortions in audio signals rather than their content, with the goal of improving the generalization of audio quality metrics across diverse signal types. Experiments in which we train simple machine learning methods, such as K-Nearest Neighbors and Support Vector Regression, show that both impairment-focused and content-focused representations achieve decent performance on both speech and non-speech signals, even when pre-training is conducted on speech. The semi-intrusive method, which mimics humans' innate ability to focus on a signal within a mixture, frames the audio quality assessment task as a multi-modal problem. In this approach, the model is trained to predict audio quality based on both text and audio inputs. Our experiments show that the model outperforms baselines when evaluated across a broad domain; however, its performance on narrow domains lags behind baseline methods.
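As a rough illustration of the evaluation setup the abstract describes, where a simple regressor is fit on top of learned representations, the sketch below implements K-Nearest Neighbors regression from embedding vectors to quality scores. The embeddings, scores, and the name `knn_predict` are hypothetical placeholders chosen for this example; the encoders, datasets, and hyperparameters used in the thesis are not given on this page.

```python
import math


def knn_predict(train_embeddings, train_scores, query, k=3):
    """Predict a quality score as the mean score of the k nearest embeddings.

    Hypothetical helper: distances are Euclidean, and all neighbors are
    weighted equally.
    """
    if not 1 <= k <= len(train_embeddings):
        raise ValueError("k must be between 1 and the number of training items")
    # Pair each training item's distance to the query with its score.
    by_distance = sorted(
        (math.dist(emb, query), score)
        for emb, score in zip(train_embeddings, train_scores)
    )
    nearest = by_distance[:k]
    return sum(score for _, score in nearest) / k


# Toy 2-D "representations" with made-up quality scores (assumed data):
# one cluster of lightly impaired clips, one of heavily impaired clips.
toy_embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (1.0, 1.0), (0.9, 1.0), (1.0, 0.9),
]
toy_scores = [4.5, 4.4, 4.6, 1.5, 1.4, 1.6]

clean_like = knn_predict(toy_embeddings, toy_scores, (0.05, 0.05), k=3)
noisy_like = knn_predict(toy_embeddings, toy_scores, (0.95, 0.95), k=3)
```

In this toy setting, a query near the first cluster receives a high predicted score and a query near the second a low one. The point is only that the regressor itself stays simple, so the quality of the prediction depends on the representation.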

Name: Objective_perception_metrics_for_audio_quality_Final.pdf
Type: Main Document
Version: http://purl.org/coar/version/c_71e4c1898caa6e32
Access type: openaccess
License Condition: CC BY
Size: 2.99 MB
Format: Adobe PDF
Checksum (MD5): 956309e6d39538d1d1d464e383be43f2


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.