Infoscience

doctoral thesis

Modelling cochlea and its interaction with the auditory path for speech processing

Coppieters De Gibson, Louise Clothilde  
2025

This thesis explores the intersection of physiological modelling and computational techniques in advancing Automatic Speech Recognition (ASR) systems. Contemporary ASR, often driven by attention models and self-supervised learning, has achieved remarkable accuracy but remains decoupled from more recent physiological findings. Meanwhile, significant progress has been made in understanding the function of the cochlea, the sensory organ of the auditory system. Originally viewed as a passive filter bank, the cochlea is now understood to function as an active amplifier that is well modelled by a Hopf oscillator.

The goal of this thesis is to investigate how such advances in physiological understanding can be studied in the context of state-of-the-art ASR techniques. To this end, the thesis is organised as two interacting threads.

In the first thread, we investigate modularity: strategies for integrating and combining different types of machine learning models, whether by using different experts or by combining new front-end models with large pretrained transformer models. In a preliminary study, we show that modularity can be used to optimise an ASR model for different types of environmental noise.

In the second thread, we use modularity to investigate how to incorporate improved cochlear understanding into ASR systems, creating a two-way bridge in which insights from computational approaches also inform auditory physiology. After studying established techniques such as CARFAC and SincNet, we investigate trainable filter banks within a convolutional neural network (CNN) structure to determine the key hyperparameters for ASR performance. This study also offers insights into what the filters tend to learn when they are trained in an ASR context.
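
To make the idea of a parametrised filter bank concrete, the following is a minimal sketch of one SincNet-style band-pass filter: the difference of two windowed sinc low-pass responses, defined only by a low and a high cutoff. In a trainable filter bank those two cutoffs would be learned; here they are fixed, and the function names, sample rate, kernel size and Hamming window are illustrative assumptions rather than the configuration used in the thesis.

```python
import cmath
import math


def _sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with the limit 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f_low_hz, f_high_hz, sample_rate=16000, kernel_size=251):
    """Return the taps of a Hamming-windowed sinc band-pass filter.

    All parameter defaults are hypothetical values chosen for illustration.
    """
    f1 = f_low_hz / sample_rate   # low cutoff in cycles per sample
    f2 = f_high_hz / sample_rate  # high cutoff in cycles per sample
    centre = (kernel_size - 1) / 2
    taps = []
    for k in range(kernel_size):
        n = k - centre
        # Ideal band-pass = low-pass(f2) minus low-pass(f1).
        ideal = 2 * f2 * _sinc(2 * f2 * n) - 2 * f1 * _sinc(2 * f1 * n)
        # Hamming window limits ripple caused by truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (kernel_size - 1))
        taps.append(ideal * window)
    return taps


def gain_at(taps, freq_hz, sample_rate=16000):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    return abs(sum(h * cmath.exp(-1j * w * k) for k, h in enumerate(taps)))


if __name__ == "__main__":
    taps = sinc_bandpass(1000, 2000)
    for f in (100, 1500, 4000):
        print(f, round(gain_at(taps, f), 4))
```

Because each filter is described by two numbers rather than by every tap, the learned cutoffs can be read directly as centre frequencies and bandwidths, which is what makes such filters easy to compare with physiological tuning.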

Finally, we combine the two threads by embedding a Hopf-based cochlear model within an ASR system, informed by the learned filter bank parameters. We show that the Hopf mechanism exhibits the expected cube-root compression and gain control. Moreover, a larger feedback loop modelling the olivocochlear efferent path further enhances the overall performance. The resulting system offers valuable insights for future interdisciplinary studies between ASR and physiological auditory models.
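
As a rough illustration of where cube-root compression comes from, the sketch below assumes the textbook Hopf normal form dz/dt = (mu + i*omega0) z - |z|^2 z + F exp(i*omega0*t), driven exactly at resonance; this is not necessarily the model used in the thesis. In that case the steady-state amplitude r satisfies r^3 - mu*r = F, so at the bifurcation point (mu = 0) the response is r = F^(1/3). The function names and parameter values are hypothetical.

```python
import math


def hopf_amplitude(forcing, mu=0.0, iters=200):
    """Steady-state amplitude r solving r**3 - mu * r = forcing by bisection.

    Assumes on-resonance forcing of the Hopf normal form and mu <= 0
    (at or below the bifurcation), where the left-hand side is monotonic.
    """
    if forcing <= 0 or mu > 0:
        raise ValueError("requires forcing > 0 and mu <= 0")
    lo, hi = 0.0, forcing ** (1.0 / 3.0)  # the root lies in [0, F^(1/3)]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid ** 3 - mu * mid < forcing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def compression_slope(f_small, f_large, mu=0.0):
    """Slope of log(response) against log(forcing) between two levels."""
    r_small = hopf_amplitude(f_small, mu)
    r_large = hopf_amplitude(f_large, mu)
    return (math.log(r_large) - math.log(r_small)) / (
        math.log(f_large) - math.log(f_small)
    )


if __name__ == "__main__":
    # At the critical point the slope is 1/3 (cube-root compression).
    print("slope at mu = 0:", compression_slope(1e-4, 1e-2))
    # Well below the bifurcation, weak inputs are handled almost linearly.
    print("slope at mu = -1:", compression_slope(1e-6, 1e-5, mu=-1.0))
    # Gain r / F is larger for weak inputs: a simple form of gain control.
    for f in (1e-6, 1e-4, 1e-2):
        print("forcing", f, "gain", hopf_amplitude(f) / f)
```

Under these assumptions the gain r / F scales as F^(-2/3), so quiet inputs are amplified far more than loud ones, which is the compressive behaviour the paragraph above refers to.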

Name: EPFL_TH10908.pdf
Type: Main Document
Version: http://purl.org/coar/version/c_be7fb7dd8ff6fe43
Access type: openaccess
License Condition: N/A
Size: 5.18 MB
Format: Adobe PDF
Checksum (MD5): 7719512e1d9e29a09bca30211119618e


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved