
Infoscience

EPFL, École polytechnique fédérale de Lausanne

doctoral thesis

On matching data and model in LF-MMI-based dysarthric speech recognition

Hermann, Enno  
2023

In light of steady progress in machine learning, automatic speech recognition (ASR) is entering more and more areas of daily life, but people with dysarthria and other speech pathologies are being left behind. Their voices are underrepresented in ASR training data and differ so much from typical speech that ASR systems fail to recognise them. This thesis aims to adapt both the acoustic models and the training data of ASR systems to better handle dysarthric speech.

We first build state-of-the-art acoustic models trained with the sequence-discriminative lattice-free maximum mutual information (LF-MMI) criterion; these serve as baselines for the subsequent experiments. To address the acoustic variability of dysarthric speech, we propose dynamically combining models trained on control speakers, on dysarthric speakers, or on both groups. Furthermore, we combine models trained with either phoneme or grapheme acoustic units in order to implicitly handle pronunciation variants.
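For reference, the standard MMI criterion that LF-MMI training maximises can be written as follows. The notation is ours, not taken from the thesis: O_u is the acoustic feature sequence of training utterance u, W_u its reference word sequence, and θ the acoustic model parameters. In the lattice-free variant (as in Kaldi's "chain" models), the denominator sum over competing hypotheses is not approximated with word lattices; it is instead computed with a forward-backward pass over a denominator graph built from a phone-level n-gram language model.

```latex
\mathcal{F}_{\mathrm{MMI}}(\theta) = \sum_{u=1}^{U} \log
  \frac{p_{\theta}(\mathbf{O}_u \mid W_u)\, P(W_u)}
       {\sum_{W} p_{\theta}(\mathbf{O}_u \mid W)\, P(W)}
```

Maximising this objective raises the score of the reference transcription relative to all competing hypotheses, which is what makes the training sequence-discriminative rather than frame-by-frame.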

Second, we develop a framework to analyse the acoustic space of ASR training data and its discriminability. We observe that these discriminability measures are strongly linked both to the subjective intelligibility ratings of dysarthric speakers and to ASR performance.
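The abstract does not state which discriminability measures the framework uses. As a purely illustrative sketch, one classical way to quantify how well classes (for example, phones) separate in an acoustic feature space is the ratio of between-class to within-class scatter, a Fisher-style criterion. The function name `fisher_discriminability`, the assumption of frame-level features with phone labels, and the synthetic data are all hypothetical and are not the thesis's method.

```python
import numpy as np


def fisher_discriminability(features: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical measure: tr(S_B) / tr(S_W) for labelled feature frames.

    features: (num_frames, feature_dim) array, e.g. acoustic features.
    labels:   (num_frames,) array of class labels, e.g. phone IDs.
    """
    global_mean = features.mean(axis=0)
    between = 0.0  # trace of the between-class scatter matrix
    within = 0.0   # trace of the within-class scatter matrix
    for c in np.unique(labels):
        class_frames = features[labels == c]
        class_mean = class_frames.mean(axis=0)
        # Squared distance of the class mean from the global mean, weighted by class size.
        between += class_frames.shape[0] * np.sum((class_mean - global_mean) ** 2)
        # Squared distances of each frame from its own class mean.
        within += np.sum((class_frames - class_mean) ** 2)
    if within == 0.0:
        raise ValueError("within-class scatter is zero; ratio is undefined")
    return float(between / within)


# Usage on synthetic 2-D data: three classes, well-separated vs. overlapping frames.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 200)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
clear = centres[labels] + rng.normal(scale=0.5, size=(600, 2))
blurred = centres[labels] + rng.normal(scale=2.0, size=(600, 2))
print(fisher_discriminability(clear, labels), fisher_discriminability(blurred, labels))
```

A higher ratio means class means lie far apart relative to the spread within each class, so the "clear" data scores higher than the "blurred" data. Using the traces of the scatter matrices keeps the measure a single scalar; whether such a measure tracks intelligibility for real dysarthric speech is an assumption of this sketch, not a result stated here.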

Finally, we compare a range of data augmentation methods, including voice conversion and speech synthesis, for creating artificial dysarthric training data for ASR systems. With our analysis framework, we find that these methods reproduce characteristics of natural dysarthric speech.

Type
doctoral thesis
DOI
10.5075/epfl-thesis-9681
Author(s)
Hermann, Enno  
Advisors
Odobez, Jean-Marc • Magimai Doss, Mathew
Jury
Prof. Pascal Frossard (president); Dr Jean-Marc Odobez, Dr Mathew Magimai Doss (thesis directors); Prof. Jean-Philippe Thiran, Prof. Elmar Nöth, Prof. Isabel Trancoso (examiners)
Date Issued
2023
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2023-06-29
Thesis number
9681
Number of pages
102
Subjects
automatic speech recognition • dysarthria • pathological speech processing • LF-MMI • acoustic subword units • data augmentation
EPFL units
LIDIAP  
Faculty
STI  
School
IEM  
Doctoral School
EDEE  
Available on Infoscience
June 19, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/198518
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.