Infoscience

doctoral thesis

Automatic pathological speech assessment

Janbakhshi, Parvaneh  
2022

Many pathologies impair the speech production mechanism, reducing speech intelligibility and communicative ability. Automatic pathological speech assessments are indispensable for assisting the clinical diagnosis, treatment, and management of speech disorders. Such assessments are reliable, objective, and cost-effective, in contrast to the subjective and time-consuming auditory-perceptual analyses performed by clinicians.
Two automatic analyses are crucial for developing computer-aided tools: speech pathology detection, i.e., discriminating between normal and pathological speech, and speech intelligibility assessment, i.e., predicting an intelligibility index correlated with the percentage of words correctly understood by human listeners. The goal of this thesis is to propose novel data-driven approaches that aid the development of a clinical assistive tool serving both purposes, i.e., pathological speech detection and intelligibility assessment.

First, we focus on the development of novel machine learning approaches to address the pathological speech detection task. Motivated by the clinical evidence on spectro-temporal distortions associated with pathological speech, we propose a subspace-based speech pathology detection approach that relies on analyzing subspaces spanned by the dominant spectral or temporal patterns of speech.
Although the temporal subspace-based approach achieves high performance, it requires time alignment and access to phonetically balanced utterances from all speakers. To avoid the time alignment and to assess the efficacy of deep learning approaches for this task, we propose analyzing pairwise distance matrices computed from speech representations using convolutional neural networks.
Furthermore, to achieve pathological speech detection without constraints on the phonetic content, we propose different supervised representation learning approaches that use convolutional neural networks to learn robust and relevant feature representations. We demonstrate the effectiveness of the proposed approaches in experiments across several databases.
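
As a rough illustration of what a pairwise distance matrix over speech representations can look like, the following minimal sketch compares every frame of one feature sequence with every frame of another. It is an assumption-laden toy, not the method of the thesis: the abstract does not state which distance is used or which representations are paired, so the Euclidean distance, the function name `pairwise_distance_matrix`, and the feature values are all hypothetical.

```python
import math


def pairwise_distance_matrix(frames_a, frames_b):
    """Return D where D[i][j] is the Euclidean distance between
    frames_a[i] and frames_b[j] (the distance choice is an assumption)."""
    return [[math.dist(a, b) for b in frames_b] for a in frames_a]


# Hypothetical toy feature sequences: 3 and 2 frames of 2-dimensional features.
utterance_a = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
utterance_b = [[0.0, 0.0], [3.0, 4.0]]

# D has one row per frame of utterance_a and one column per frame of utterance_b.
D = pairwise_distance_matrix(utterance_a, utterance_b)
```

A matrix of this kind has a fixed two-dimensional layout, which is what makes it a plausible input to a convolutional neural network; how the thesis builds and classifies such matrices is described in the full text, not here.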

Second, we focus on developing reliable automatic pathological speech intelligibility measures that outperform state-of-the-art measures while overcoming several of their drawbacks. We first propose a measure based on short-time objective intelligibility assessment.
Further, we provide a solution that ensures its applicability in scenarios where the phonetic content differs across speakers. We also propose intelligibility measures based on analyzing speech subspaces. The subspace-based intelligibility measures are applicable to different scenarios while overcoming the drawbacks of the previously described measure.
We validate the performance of the proposed measures across languages and diseases.

Finally, we provide insights on a potential clinical assistive tool for pathological speech detection and intelligibility assessment. To this end, we jointly validate the applicability of two of the previously described approaches, i.e., temporal subspace-based speech pathology detection and short-time objective intelligibility assessment. As our approaches for both tasks achieve high performance independently of language and disease, we confirm the possibility of developing such a multi-purpose clinical assistive tool.

Type
doctoral thesis
DOI
10.5075/epfl-thesis-9483
Author(s)
Janbakhshi, Parvaneh  
Advisors
  • Bourlard, Hervé
  • Kodrasi, Ina
Jury

Dr Jean-Marc Vesin (president); Prof. Hervé Bourlard, Dr Ina Kodrasi (thesis directors); Prof. Jean-Philippe Thiran, Prof. Philip Green, Prof. Mads Græsbøll Christensen (examiners)

Date Issued

2022

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2022-06-13

Thesis number

9483

Total of pages

146

Subjects

  • pathological speech intelligibility
  • pathological speech detection
  • ESTOI
  • convolutional neural network
  • subspace-based learning
  • supervised speech representation learning
  • feature separation
  • dysarthria
  • Parkinson's disease
  • Cerebral Palsy

EPFL units
LIDIAP  
Faculty
STI  
School
IEM  
Doctoral School
EDEE  
Available on Infoscience
May 31, 2022
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/188207
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved