Infoscience

doctoral thesis

Advancing Neural Representations for Paralinguistic Analysis: From Speech Emotion to Parkinson's Disease Assessment

Purohit, Tilak  
2026

Paralinguistics comes from the Greek preposition para, meaning "alongside". It refers to the study of information in speech that goes beyond words, capturing cues such as emotion, personality, gender, and health. Rather than focusing on 'what' is said, it focuses more on 'how' it is said. It examines speech through two complementary temporal dimensions: states, which represent short-term affective variations (such as emotion or stress), and traits, which reflect long-term speaker characteristics, including gender, age, or pathological conditions like Parkinson's disease. Stable traits influence how transient states are expressed and perceived, forming a continuum that underlies paralinguistic analysis.

Traditional Speech Emotion Recognition (SER) approaches rely on either (a) suprasegmental modeling of handcrafted acoustic descriptors or (b) direct modeling of long-duration speech signals (typically 4 to 6 seconds) using deep neural networks. This thesis departs from these paradigms by introducing a short-segment modeling strategy, showing that emotion-relevant information can be effectively captured from 250 ms speech waveform segments using an end-to-end Convolutional Neural Network (CNN). Across multiple emotion corpora, the proposed model achieved performance comparable to utterance-level systems and outperformed handcrafted features extracted over the same 250 ms duration. Relevance-signal-based interpretability analysis revealed that the CNN learns emotion-relevant cepstral features, confirming the strength of data-driven short-segment modeling.
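
As an illustration of the short-segment idea, the sketch below cuts a waveform into 250 ms segments and combines per-segment class probabilities into one utterance-level estimate. It is a minimal sketch, not the thesis implementation: the 16 kHz sample rate, the non-overlapping hop, the function names, and the averaging rule are all assumptions made for the example, and the CNN that would produce the per-segment probabilities is left out.

```python
def split_into_segments(waveform, sample_rate=16000, segment_ms=250, hop_ms=250):
    """Cut a 1-D sequence of samples into fixed-length segments.

    The incomplete tail (shorter than one segment) is dropped.
    sample_rate and hop_ms are illustrative assumptions.
    """
    seg_len = sample_rate * segment_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    return [
        waveform[start:start + seg_len]
        for start in range(0, len(waveform) - seg_len + 1, hop_len)
    ]


def average_posteriors(segment_posteriors):
    """Average per-segment class probabilities into one distribution.

    Averaging is an assumed aggregation rule, chosen for simplicity.
    """
    n_segments = len(segment_posteriors)
    n_classes = len(segment_posteriors[0])
    return [
        sum(p[c] for p in segment_posteriors) / n_segments
        for c in range(n_classes)
    ]


# One second of placeholder audio at the assumed 16 kHz rate.
waveform = [0.0] * 16000
segments = split_into_segments(waveform)

# Hypothetical two-class outputs of a segment-level model for three segments.
posteriors = [[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]]
utterance_posterior = average_posteriors(posteriors)
```

With these assumed settings, one second of audio yields four segments of 4000 samples each, and each segment would be scored independently before aggregation.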

Building on this, a phonetically aware neural modeling framework was proposed to explore whether phonetic information captures emotional cues. By leveraging neural features that encode phonetic information, this approach consistently outperformed traditional acoustic features across benchmark datasets. These results highlight the importance of phonetic information in emotion modeling.

Extending these findings from transient states to persistent traits, the study examined Speech Foundation Models (SFMs) for detecting neurological conditions, focusing on Parkinson's disease (PD). In low-resource clinical scenarios, parameter-efficient adaptation strategies such as layer selection and Low-Rank Adaptation (LoRA) were introduced. We observed that the layer selection method matched the performance of full fine-tuning while requiring significantly fewer parameters. Notably, the application of LoRA to the Whisper model surpassed other methods, suggesting that models pretrained for task-specific speech recognition are conducive to efficient adaptation for PD speech detection.
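
To make the parameter saving of LoRA concrete, the sketch below applies the general LoRA formulation to a single weight matrix: a frozen weight W is combined with a trainable low-rank update scaled by alpha / r. It is a sketch of the technique only, not of the thesis setup: the helper names, the 768-dimensional layer, the rank of 8, and the alpha value are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    W: frozen weight, shape (d_out, d_in)
    B: trainable, shape (d_out, r), initialised to zeros
    A: trainable, shape (r, d_in), initialised randomly
    """
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


def lora_trainable_params(d_out, d_in, r):
    """Number of trainable parameters in the two low-rank factors."""
    return r * (d_in + d_out)


# Tiny example: with B at its zero initialisation, the adapted weight equals W.
random.seed(0)
d_out, d_in, r = 3, 4, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]
W_eff = lora_effective_weight(W, A, B, alpha=4.0)

# Illustrative sizes (assumed): one 768 x 768 projection adapted with rank 8.
full_params = 768 * 768
lora_params = lora_trainable_params(768, 768, 8)
```

Under these assumed sizes the two low-rank factors hold 12,288 trainable values against 589,824 in the full matrix, which is the kind of reduction that makes adaptation feasible when clinical data are scarce.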

To further explore the interaction between states and traits, the research addressed comorbid depression detection in PD, a challenging task due to overlapping vocal characteristics. In this low-data setting, large SFMs failed to generalize well, whereas interpretable handcrafted acoustic features with robust feature selection proved more effective. Analysis showed that depression manifests through different acoustic markers: while non-PD depression is dominated by source-related features, PD-related depression reflects both source and system cues.

Overall, this work advances the understanding of how paralinguistic information is encoded in neural representations, bridging interpretability and scalability toward the development of robust, explainable models for inferring paralinguistic states and traits.

Name: EPFL_TH11387.pdf
Type: Main Document
Version: Published version
Access type: Open access
License Condition: N/A
Size: 9.36 MB
Format: Adobe PDF
Checksum (MD5): 8c946efec0b26e953f5359c000deab3e

