conference paper

Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load

Elbanna, Gasser • Biryukov, Alice • Scheidwasser-Clow, Neil • et al.
January 1, 2022
Interspeech 2022
Interspeech Conference

As a neurophysiological response to threat or adverse conditions, stress can affect cognition, emotion and behaviour, with potentially detrimental effects on health in the case of sustained exposure. Since the affective content of speech is inherently modulated by an individual's physical and mental state, a substantial body of research has been devoted to the study of paralinguistic correlates of stress-inducing task load. Historically, voice stress analysis has been conducted using conventional digital signal processing (DSP) techniques. Despite the development of modern methods based on deep neural networks (DNNs), accurately detecting stress in speech remains difficult due to the wide variety of stressors and considerable variability in individual stress perception. To that end, we introduce a set of five datasets for task load detection in speech. The voice recordings were collected while either cognitive or physical stress was induced in a cohort of volunteers, comprising more than a hundred speakers in total. We used the datasets to design and evaluate a novel self-supervised audio representation that leverages the effectiveness of handcrafted (DSP-based) features and the complexity of data-driven DNN representations. Notably, the proposed approach outperformed both extensive handcrafted feature sets and novel DNN-based audio representation learning approaches.
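
The model details are in the full text; as a rough orientation only, the sketch below (plain PyTorch/torchaudio, not the authors' architecture) shows the general idea of a hybrid representation: handcrafted DSP features (here, MFCCs) are computed alongside the embedding of a small learnable encoder, and the two views are concatenated into one utterance-level vector. The class name HybridRepresentation, the encoder layout, and all dimensions are hypothetical, and the self-supervised training objective mentioned in the abstract is omitted.

# Illustrative sketch only: fuses handcrafted DSP features (MFCCs) with a
# learnable encoder's embedding. Not the authors' implementation; names and
# dimensions are hypothetical, and no self-supervised objective is included.
import torch
import torch.nn as nn
import torchaudio

class HybridRepresentation(nn.Module):
    def __init__(self, sample_rate=16000, n_mfcc=40, embed_dim=128):
        super().__init__()
        # Handcrafted branch: conventional DSP features (MFCCs).
        self.mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=n_mfcc)
        # Learnable branch: a small convolutional encoder over the raw waveform.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=10, stride=5),
            nn.ReLU(),
            nn.Conv1d(64, embed_dim, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, waveform):
        # waveform: (batch, 1, samples)
        handcrafted = self.mfcc(waveform).mean(dim=-1).flatten(1)  # (batch, n_mfcc)
        learned = self.encoder(waveform).squeeze(-1)               # (batch, embed_dim)
        # Fuse both views into a single utterance-level representation.
        return torch.cat([handcrafted, learned], dim=-1)

if __name__ == "__main__":
    model = HybridRepresentation()
    dummy = torch.randn(2, 1, 16000)   # two 1-second dummy utterances at 16 kHz
    print(model(dummy).shape)          # torch.Size([2, 168])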

Type
conference paper
DOI
10.21437/Interspeech.2022-10498
Web of Science ID
WOS:000900724500078
Author(s)
Elbanna, Gasser  
Biryukov, Alice
Scheidwasser-Clow, Neil
Orlandic, Lara  
Mainar, Pablo
Kegler, Mikolaj
Beckmann, Pierre
Cernak, Milos
Date Issued
2022-01-01
Publisher
ISCA (International Speech Communication Association)
Publisher place
Baixas
Published in
Interspeech 2022
Series title/Series vol.
Interspeech
Start page
386
End page
390
Subjects
Acoustics • Audiology & Speech-Language Pathology • Computer Science, Artificial Intelligence • Engineering, Electrical & Electronic • Computer Science • Engineering • computational paralinguistics • voice stress analysis • audio representation learning • DSP features • deep learning • emotion

Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
ESL  
Event name
Interspeech Conference
Event place
Incheon, South Korea
Event date
Sep 18-22, 2022

Available on Infoscience
March 27, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/196514