conference paper

Emotion information recovery potential of wav2vec2 network fine-tuned for speech recognition task

Purohit, Tilak  
•
Magimai-Doss, Mathew
2025
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
IEEE International Conference on Acoustics, Speech, and Signal Processing

Fine-tuning has become the norm for achieving state-of-the-art performance with pre-trained networks such as foundation models. These models are typically pre-trained on large-scale unannotated data using self-supervised learning (SSL) methods. SSL-based pre-training on large-scale data enables the network to learn the inherent structure and properties of the data, giving it the capacity to generalize and to transfer knowledge to various downstream tasks. However, when fine-tuned for a specific task, these models become task-specific, and fine-tuning may distort the patterns the network learned during pre-training. In this work, we investigate these distortions by analyzing the network's information recovery capabilities, in a study where speech emotion recognition is the target task and automatic speech recognition is an intermediary task. We show that the network recovers the task-specific information, but with a shift in its decisions. Through attention analysis, we also demonstrate that some layers do not fully recover the information.

Type
conference paper
DOI
10.1109/ICASSP49660.2025.10890800
Scopus ID

2-s2.0-105009695309

Author(s)
Purohit, Tilak  

École Polytechnique Fédérale de Lausanne

Magimai-Doss, Mathew

Institut Dalle Molle D'intelligence Artificielle Perceptive

Date Issued

2025

Publisher

Institute of Electrical and Electronics Engineers Inc.

Published in
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISBN of the book

979-8-3503-6874-1

Subjects

  • ASR
  • Domain adaptation
  • Finetuning
  • Foundation Models
  • Speech Emotion Recognition
  • wav2vec2.0

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LIDIAP  
Event name

IEEE International Conference on Acoustics, Speech, and Signal Processing

Event acronym

ICASSP 2025

Event place

Hyderabad, India

Event date

2025-04-06 - 2025-04-11

Funder

SNSF

Grant Number

40B2-0_194794

Available on Infoscience
July 14, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/252248