conference paper

Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing

Beckmann, Pierre • Kegler, Mikolaj • Cernak, Milos
January 1, 2021
29th European Signal Processing Conference (EUSIPCO 2021)

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representations have been developed to advance automatic speech recognition. To date, most of these approaches are task-specific and designed for within-task transfer learning between different datasets or setups of a particular task. In contrast, learning task-independent representations of speech and cross-task applications of transfer learning remain less common. Here, we introduce an encoder that captures word-level representations of speech for cross-task transfer learning. We demonstrate the application of the pre-trained encoder in four distinct speech and audio processing tasks: (i) speech enhancement, (ii) language identification, (iii) speech, noise, and music classification, and (iv) speaker identification. In each task, we compare the performance of our cross-task transfer learning approach to task-specific baselines. Our results show that the speech representation captured by the encoder through pre-training is transferable across distinct speech processing tasks and datasets. Notably, even simple applications of our pre-trained encoder outperformed task-specific methods or performed comparably, depending on the task.

Type
conference paper
DOI
10.23919/EUSIPCO54536.2021.9616254
Web of Science ID

WOS:000764066600090

Author(s)
Beckmann, Pierre • Kegler, Mikolaj • Cernak, Milos
Date Issued

2021-01-01

Publisher

EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP

Publisher place

Kessariani

Published in
29th European Signal Processing Conference (EUSIPCO 2021)
ISBN of the book

978-9-0827-9706-0

Series title/Series vol.

European Signal Processing Conference

Start page

446

End page

450

Subjects

Acoustics • Computer Science, Software Engineering • Engineering, Electrical & Electronic • Imaging Science & Photographic Technology • Telecommunications • Computer Science • Engineering • speech processing • deep learning • transfer learning • feature extraction

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LSIR  
Event name

29th European Signal Processing Conference (EUSIPCO)

Event place

ELECTR NETWORK

Event date

Aug 23-27, 2021

Available on Infoscience
April 25, 2022
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/187336