Infoscience

conference paper

Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech

Sebastian, Jilt • Kumar, Manoj • Kumar, D. S. Pavan • Magimai-Doss, Mathew • Murthy, Hema A. • Narayanan, Shrikanth
January 1, 2018
19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6

This paper presents a raw-waveform neural network and uses it together with a denoising network for clustering in weakly supervised learning scenarios under extreme noise conditions. Specifically, we consider language-independent Automatic Gender Recognition (AGR) across a range of noise conditions and signal-to-noise ratios (SNRs). We formulate the denoising problem as a source separation task and train the system with a discriminative criterion to enhance output SNRs. A denoising Recurrent Neural Network (RNN) is first trained on a small subset (roughly one-fifth) of the data to learn a speech-specific mask. The denoised speech signal is then fed directly to a raw-waveform Convolutional Neural Network (CNN) trained on denoised speech. We evaluate the standalone performance of the denoiser in terms of various signal-to-noise measures and discuss its contribution to robust AGR. The combined pipeline achieves absolute improvements of 11.06% and 13.33% over the i-vector SVM baseline system for the 0 dB and -5 dB SNR conditions, respectively. We further analyse the information captured by the first CNN layer in both noisy and denoised speech.
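
To make the 0 dB and -5 dB SNR conditions mentioned in the abstract concrete, the following is a minimal sketch of how a noisy mixture at a chosen SNR can be constructed and checked. It is not the authors' data preparation: the sine-wave "speech", the Gaussian "noise", the 16 kHz sample rate, and the helper names `power`, `snr_db`, and `mix_at_snr` are all illustrative assumptions.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so that speech + noise has the requested SNR.

    Returns (mixture, scaled_noise). Both inputs must have equal length.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Noise power required to reach the target SNR for this speech power.
    wanted_noise_power = power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Toy stand-ins (assumptions): a 200 Hz sine for speech, Gaussian noise.
rng = random.Random(0)
sample_rate = 16000  # assumed for illustration, not taken from the paper
speech = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

for target in (0.0, -5.0):
    mixture, scaled_noise = mix_at_snr(speech, noise, target)
    measured = snr_db(speech, scaled_noise)
    print(f"target {target:.1f} dB, measured {measured:.2f} dB")
```

At 0 dB the scaled noise carries the same power as the speech, and at -5 dB it carries roughly 3.16 times as much, which gives a sense of why the abstract describes these conditions as extreme.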

Type
conference paper
DOI
10.21437/Interspeech.2018-2321
Web of Science ID

WOS:000465363900062

Author(s)
Sebastian, Jilt
Kumar, Manoj
Kumar, D. S. Pavan
Magimai-Doss, Mathew  
Murthy, Hema A.
Narayanan, Shrikanth
Date Issued

2018-01-01

Publisher

ISCA-INT SPEECH COMMUNICATION ASSOC

Publisher place

Baixas

Published in
19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6
ISBN of the book

978-1-5108-7221-9

Series title/Series vol.

Interspeech

Start page

292

End page

296

Subjects

  • Computer Science, Artificial Intelligence
  • Computer Science, Theory & Methods
  • Engineering, Electrical & Electronic
  • Computer Science
  • Engineering
  • speech enhancement
  • automatic gender recognition
  • convolutional neural network
  • recurrent neural network

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LIDIAP  
Event name

19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018)

Event place

Hyderabad, India

Event date

Aug 02-Sep 06, 2018

Available on Infoscience
June 18, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/157713

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved