Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech

Authors: Sebastian, Jilt; Kumar, Manoj; Kumar, D. S. Pavan; Magimai-Doss, Mathew; Murthy, Hema A.; Narayanan, Shrikanth
Dates: 2019-06-18; 2018-01-01
DOI: 10.21437/Interspeech.2018-2321
URL: https://infoscience.epfl.ch/handle/20.500.14299/157713
Web of Science ID: WOS:000465363900062

Abstract: This paper presents a raw-waveform neural network and uses it along with a denoising network for clustering in weakly supervised learning scenarios under extreme noise conditions. Specifically, we consider language-independent Automatic Gender Recognition (AGR) on a set of varied noise conditions and signal-to-noise ratios (SNRs). We formulate the denoising problem as a source separation task and train the system using a discriminative criterion in order to enhance output SNRs. A denoising Recurrent Neural Network (RNN) is first trained on a small subset (roughly one-fifth) of the data to learn a speech-specific mask. The denoised speech signal is then fed directly as input to a raw-waveform convolutional neural network (CNN) trained with denoised speech. We evaluate the standalone performance of the denoiser in terms of various signal-to-noise measures and discuss its contribution towards robust AGR. The combined pipeline achieves absolute improvements of 11.06% and 13.33% over the i-vector SVM baseline system for the 0 dB and -5 dB SNR conditions, respectively. We further analyse the information captured by the first CNN layer in both noisy and denoised speech.

Subject categories: Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Engineering, Electrical & Electronic; Computer Science; Engineering
Keywords: speech enhancement; automatic gender recognition; convolutional neural network; recurrent neural network
Document type: text::conference output::conference proceedings::conference paper
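
The abstract reports results at 0 dB and -5 dB SNR. As a reminder of what those conditions mean, the following is a minimal sketch of mixing a clean signal with noise at a target SNR, using the standard definition SNR = 10 log10(P_speech / P_noise). It is not the authors' data preparation, which this record does not describe; the function names (`power`, `snr_db`, `mix_at_snr`) and the toy signals are hypothetical and chosen only for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so the speech-to-noise ratio equals target_snr_db,
    then add it to the speech. Assumes both inputs have the same length.
    Returns (mixture, scaled_noise)."""
    linear_ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(power(speech) / (power(noise) * linear_ratio))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Toy stand-ins (assumptions): a 200 Hz tone as "speech" and seeded
# Gaussian samples as "noise", one second at an 8 kHz sampling rate.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for target in (0.0, -5.0):
    mixture, scaled = mix_at_snr(speech, noise, target)
    print(target, round(snr_db(speech, scaled), 6))
```

At 0 dB the scaled noise carries the same power as the speech, and at -5 dB it carries roughly 3.16 times as much, which is why the abstract describes these as extreme noise conditions.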