On the Use of Convolutional Neural Networks for Speech Presentation Attack Detection

Research in the area of automatic speaker verification (ASV) has advanced enough for the industry to start using ASV systems in practical applications. However, these systems are highly vulnerable to spoofing or presentation attacks (PAs), limiting their wide deployment. Several speech-based presentation attack detection (PAD) methods have been proposed recently but most of them are based on hand crafted frequency or phase-based features. Although convolutional neural networks (CNN) have already shown breakthrough results in face recognition, little is understood whether CNNs are as effective in detecting presentation attacks in speech. In this paper, to investigate the applicability of CNNs for PAD, we consider shallow and deep examples of CNN architectures implemented using Tensorflow and compare their performances with the state of the art MFCC with GMM-based system on two large databases with presentation attacks: publicly available voicePA and proprietary BioCPqD-PA. We study the impact of increasing the depth of CNNs on the performance, and note how they perform on unknown attacks, by using one database to train and another to evaluate. The results demonstrate that CNNs are able to learn a database significantly better (increasing depth also improves the performance), compared to hand crafted features. However, CNN-based PADs still lack the ability to generalize across databases and are unable to detect unknown attacks well.


Presented at:
International Conference on Identity, Security and Behavior Analysis
Year:
2018
Laboratories:




 Record created 2017-12-19, last modified 2019-12-05

n/a:
Download fulltext
PDF

Rate this document:

Rate this document:
1
2
3
 
(Not yet reviewed)