Presentation Attack Detection Using Long-Term Spectral Statistics for Trustworthy Speaker Verification
In recent years, there has been growing interest in developing countermeasures against non-zero-effort attacks on speaker verification systems. Until now, the focus has been on logical access attacks, where the spoofed samples are injected into the system through a software-based process. This paper investigates a more realistic type of attack, referred to as physical access or presentation attacks, where the spoofed samples are presented as input to the microphone. To detect such attacks, we propose a binary-classifier-based approach that uses long-term spectral statistics as feature input. Experimental studies on the AVspoof database, which contains presentation attacks based on replay, speech synthesis and voice conversion, show that the proposed approach can yield a very low detection error rate with a linear classifier (half total error rate of 0.038%). Furthermore, an investigation on the Interspeech 2015 ASVspoof challenge dataset shows that the approach is equally capable of detecting logical access attacks.
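As an illustration only, the following minimal sketch shows one plausible way to compute long-term spectral statistics and the half total error rate. It assumes that the statistics are the per-frequency-bin mean and standard deviation of the log-magnitude spectrum taken over all frames of an utterance; the abstract does not specify this. The function names, the frame length, the hop size and the absence of a window function are hypothetical choices, not the configuration used in the experiments.

```python
import numpy as np


def long_term_spectral_stats(signal, frame_len=512, hop=256, eps=1e-10):
    """Return one fixed-length feature vector per utterance.

    Assumption: the features are the mean and standard deviation, over
    frames, of the log-magnitude spectrum in each frequency bin.
    """
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short signals so that at least one frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    # Slice the signal into overlapping frames (no window, for simplicity).
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]

    # Log-magnitude spectrum of each frame; eps avoids log(0).
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)

    # Long-term statistics: aggregate over the frame axis.
    return np.concatenate([log_mag.mean(axis=0), log_mag.std(axis=0)])


def half_total_error_rate(far, frr):
    """HTER is the average of the false acceptance and false rejection rates."""
    return 0.5 * (far + frr)
```

The resulting vector has a fixed dimension regardless of utterance length, so it can be passed directly to any linear binary classifier trained to separate genuine speech from attacks.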