Aural and automatic forensic speaker recognition in mismatched conditions

Alexander, A; Dessimoz, D; Botti, F; Drygajlo, A

research article

The International Journal of Speech, Language and the Law, 2005

doi:10.1558/sll.2005.12.2.214

In this article, we compare aural and automatic speaker recognition in the context of forensic analyses, using a Bayesian framework for the interpretation of evidence. We use perceptual tests performed by non-experts and compare their performance with that of an automatic speaker recognition system. These experiments are performed with 90 phonetically untrained subjects. Several forensic cases, varying in linguistic content and in the technical conditions of recording, were simulated using the Polyphone IPSC-02 database. We estimate the strength of evidence for both the human listeners and the baseline automatic system by calculating likelihood ratios from perceptual scores for the humans and from log-likelihood scores for the automatic system. To estimate the strength of evidence from the perceptual scores given by the listeners, we apply a methodology analogous to the Bayesian interpretation used in forensic automatic speaker recognition. The degradation in human recognition accuracy under mismatched recording conditions is contrasted with that of the automatic system under similar recording conditions. The conditions considered are fixed telephone, cellular telephone and noisy speech, all in forensically realistic settings. We also study the perceptual cues that the subjects use to perceive differences between voices, and their importance in each recording condition. We observe that while automatic speaker recognition shows higher accuracy when training and testing conditions are matched, its performance degrades significantly in mismatched conditions. Aural recognition accuracy also degrades from matched to mismatched conditions, and in mismatched conditions the baseline automatic system performs comparably to, or slightly worse than, aural recognition. With adaptation to noisy conditions, the baseline automatic system performs comparably to, or better than, aural recognition.

We discuss the higher-level perceptual cues that human listeners use to recognise speakers, as well as the possibility of increasing the accuracy of automatic systems by using those perceptual cues that remain robust to mismatched recording conditions.
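The abstract describes estimating the strength of evidence as a likelihood ratio computed from scores. In the usual formulation of this Bayesian framework, an evidence score E (a log-likelihood score from the automatic system, or a listener's perceptual score for the questioned/suspect pair) is evaluated against two score distributions: within-source scores, from comparisons of recordings known to come from the suspected speaker, and between-source scores, from comparisons of the questioned recording with a population of other speakers. The likelihood ratio is LR = p(E | H0) / p(E | H1), where H0 is the hypothesis that the suspected speaker is the source of the questioned recording and H1 that someone else is; the LR multiplies the prior odds to give the posterior odds. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a density estimator, so Gaussian kernel density estimates with a normal-reference (rule-of-thumb) bandwidth are assumed, and the function names and score values are hypothetical.

```python
import math
from statistics import stdev
from typing import Sequence


def rule_of_thumb_bandwidth(scores: Sequence[float]) -> float:
    """Normal-reference bandwidth 1.06 * sigma * n^(-1/5) (assumed choice).

    Needs at least two scores, since statistics.stdev does.
    """
    return 1.06 * stdev(scores) * len(scores) ** (-1 / 5)


def gaussian_kde(scores: Sequence[float], x: float) -> float:
    """Evaluate a 1-D Gaussian kernel density estimate built from `scores` at `x`."""
    h = rule_of_thumb_bandwidth(scores)
    norm = 1.0 / (len(scores) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in scores)


def likelihood_ratio(evidence_score: float,
                     within_scores: Sequence[float],
                     between_scores: Sequence[float]) -> float:
    """LR = p(E | H0) / p(E | H1), each density estimated by a KDE.

    within_scores:  scores from same-speaker comparisons (H0 distribution)
    between_scores: scores from different-speaker comparisons (H1 distribution)
    """
    numerator = gaussian_kde(within_scores, evidence_score)
    denominator = gaussian_kde(between_scores, evidence_score)
    if denominator == 0.0:
        # E lies so far from every between-source score that the density underflows.
        return math.inf
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not data from the study).
    within = [7.1, 7.8, 8.2, 8.5, 9.0]
    between = [1.2, 2.0, 2.7, 3.1, 3.9, 4.4]
    for e in (8.0, 2.5):
        lr = likelihood_ratio(e, within, between)
        print(f"E = {e}: LR = {lr:.3g}, log10 LR = {math.log10(lr):.2f}")
```

With these hypothetical scores, an evidence score of 8.0 falls among the within-source scores and gives LR > 1 (support for H0), while 2.5 falls among the between-source scores and gives LR < 1 (support for H1). A kernel estimate avoids forcing a parametric shape on the score distributions, which may matter for perceptual scores; the bandwidth rule itself assumes roughly normal data and is only a starting point.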

Web of Science ID

WOS:000249491700003

Volume

12.2

Start page

214

End page

234

Subjects

Aural speaker recognition

Automatic speaker recognition

Strength of evidence

Mismatched recording conditions

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

LIDIAP

Available on Infoscience

October 20, 2009

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/43784