Infoscience

research article

Aural and automatic forensic speaker recognition in mismatched conditions

Alexander, A • Dessimoz, D • Botti, F • Drygajlo, A
2005
The International Journal of Speech, Language and the Law

In this article, we compare aural and automatic speaker recognition in the context of forensic analyses, using a Bayesian framework for the interpretation of evidence. We use perceptual tests performed by non-experts and compare their performance with that of an automatic speaker recognition system. These experiments are performed with 90 phonetically untrained subjects. Several forensic cases were simulated using the Polyphone IPSC-02 database, varying in linguistic content and technical conditions of recording. We estimate the strength of evidence for both humans and the baseline automatic system, calculating likelihood ratios from perceptual scores for humans and from log-likelihood scores for the automatic system. A methodology analogous to the Bayesian interpretation in forensic automatic speaker recognition is applied to the perceptual scores given by humans in order to estimate the strength of evidence. The degradation of the accuracy of human recognition in mismatched recording conditions is contrasted with that of the automatic system under similar recording conditions. The conditions considered are fixed telephone, cellular telephone and noisy speech in forensically realistic conditions. The perceptual cues that the human subjects use to perceive differences in voices are studied, along with their importance in different recording conditions. We observe that while automatic speaker recognition shows higher accuracy in matched conditions of training and testing, its performance degrades significantly in mismatched conditions. Aural recognition accuracy also degrades from matched to mismatched conditions. In mismatched conditions, the baseline automatic system showed comparable or slightly degraded performance compared to aural recognition, whereas the baseline automatic system with adaptation to noisy conditions showed comparable or better performance than aural recognition.
The higher-level perceptual cues that human listeners use to recognise speakers are discussed. We also discuss the possibility of increasing the accuracy of automatic systems by using the perceptual cues that remain robust to mismatched recording conditions.
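
As an illustration of the score-based likelihood ratio mentioned in the abstract, the strength of evidence can be written as LR = p(E | H0) / p(E | H1), where E is the score obtained for the questioned recording, H0 is the hypothesis that the suspected speaker is the source, and H1 is the hypothesis that another speaker is the source. The sketch below is not the authors' implementation: it assumes, purely for illustration, that both score distributions are modelled as single Gaussians, and all function names and score values are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood_ratio(evidence_score, same_speaker_scores, different_speaker_scores):
    """Return LR = p(E | H0) / p(E | H1).

    Assumption (not taken from the article): each score distribution is
    approximated by a Gaussian fitted to the supplied sample.
    """
    numerator = gaussian_pdf(
        evidence_score, mean(same_speaker_scores), stdev(same_speaker_scores)
    )
    denominator = gaussian_pdf(
        evidence_score, mean(different_speaker_scores), stdev(different_speaker_scores)
    )
    return numerator / denominator


# Hypothetical scores, invented for this example only.
same_speaker_scores = [6.0, 7.0, 8.0, 7.5, 6.5]       # comparisons under H0
different_speaker_scores = [2.0, 3.0, 4.0, 2.5, 3.5]  # comparisons under H1

lr_high = likelihood_ratio(7.0, same_speaker_scores, different_speaker_scores)
lr_low = likelihood_ratio(3.0, same_speaker_scores, different_speaker_scores)

print(f"LR for score 7.0: {lr_high:.3g} (supports H0 when greater than 1)")
print(f"LR for score 3.0: {lr_low:.3g} (supports H1 when less than 1)")
```

The same function applies whether the scores are perceptual ratings given by listeners or log-likelihood scores from an automatic system, which is the sense in which one interpretation framework can cover both. In practice a density estimate other than a single Gaussian may fit the score distributions better.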

Type: research article
DOI: 10.1558/sll.2005.12.2.214
Web of Science ID: WOS:000249491700003
Author(s): Alexander, A; Dessimoz, D; Botti, F; Drygajlo, A
Date Issued: 2005
Published in: The International Journal of Speech, Language and the Law
Volume: 12.2
Start page: 214
End page: 234
Subjects: Aural speaker recognition • Automatic speaker recognition • Strength of evidence • Mismatched recording conditions

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LIDIAP
Available on Infoscience: October 20, 2009
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/43784