Infoscience

research article

Neural Network Based End-to-End Query by Example Spoken Term Detection

Ram, Dhananjay • Miculicich, Lesly • Bourlard, Herve
January 1, 2020
IEEE/ACM Transactions on Audio, Speech, and Language Processing

This article addresses query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches rely primarily on dynamic time warping (DTW) based template matching, using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features and show that multilingual features perform better as the number of training languages increases. Previous work has shown that DTW based matching can be replaced with convolutional neural network (CNN) based matching when using posterior features. Here, we show that CNN based matching also outperforms DTW based matching when using bottleneck features. In this setup, the feature extraction and pattern matching stages of our QbE-STD system are optimized independently of each other. We propose to integrate these two stages in a fully neural network based end-to-end learning framework that enables their joint optimization. The proposed approaches are evaluated on two challenging multilingual datasets, Spoken Web Search 2013 and Query by Example Search on Speech Task 2014, and yield significant improvements on both.
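
As an illustration of the DTW based template matching baseline mentioned in the abstract, the sketch below aligns a spoken query against any contiguous region of a longer utterance (subsequence DTW). It is a minimal example under stated assumptions, not the authors' implementation: the cosine frame distance, the normalization by query length, and the names `cosine_distance_matrix` and `subsequence_dtw` are all illustrative choices, and random vectors stand in for bottleneck features.

```python
import numpy as np


def cosine_distance_matrix(query, utterance, eps=1e-12):
    """Pairwise cosine distance between frames.

    query: (m, d) array, utterance: (n, d) array -> (m, n) array in [0, 2].
    """
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    u = utterance / (np.linalg.norm(utterance, axis=1, keepdims=True) + eps)
    # Clip to remove tiny negative values caused by floating-point rounding.
    return np.clip(1.0 - q @ u.T, 0.0, 2.0)


def subsequence_dtw(query, utterance):
    """Align the whole query to the best-matching region of the utterance.

    Returns (score, end_frame), where score is the accumulated cost divided
    by the query length (an assumed, simplified normalization) and end_frame
    is the utterance frame where the best alignment ends.
    """
    dist = cosine_distance_matrix(query, utterance)
    m, n = dist.shape
    acc = np.full((m, n), np.inf)
    # Free start: the query may begin at any utterance frame at no extra cost.
    acc[0, :] = dist[0, :]
    for i in range(1, m):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, n):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    # Free end: pick the cheapest place for the last query frame to land.
    end_frame = int(np.argmin(acc[-1]))
    return acc[-1, end_frame] / m, end_frame
```

A lower score indicates a better match. In a detection setting, the score would be compared against a threshold to decide whether the query occurs in the utterance.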

Type
research article
DOI
10.1109/TASLP.2020.2988788
Web of Science ID

WOS:000538077700005

Author(s)
Ram, Dhananjay
Miculicich, Lesly
Bourlard, Herve  
Date Issued

2020-01-01

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Published in
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume

28

Start page

1416

End page

1427

Subjects

Acoustics • Engineering, Electrical & Electronic • Engineering • feature extraction • task analysis • neural networks • training • hidden Markov models • neurons • speech processing • spoken term detection • query by example • deep neural network • bottleneck features • end-to-end • subsequence detection • features

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LIDIAP  
Available on Infoscience
June 20, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/169485

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved