Infoscience

conference paper

CNN based Query by Example Spoken Term Detection

Ram, Dhananjay; Miculicich, Lesly; Bourlard, Hervé
2018
19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6
Proceedings of Interspeech

In this work, we address the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art solutions usually rely on dynamic time warping (DTW) based template matching. In contrast, we propose here to tackle the problem as binary classification of images. As in the DTW approach, we rely on deep neural network (DNN) based posterior probabilities as feature vectors. The posteriors from a spoken query and a test utterance are used to compute frame-level similarities in matrix form. If the query occurs in the test utterance, this matrix contains a quasi-diagonal pattern somewhere within it. We propose to treat this matrix as an image and train a convolutional neural network (CNN) to identify the pattern and decide whether the query occurs. This language-independent system is evaluated on SWS 2013 and is shown to give a 10% relative improvement over a highly competitive DTW-based baseline system. Experiments on the QUESST 2014 database give similar improvements, showing that the approach generalizes to other databases as well.
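
The similarity-matrix step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: posteriorgrams are assumed to be plain lists of per-frame probability vectors, and the frame-level similarity is assumed to be the log of the inner product between a query frame and a test frame. The function name `similarity_matrix` and the `eps` floor are hypothetical. The CNN classifier that consumes the matrix as an image is not shown.

```python
import math
from typing import List

# A posteriorgram: one probability vector (over phone-like classes) per frame.
Posteriorgram = List[List[float]]


def similarity_matrix(query: Posteriorgram, test: Posteriorgram,
                      eps: float = 1e-10) -> List[List[float]]:
    """Frame-level similarity between a spoken query and a test utterance.

    Assumption (hypothetical choice, not taken from the paper): S[i][j] is the
    log of the inner product of query frame i and test frame j, floored at eps
    so that orthogonal frames do not produce log(0).
    """
    matrix = []
    for q in query:
        row = []
        for t in test:
            dot = sum(a * b for a, b in zip(q, t))
            row.append(math.log(max(dot, eps)))
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Toy 3-class posteriorgrams: the 2-frame query occurs at test frames 1-2.
    query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    test = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    S = similarity_matrix(query, test)
    for row in S:
        print(" ".join(f"{v:7.2f}" for v in row))
```

In this toy case, S[0][1] and S[1][2] equal 0.0 (log 1) while every other entry sits at log(eps), so the query shows up as a short diagonal offset by one frame. With real, noisy posteriors the pattern is only quasi-diagonal, which is what motivates a learned image classifier rather than a fixed rule.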

Type: conference paper
DOI: 10.21437/Interspeech.2018-1722
Web of Science ID: WOS:000465363900019
Author(s): Ram, Dhananjay; Miculicich, Lesly; Bourlard, Hervé
Date Issued: 2018
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC
Publisher place: Baixas
Published in: 19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6
ISBN of the book: 978-1-5108-7221-9
Series title/Series vol.: Interspeech
Start page: 92
End page: 96

Subjects: deep neural network, posterior probabilities, convolutional neural network, query by example, spoken term detection, cnn, dtw, qbe, std

Related documents (URL): http://publications.idiap.ch/downloads/papers/2018/Ram_CNNBASEDQUERYBYEXAMPLESPOKENTERMDETECTION_2018.pdf
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LIDIAP
Event name: Proceedings of Interspeech
Event place: Hyderabad, INDIA
Event date: Aug 02-Sep 06, 2018

Available on Infoscience: July 26, 2018
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/147551