Title: Multilingual Bottleneck Features For Query By Example Spoken Term Detection
Authors: Ram, Dhananjay; Miculicich, Lesly; Bourlard, Herve
Date issued: 2019-01-01
Date available: 2020-06-28
DOI: 10.1109/ASRU46091.2019.9003752
URL: https://infoscience.epfl.ch/handle/20.500.14299/169664
Web of Science ID: WOS:000539883100083
Type: text::conference output::conference proceedings::conference paper
Subjects: Computer Science, Artificial Intelligence; Computer Science
Keywords: multilingual feature; bottleneck feature; residual network; multitask learning; query by example spoken term detection; neural networks

Abstract: State-of-the-art solutions to query-by-example spoken term detection (QbE-STD) rely on bottleneck feature representations of the query and the audio document. Here, we present a study of QbE-STD performance using several monolingual as well as multilingual bottleneck features extracted from feed-forward networks. In contrast to previous work, we use multitask learning to train the multilingual networks, which perform significantly better than concatenated monolingual features. Additionally, we propose to employ residual networks (ResNet) to estimate the bottleneck features and show significant improvements over the corresponding feed-forward-network-based features. The neural networks are trained on the GlobalPhone corpus, and QbE-STD experiments are performed on the very challenging QUESST 2014 database.
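The record above gives no implementation detail, but QbE-STD systems of this kind typically score a spoken query against an audio document by computing a frame-level distance matrix between the two bottleneck feature sequences and running a subsequence variant of dynamic time warping (DTW) over it, so the query can start and end anywhere in the document. A minimal pure-Python sketch under that assumption (cosine distance on toy 2-dimensional "feature" vectors; the function names and the normalization choice are illustrative, not taken from the paper):

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def subsequence_dtw_score(query, doc):
    """Best (lowest) length-normalized alignment cost of `query`
    against any contiguous stretch of `doc` (subsequence DTW:
    free start and free end on the document axis)."""
    n, m = len(query), len(doc)
    INF = float("inf")
    # prev[j]: best cost of aligning query[:1] ending at doc frame j
    prev = [cosine_distance(query[0], doc[j]) for j in range(m)]
    for i in range(1, n):
        cur = [INF] * m
        for j in range(m):
            d = cosine_distance(query[i], doc[j])
            best = prev[j]                   # vertical: advance query only
            if j > 0:
                best = min(best,
                           prev[j - 1],      # diagonal: advance both
                           cur[j - 1])       # horizontal: advance doc only
            cur[j] = d + best
        prev = cur
    # free end: best cost over all document end frames, normalized by query length
    return min(prev) / n

# Toy example: the two-frame "query" pattern occurs inside the "document"
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.2, 0.9], [1.0, 0.1], [0.0, 1.0], [0.9, 0.2]]
score = subsequence_dtw_score(query, doc)  # lower score = better match
```

A detection decision is then usually made by thresholding this score, or by ranking documents by their best score per query.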