Infoscience

Comparison of Subword Segmentation Methods for Open-vocabulary ASR using a Difficulty Metric

Khosravani, Abbas • Musat, Claudiu • Garner, Philip N. • Lazaridis, Alexandros
2020

We experiment with subword segmentation approaches that are widely used to address the open-vocabulary problem in end-to-end automatic speech recognition (ASR). For morphologically rich languages such as German, which has many rare words, largely because of compound words, there is growing interest in subword-level word representations based on, e.g., byte-pair encoding (BPE) and the unigram language model. However, we are not aware of any systematic comparative analysis of these approaches. To this end, we propose a framework that estimates how difficult a test utterance is for the ASR model, based on an out-of-vocabulary metric. Using this framework, we run experiments on several subword segmentation approaches, which yields a comparative analysis of their strengths and weaknesses. For the ASR model, we employ a fully convolutional sequence-to-sequence encoder architecture built from time-depth separable convolution blocks, together with lexicon-free beam search decoding using an n-gram subword language model. Additionally, we use multiple models with different word representations to investigate their impact on ASR performance.
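
To illustrate why subword units help with open vocabularies, here is a minimal Python sketch of byte-pair encoding (BPE) merge learning and segmentation, followed by a simple per-utterance out-of-vocabulary (OOV) rate. It is illustrative only: the toy corpus, the function names `learn_bpe`, `apply_bpe` and `oov_rate`, and the use of a plain word-level OOV rate as a difficulty proxy are assumptions, not the report's implementation or its actual difficulty metric.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
END = "</w>"  # end-of-word marker, so word-final subwords are distinguished


def _merge(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every non-overlapping occurrence of `pair` (left to right) with one merged symbol."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` BPE merge operations from a word-frequency table."""
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: the first pair encountered wins
        merges.append(best)
        vocab = {_merge(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word: str, merges: List[Pair]) -> List[str]:
    """Segment a (possibly unseen) word by applying the merges in learned order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


def oov_rate(utterance: str, vocabulary: Set[str]) -> float:
    """Hypothetical difficulty proxy: fraction of whitespace tokens not in `vocabulary`."""
    words = utterance.split()
    if not words:
        return 0.0
    return sum(w not in vocabulary for w in words) / len(words)


if __name__ == "__main__":
    merges = learn_bpe({"haus": 10, "tür": 8}, num_merges=10)
    # The unseen compound "haustür" is covered by two subwords learned from its parts.
    print(apply_bpe("haustür", merges))  # ['haus', 'tür</w>']
    print(oov_rate("die haustür ist offen", {"die", "ist", "haus", "tür"}))  # 0.5
```

Here "haustür" never appears in the toy training data, so a word-level model would treat it as OOV, while the BPE segmentation still covers it with known units. Applying merges in learned order is a simplification; the unigram language model segmentation named in the abstract instead selects segmentations probabilistically and is not shown.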

Type
report
Author(s)
Khosravani, Abbas
Musat, Claudiu
Garner, Philip N.
Lazaridis, Alexandros
Date Issued

2020

Subjects

German language • end-to-end • open vocabulary • speech recognition • subword segmentation

URL

Link to IDIAP database

http://publications.idiap.ch/downloads/papers/2020/Khosravani_INTERSPEECH_2020.pdf
Written at

EPFL

EPFL units
LIDIAP  
Available on Infoscience
July 23, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/170328