Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. Exploiting un-transcribed foreign data for speech recognition in well-resourced languages
 
conference paper

Exploiting un-transcribed foreign data for speech recognition in well-resourced languages

Imseng, David  
•
Potard, Blaise
•
Motlicek, Petr
Show more
2014
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
IEEE International Conference on Acoustics, Speech and Signal Processing

Manual transcription of audio databases for automatic speech recognition (ASR) training is a costly and time-consuming process. State-of-the-art hybrid ASR systems that are based on deep neural networks (DNN) can exploit un-transcribed foreign data during unsupervised DNN pre-training or semi-supervised DNN training. We investigate the relevance of foreign data characteristics, in particular domain and language. Using three different datasets of the MediaParl and Ester databases, our experiments suggest that domain and language are equally important. Foreign data recorded under matched conditions (language and domain) yields the most improvement. The resulting ASR system yields about 5% relative improvement compared to the baseline system only trained on transcribed data. Our studies also reveal that the amount of foreign data used for semi-supervised training can be significantly reduced without degrading the ASR performance if confidence measure based data selection is employed.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

Imseng_ICASSP_2014.pdf

Access type

openaccess

Size

106.69 KB

Format

Adobe PDF

Checksum (MD5)

006d157cc0d4c7076c7ba62a71bffaac

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés