Infoscience

report

Comparative Study on Sentence Boundary Prediction for German and English Broadcast News

Wang, Yang; Nanchen, Alexandre; Lazaridis, Alexandros
2017

We present a comparative study of sentence boundary prediction for German and English broadcast news that explores generalization across languages. In the feature extraction stage, word pause durations are first extracted from word-aligned speech, and forward and backward language models are used to extract textual features. A gradient boosted machine, optimized by grid search, then maps these features to punctuation marks. Experimental results confirm that word pause duration is a simple yet effective feature for predicting whether a sentence boundary follows a word. We found that the Bayes risk derived from the pause duration distributions of sentence-boundary words and non-boundary words is an effective measure of the inherent difficulty of sentence boundary prediction. The proposed method achieved F-measures of over 90% on reference text and around 90% on ASR transcripts for both a German broadcast news corpus and an English multi-genre broadcast news corpus, demonstrating state-of-the-art performance.
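
The pause feature and the Bayes risk measure mentioned in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the `(token, start, end)` alignment format, the definition of a pause as the silence between a word's end and the next word's start, and the fixed-width histogram estimate of the two pause distributions are all assumptions made for illustration.

```python
from collections import Counter


def pause_durations(words):
    """Return the pause (in seconds) that follows each aligned word.

    `words` is a hypothetical alignment format: a time-ordered list of
    (token, start_sec, end_sec) tuples. The last word has no successor,
    so its pause is set to 0.0 here (an assumption).
    """
    pauses = []
    for (_, _, end), (_, next_start, _) in zip(words, words[1:]):
        # Clamp at zero in case aligned words overlap slightly.
        pauses.append(max(0.0, next_start - end))
    if words:
        pauses.append(0.0)
    return pauses


def bayes_risk(boundary_pauses, non_boundary_pauses, bin_width=0.1):
    """Histogram estimate of the Bayes risk for boundary vs. non-boundary.

    In each pause-duration bin the best possible classifier picks the
    majority class, so the minority count in that bin is unavoidable
    error. Summing those minority counts and dividing by the number of
    words gives the estimated minimum error rate when pause duration is
    the only feature.
    """
    total = len(boundary_pauses) + len(non_boundary_pauses)
    if total == 0:
        raise ValueError("need at least one pause duration")
    boundary_hist = Counter(int(p / bin_width) for p in boundary_pauses)
    other_hist = Counter(int(p / bin_width) for p in non_boundary_pauses)
    bins = boundary_hist.keys() | other_hist.keys()
    errors = sum(min(boundary_hist[b], other_hist[b]) for b in bins)
    return errors / total


# Toy data (invented for illustration, not taken from the report).
alignment = [("guten", 0.0, 0.3), ("abend", 0.3, 0.8), ("die", 1.4, 1.5)]
pauses = pause_durations(alignment)

boundary = [0.42, 0.55, 0.31, 0.04]
non_boundary = [0.00, 0.02, 0.03, 0.01, 0.33, 0.00]
risk = bayes_risk(boundary, non_boundary, bin_width=0.1)
```

Under these assumptions, a risk near 0 means the pause distributions of boundary and non-boundary words barely overlap, so the corpus is easy for a pause-only predictor, while a larger value indicates heavier overlap and an inherently harder task. The estimate depends on the chosen bin width, which is a free parameter of this sketch.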

Name: Wang_Idiap-RR-18-2017.pdf
Access type: openaccess
Size: 841.74 KB
Format: Adobe PDF
Checksum (MD5): 8290b3e63266752d90bb31b24980b1a6


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved