Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model

Bengio, Yoshua; Senécal, Jean-Sébastien

report

2003

Previous work on statistical language modeling has shown that a feed-forward neural network can be trained to approximate probabilities over sequences of words, with significant error reduction compared to standard baseline models. However, training the model under the maximum likelihood criterion requires, for each example, as many network passes as there are words in the vocabulary. We introduce adaptive importance sampling as a way to accelerate training of the model. We show that a very significant speed-up can be obtained on standard problems.
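To illustrate the idea in the abstract, here is a minimal sketch of self-normalized importance sampling for an expectation under a softmax distribution, which is the kind of quantity that otherwise requires one pass per vocabulary word. It is a generic illustration under stated assumptions, not the estimator from the report: the function names, the `scores` list (one unnormalized score per word), the fixed `proposal` distribution, and the test function `f` are all hypothetical. In particular, the proposal here is fixed, whereas the report describes one that adapts during training.

```python
import math
import random


def exact_expectation(scores, f):
    """E_P[f(w)] under P = softmax(scores); touches every vocabulary entry."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(w * f(i) for i, w in enumerate(weights)) / z


def sampled_expectation(scores, f, proposal, n_samples, rng):
    """Self-normalized importance-sampling estimate of E_P[f(w)].

    Words are drawn from `proposal` (assumed to be a probability list with
    nonzero mass on every word). Only the sampled words are scored, so the
    cost scales with n_samples rather than with the vocabulary size.
    """
    indices = rng.choices(range(len(scores)), weights=proposal, k=n_samples)
    m = max(scores[i] for i in indices)
    # Importance ratio: unnormalized model weight over proposal probability.
    ratios = [math.exp(scores[i] - m) / proposal[i] for i in indices]
    total = sum(ratios)
    # Dividing by the sum of ratios removes the unknown normalizer,
    # at the price of a small bias that shrinks as n_samples grows.
    return sum(r * f(i) for r, i in zip(ratios, indices)) / total


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 50  # toy value; real vocabularies are far larger
    scores = [rng.uniform(-1.0, 1.0) for _ in range(vocab_size)]
    proposal = [1.0 / vocab_size] * vocab_size  # uniform, for illustration

    def position(i):
        return i / vocab_size

    print("exact:  ", exact_expectation(scores, position))
    print("sampled:", sampled_expectation(scores, position, proposal, 2000, rng))
```

The closer the proposal is to the model distribution, the lower the variance of the estimate for a given number of samples, which is the motivation for adapting the proposal as the model changes.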

Name: rr-03-35.pdf

Access type: openaccess

Size: 219.17 KB

Format: Adobe PDF

Checksum (MD5): 71f6988ccbd2ad748715282ded9a15d7