Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model

Previous work on statistical language modeling has shown that it is possible to train a feed-forward neural network to approximate probabilities over sequences of words, resulting in significant error reduction compared to standard baseline models. However, training the model with the maximum-likelihood criterion requires, for each example, as many network passes as there are words in the vocabulary. We introduce adaptive importance sampling as a way to accelerate training of the model. We show that a very significant speed-up can be obtained on standard problems.
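The idea can be sketched as follows: the expensive part of the maximum-likelihood gradient is the softmax expectation over the whole vocabulary, which importance sampling replaces with a small set of words drawn from a proposal distribution. This is a minimal illustrative sketch, not the paper's implementation — the function names are made up, the proposal here is fixed and uniform, whereas the paper adapts the proposal (an n-gram model) to track the network during training.

```python
import math
import random

def full_softmax_grad(scores, target):
    # Exact gradient of -log softmax(scores)[target] w.r.t. the scores:
    # the softmax probabilities minus a one-hot vector at the target word.
    # Cost is O(|V|): this is the pass-per-vocabulary-word bottleneck.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    grad = [e / z for e in exps]
    grad[target] -= 1.0
    return grad

def sampled_softmax_grad(scores, target, proposal, k, rng):
    # Importance-sampling estimate of the same gradient: draw k words
    # from the proposal q, weight each by w = exp(score) / q(word), and
    # self-normalize the weights so they approximate the softmax
    # probabilities without ever summing over the full vocabulary.
    idx = rng.choices(range(len(scores)), weights=proposal, k=k)
    weights = [math.exp(scores[i]) / proposal[i] for i in idx]
    total = sum(weights)
    grad = [0.0] * len(scores)
    for i, w in zip(idx, weights):
        grad[i] += w / total
    grad[target] -= 1.0
    return grad
```

In practice only the scores of the sampled words (and the target) need to be computed by the network, so the per-example cost drops from O(|V|) passes to O(k), which is the source of the speed-up claimed above.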

Related material