Abstract

Pre-training complex language models is essential to the success of recent methods such as BERT or OpenAI GPT. Their size makes not only the pre-training phase but also subsequent applications computationally expensive. BERT-like models excel at token-level tasks because they provide reliable token embeddings, but they fall short when it comes to embeddings of sentences or higher-level structures, since they have no built-in mechanism that explicitly produces such representations. We introduce Light and Multigranular BERT, which has a number of parameters similar to BERT's but is about 3 times faster. This is achieved by modifying the input representation, which in turn requires changes to the attention mechanism and, because segment representation is one of our training objectives, also yields reliable segment embeddings. The model we publish achieves 70.7% on the MNLI task, which is promising bearing in mind that there were two major issues with it.

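The abstract does not spell out how the input representation is modified. The following is only a minimal sketch, under the assumption that each sentence contributes an extra learnable segment slot prepended to the token sequence, whose hidden state can then be trained as that segment's embedding. The names (`MultigranularEmbedding`, `seg_slot`) are hypothetical and not taken from the paper; a complete model would additionally need an attention mask that governs how segment slots and tokens attend to each other.

```python
import torch
import torch.nn as nn

class MultigranularEmbedding(nn.Module):
    """Token embeddings plus one learnable slot per segment (hypothetical sketch)."""
    def __init__(self, vocab_size=30522, hidden=768, max_segments=8):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.seg_slot = nn.Embedding(max_segments, hidden)  # one slot vector per segment index

    def forward(self, token_ids, segment_ids):
        # token_ids:   (batch, seq_len) word-piece ids
        # segment_ids: (batch, seq_len) index of the sentence each token belongs to
        tok_emb = self.tok(token_ids)                                         # (B, T, H)
        n_seg = int(segment_ids.max().item()) + 1
        slots = self.seg_slot(torch.arange(n_seg, device=token_ids.device))   # (S, H)
        slots = slots.unsqueeze(0).expand(token_ids.size(0), -1, -1)          # (B, S, H)
        # Prepend segment slots; their final hidden states serve as segment embeddings.
        return torch.cat([slots, tok_emb], dim=1)                             # (B, S + T, H)

# Toy usage: two sentences of three tokens each -> 2 slots + 6 tokens = 8 positions.
emb = MultigranularEmbedding()
token_ids = torch.randint(0, 30522, (1, 6))
segment_ids = torch.tensor([[0, 0, 0, 1, 1, 1]])
print(emb(token_ids, segment_ids).shape)  # torch.Size([1, 8, 768])
```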