Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

Wei, Xiuying; Moalla, Skander; Pascanu, Razvan; Gulcehre, Caglar

conference paper

Wei, Xiuying

•

Moalla, Skander

•

Pascanu, Razvan

December 2, 2024

Advances in Neural Information Processing Systems 37 (NeurIPS 2024)

38th Annual Conference on Neural Information Processing Systems

State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive. This has sparked a research agenda to reduce these models' parameter counts and computational costs without significantly impacting their performance. Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFNs), which are less studied than attention blocks. We consider three structured linear parameterizations of the FFN using efficient low-rank and block-diagonal matrices. In contrast to many previous works that examined these approximations, our study i) explores these structures from a training-from-scratch perspective, ii) scales up to 1.3B parameters, and iii) is conducted within recent Transformer-based LLMs rather than convolutional architectures. We demonstrate that these structures can lead to actual computational gains in various scenarios, including online decoding when using a pre-merge technique. Additionally, we propose a novel training regime, called self-guided training, aimed at improving the poor training dynamics that these approximations exhibit when used from initialization. Interestingly, the scaling performance of structured matrices is explored, revealing steeper curves in scaling training FLOPs, along with a favorable scaling trend in the overtraining regime. Specifically, we show that wide and structured networks can utilize training FLOPs more efficiently, with fewer parameters and lower loss than dense models at their optimal trade-off. Our code is available at https://github.com/CLAIRE-Labo/StructuredFFN/tree/main.

Name

neurips_2024.pdf

Type

Main Document

Version

Published version

Access type

openaccess

License Condition

N/A

Size

5.23 MB

Format

Adobe PDF

Checksum (MD5)

98e4bad0b4ecdf98eb699bfc64d809aa