Learnable filter-banks for CNN-based audio applications

Authors: Peic Tukuljac, Helena; Ricaud, Benjamin; Aspert, Nicolas; Colbois, Laurent
Dates: 2022-04-07; 2022-03-28
DOI: 10.7557/18.6279
URL: https://infoscience.epfl.ch/handle/20.500.14299/186900

Abstract: We investigate the design of a convolutional layer where kernels are parameterized functions. This layer aims at being the input layer of convolutional neural networks for audio applications or applications involving time-series. The kernels are defined as one-dimensional functions having a band-pass filter shape, with a limited number of trainable parameters. Building on the literature on this topic, we confirm that networks having such an input layer can achieve state-of-the-art accuracy on several audio classification tasks. We explore the effect of different parameters on the network accuracy and learning ability. This approach reduces the number of weights to be trained and enables larger kernel sizes, an advantage for audio applications. Furthermore, the learned filters bring additional interpretability and a better understanding of the audio properties exploited by the network.

Keywords: deep learning; filter; gammatone; audio
Type: text::conference output::conference proceedings::conference paper
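
Illustration: the abstract describes input-layer kernels that are band-pass functions controlled by a few trainable parameters. The sketch below shows one way such a kernel could be built, assuming a gammatone shape (suggested only by the keyword list) with a centre frequency and a bandwidth as the two parameters per filter. The function names, the fixed order of 4, the unit-norm scaling, and all numeric values are illustrative assumptions, not the authors' implementation. In a real network the two parameters would be updated by an automatic-differentiation framework; here only the kernel construction and the parameter count are shown.

```python
import cmath
import math


def gammatone_kernel(center_hz, bandwidth_hz, kernel_size, sample_rate, order=4):
    """Sample a gammatone-shaped band-pass kernel and scale it to unit L2 norm.

    Assumed form: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*f*t).
    """
    taps = []
    for n in range(kernel_size):
        t = n / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * bandwidth_hz * t)
        taps.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    norm = math.sqrt(sum(x * x for x in taps))
    if norm == 0.0:  # degenerate case, e.g. a single-tap kernel
        return taps
    return [x / norm for x in taps]


def magnitude_response(kernel, freq_hz, sample_rate):
    """Magnitude of the kernel's discrete-time Fourier transform at one frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    return abs(sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(kernel)))


def build_filter_bank(center_freqs_hz, bandwidths_hz, kernel_size, sample_rate):
    """One kernel per (centre frequency, bandwidth) pair."""
    return [
        gammatone_kernel(f, b, kernel_size, sample_rate)
        for f, b in zip(center_freqs_hz, bandwidths_hz)
    ]


# Hypothetical settings: 16 kHz audio, 401-tap (about 25 ms) kernels, 3 filters.
SAMPLE_RATE = 16000
KERNEL_SIZE = 401
centers = [500.0, 1000.0, 4000.0]
bandwidths = [100.0, 100.0, 400.0]
bank = build_filter_bank(centers, bandwidths, KERNEL_SIZE, SAMPLE_RATE)

# Two trainable values per filter versus one weight per tap for a free kernel.
trainable_parameterized = 2 * len(bank)   # 6
trainable_free = KERNEL_SIZE * len(bank)  # 1203
print(trainable_parameterized, trainable_free)
```

Under these assumptions the number of trainable values per filter does not depend on the kernel length, which is consistent with the abstract's point that a parameterized layer permits larger kernels without adding weights.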