conference paper

Evolution of Neural Network Architectures for Speech Recognition

Bourlard, Hervé  
January 1, 2018
19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018)

Over the last few years, the use of Artificial Neural Networks (ANNs), now often referred to as deep learning or Deep Neural Networks (DNNs), has significantly reshaped research and development in a variety of signal and information processing tasks. While further boosting the state of the art in Automatic Speech Recognition (ASR), recent progress in the field has also allowed for more flexible and faster development in emerging markets and multilingual societies (e.g., for under-resourced languages). In this talk, we will provide a historical account of the ANN architectures used for ASR since the mid-1980s, now found in most ASR and spoken language understanding applications. We will start by recalling/revisiting key links between ANNs and statistical inference, discriminant analysis, and linear/nonlinear algebra. Finally, we will briefly discuss more recent trends towards novel DNN-based ASR approaches, including complex hierarchical systems, sparse recovery modeling, and "end-to-end" systems. However, in spite of the recent progress in the area, we still lack a basic understanding of the problems at hand. Although more and more tools are available, together with essentially "unlimited" processing and data resources, we still fail to build principled ASR models and theories. Instead, we still rely on "ignorance-based" models that often expose the limitations of our understanding rather than enrich the field of ASR. Discussion of these limitations will underpin our overview.
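The "key links between ANNs and statistical inference" mentioned in the abstract are commonly illustrated by the hybrid HMM/ANN result that a network trained with cross-entropy estimates state posteriors, which can be converted into scaled likelihoods for HMM decoding. The following is a minimal sketch of that conversion, not code from the talk; the number of states, feature dimension, prior values, and the random stand-in for a trained network are all illustrative assumptions.

```python
# Minimal sketch (illustrative only) of the hybrid HMM/ANN link between
# ANN outputs and statistical inference: a softmax classifier trained with
# cross-entropy estimates P(state | frame), and dividing those posteriors by
# the state priors P(state) yields scaled likelihoods proportional to
# p(frame | state), usable as HMM emission scores.
import numpy as np

rng = np.random.default_rng(0)

n_states = 4   # hypothetical number of HMM states (e.g., phone states)

# Stand-in for the outputs of a trained network on one acoustic frame:
# a softmax over states, i.e., estimated posteriors P(state | x).
logits = rng.normal(size=n_states)
posteriors = np.exp(logits - logits.max())
posteriors /= posteriors.sum()

# State priors P(state), e.g., relative frequencies in the training alignment.
priors = np.array([0.4, 0.3, 0.2, 0.1])

# Scaled likelihoods P(state | x) / P(state), proportional to p(x | state),
# which a hybrid HMM/ANN decoder uses in place of GMM emission likelihoods.
scaled_likelihoods = posteriors / priors
log_emission_scores = np.log(scaled_likelihoods)

print("posteriors        :", np.round(posteriors, 3))
print("scaled likelihoods:", np.round(scaled_likelihoods, 3))
```

In practice the log of these scaled likelihoods replaces the GMM emission score at each frame of the Viterbi search, which is why the posterior-estimation property of cross-entropy-trained networks matters for ASR decoding.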
