Design and analysis of a systolic array for neural computation
Research on artificial neural networks (ANNs) has been carried out for more than five decades. Interest was renewed in the 1980s with the discovery of powerful models such as J. Hopfield's recurrent networks, T. Kohonen's self-organizing feature maps, and the back-propagation rule. At that time, no platform was both versatile enough to implement any ANN model and fast enough to solve large problems. Super-computers were the sole exception to this rule, but were prohibitively expensive for most applications. However, both research scientists and application engineers clearly identified the need for such computing power. This triggered many projects in the field. In parallel, research on multi-processor systems started during the 1960s. Systolic arrays were proposed in 1979 as a means to fully exploit the possibilities of VLSI. Two previous theses, by F. Blayo and C. Lehmann, have studied the use of bi-dimensional systolic arrays for neural computation. The system they presented, called GENES, was initially designed for the Hopfield model. Extensions to other ANNs have also been proposed.
The goal of the present thesis is to study, design, build, and analyze an efficient accelerator for neural computation. As a first step, the GENES architecture has been extended towards generality and efficiency. This includes a thorough analysis of ANN models, of other neural computers, and of previous GENES implementations. The result of this work is the GENES IV integrated circuit, whose architecture has been co-designed by P. Ienne and the author. The main part of this thesis discusses the architecture, the design, and the analysis of the MANTRA I machine, a neural computer based on a GENES IV array with up to 40 × 40 processing elements (PEs). The delta rule (and hence the Perceptron and Adaline rules), the back-propagation rule, the Hopfield model, and the Kohonen model can be implemented on this system.
Although not a generic system, such a machine may be regarded as a multi-model neural computer. A prototype has been running for a year and is used daily by software designers. Several novel features distinguish the MANTRA I machine from other neural systems. First, it is one of the few existing neural computers, whereas the majority of implementations are specific to an application or to an algorithm. The machine does not hard-wire any algorithm, but provides the necessary primitives to implement the target models. This is a key feature for research, since several algorithms or versions of an algorithm can be tested on a problem. It is an important aspect for applications as well, because different ANN models are often cascaded to solve a problem. The GENES IV array, that is, the computing core of the MANTRA I machine, features synapse-level parallelism (i.e., one real or virtual PE is allocated per synapse or neural connection), while most other systems exploit only neuron-level parallelism (i.e., one PE per neuron). Hence, this system aims at a much finer grain of parallelism and is well suited for massively parallel architectures. The problem size that a neural accelerator can handle should not be limited by the hardware (except for memory size). Therefore, it is essential to support time-sharing of PEs. On the MANTRA I machine, this is achieved through the concept of virtual arrays. Matrices are divided into sub-matrices that can be mapped onto the physical array, which is then time-shared among them. An efficient mechanism has been implemented to swap sub-matrices in the background while other computation is performed. Since systolic arrays are pipelined systems, it is important to avoid emptying and re-filling them too often, in order to keep the hardware utilization rate high. Therefore, a systolic instruction flow has been implemented, so that each instruction follows the data for which it has been issued.
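The virtual-array idea described above can be illustrated with a minimal software sketch. It is not the MANTRA I implementation: the function names, the tile size, and the use of a matrix-vector product as the workload are all assumptions chosen for illustration. A weight matrix larger than the physical array is divided into sub-matrices, each sub-matrix is processed in one pass, and the partial results are accumulated. The background swapping and the systolic instruction flow are not modeled.

```python
# Illustrative sketch only: all names and sizes are hypothetical and are
# not taken from the GENES IV or MANTRA I design.

def split_into_tiles(rows, cols, tile):
    """Yield (r0, r1, c0, c1) bounds of the sub-matrices that cover
    a rows x cols matrix when the physical array is tile x tile."""
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            yield r0, min(r0 + tile, rows), c0, min(c0 + tile, cols)


def physical_array_matvec(sub, x_part):
    """Stand-in for one pass through the physical array:
    returns the product of one sub-matrix with one slice of the input."""
    return [sum(w * x for w, x in zip(row, x_part)) for row in sub]


def virtual_matvec(weights, x, tile):
    """Compute weights @ x by time-sharing a tile x tile array
    among the sub-matrices; also return the number of passes used."""
    rows, cols = len(weights), len(weights[0])
    y = [0] * rows
    passes = 0
    for r0, r1, c0, c1 in split_into_tiles(rows, cols, tile):
        sub = [row[c0:c1] for row in weights[r0:r1]]
        partial = physical_array_matvec(sub, x[c0:c1])
        # Accumulate the partial sums of this sub-matrix into the output.
        for i, p in enumerate(partial):
            y[r0 + i] += p
        passes += 1
    return y, passes


if __name__ == "__main__":
    weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1], [0, 1, 0]]
    x = [1, 2, 3]
    y, passes = virtual_matvec(weights, x, tile=2)
    print(y, passes)
```

In this sketch, a 5 × 3 matrix on a hypothetical 2 × 2 array needs six passes. On real hardware the cost of each pass depends on how well the loading of the next sub-matrix overlaps with computation, which is the purpose of the background swapping mechanism mentioned above.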
Like any SIMD system, the MANTRA I machine is composed of a parallel or SIMD module and a control module. The SIMD module consists of the GENES IV array and a set of dedicated units designed for computations that scale with the number of neurons and would fit poorly on a bi-dimensional array. A complex system of FIFOs and memories sustains the required input/output streams for the systolic array. The control module is a complete SISD system. Its tasks are (1) to control the SIMD module by dispatching instructions, (2) to manage data input and output, (3) to communicate with the external world, and (4) to perform data pre- and post-processing. The SIMD instructions are of the very long instruction word (VLIW) type. Synchronization between the two modules is achieved through an instruction FIFO. The performance of the MANTRA I machine has been analyzed using the delta rule. Measurements show that the sustained performance is very close to its peak value, as long as the problem fits in the memory banks connected to the GENES IV array. Experiments have also been run to investigate the impact of the constraints imposed by the hardware on the convergence of algorithms. Finally, the use of systolic arrays as neural accelerators is discussed in the light of the experience acquired with the GENES IV array and the MANTRA I machine. The weaknesses of the machine are analyzed, and several solutions are proposed to avoid them in a future design. A general discussion of the future of neural computers concludes this thesis.