A Fast Parallel Matrix Multiplication Reconfigurable Unit Utilized In Face Recognitions Systems
In this paper we present a reconfigurable device which significantly improves the execution time of the most computational intensive functions of three of the most widely used face recognition algorithms; those tasks multiply very large dense matrices. The presented architecture utilizes numerous digital signal processing units (DSPs) organized in a parallel manner within a state-of-the-art FPGA device. In order to accelerate those functions we have implemented a "blocked" matrix multiplication algorithm which multiplies certain sub-matrices of fixed-point 32-bit numbers; the size of the sub-matrices has been selected so as to fully exploit the resources of the underlying reconfigurable device. Our system is up to 550 times faster than a conventional general purpose processor when implementing the most CPU intensive parts of a number of very widely used Face Identification Schemes, whereas it is more than 40 times faster than the similar schemes implemented in reconfigurable devices. Moreover, our system is general enough so as to be efficiently utilized in any application incorporating fixed-point matrix multiplications.
- View record in Web of Science
Record created on 2010-11-30, modified on 2016-08-09