A Fast Parallel Matrix Multiplication Reconfigurable Unit Utilized In Face Recognitions Systems

In this paper we present a reconfigurable device that significantly reduces the execution time of the most computationally intensive functions of three of the most widely used face recognition algorithms; these functions multiply very large dense matrices. The presented architecture uses numerous digital signal processing units (DSPs) organized in parallel within a state-of-the-art FPGA device. To accelerate these functions we have implemented a "blocked" matrix multiplication algorithm that multiplies sub-matrices of 32-bit fixed-point numbers; the sub-matrix size has been selected to fully exploit the resources of the underlying reconfigurable device. Our system is up to 550 times faster than a conventional general-purpose processor when executing the most CPU-intensive parts of several very widely used face identification schemes, and more than 40 times faster than similar schemes implemented in reconfigurable devices. Moreover, our system is general enough to be used efficiently in any application that incorporates fixed-point matrix multiplications.
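
As an illustration of the "blocked" fixed-point multiplication the abstract describes, the following is a minimal software sketch, not the hardware design from the paper. It assumes a Q16.16 layout for the 32-bit fixed-point values and an arbitrary block size; the abstract specifies neither, and the names `FRAC_BITS`, `to_fixed`, `to_float`, and `blocked_matmul_fixed` are hypothetical. Each block-sized partial product corresponds loosely to the work that could be handed to a group of parallel DSP units.

```python
FRAC_BITS = 16  # assumption: Q16.16 layout; the abstract only says "fixed-point 32-bit"


def to_fixed(x):
    """Convert a real number to its (assumed) Q16.16 integer representation."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert a Q16.16 integer back to a float."""
    return q / (1 << FRAC_BITS)


def blocked_matmul_fixed(a, b, block=2):
    """Multiply fixed-point matrices a (n x k) and b (k x m) block by block.

    Products are accumulated at full width (Python integers do not overflow)
    and scaled back to Q16.16 once at the end. The block size here is an
    illustrative parameter, not the value chosen in the paper.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):              # block row of the result
        for j0 in range(0, m, block):          # block column of the result
            for k0 in range(0, k, block):      # blocks along the shared dimension
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        s = 0
                        for kk in range(k0, min(k0 + block, k)):
                            s += a[i][kk] * b[kk][j]
                        acc[i][j] += s         # add this block's partial sum
    # Each product carries 2 * FRAC_BITS fractional bits; drop the extra ones.
    return [[v >> FRAC_BITS for v in row] for row in acc]


def matrix_to_fixed(rows):
    return [[to_fixed(x) for x in row] for row in rows]


def matrix_to_float(rows):
    return [[to_float(q) for q in row] for row in rows]
```

For example, `matrix_to_float(blocked_matmul_fixed(matrix_to_fixed(A), matrix_to_fixed(B)))` multiplies two real-valued matrices `A` and `B` through the fixed-point path. The block size changes only the order in which partial sums are formed, so the result is the same for any choice; in a hardware setting it would presumably be tuned to the available multipliers and on-chip memory.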

Published in:
FPL: 2009 International Conference on Field Programmable Logic and Applications, 276-281
Presented at:
International Conference on Field Programmable Logic and Applications, Prague, Czech Republic, Aug 31-Sep 02, 2009
IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331 USA

 Record created 2010-11-30, last modified 2018-03-17
