High-Performance Modular Multiplication on the Cell Processor
This paper presents software implementation speed records for modular multiplication arithmetic on the synergistic processing elements of the Cell broadband engine (Cell) architecture. The focus is on moduli which are of special interest in elliptic curve cryptography, that is, moduli of bit-lengths ranging from 192- to 521-bit. Finite field arithmetic using primes which allow particularly fast reduction is compared to Montgomery multiplication. The special primes considered are the five recommended NIST primes, as specified in the FIPS 186-3 standard, and the prime used in the elliptic curve curve25519. While presented and benchmarked on the Cell architecture, the proposed techniques to efficiently implement the modular multiplication algorithms are suited to run on any architecture which is able to compute multiple computations concurrently; e.g. graphics processing units.