Cyme: A Library Maximizing SIMD Computation on User-Defined Containers

This paper presents Cyme, a C++ library that abstracts the use of SIMD instructions while maximizing use of the underlying hardware. Unlike similar efforts such as Boost.SIMD or Vc, Cyme provides users with generic high-level containers that hide SIMD complexity. Cyme accomplishes this by 1) optimizing the abstract syntax tree with expression templates to prevent temporary copies and maximize the use of fused multiply-add (FMA) instructions, and 2) creating a data layout in memory (AoS or AoSoA) that minimizes data addressing and manipulation across all SIMD registers. The library has been implemented on the IBM Blue Gene/Q architecture using the 256-bit SIMD extensions (QPX) of the Power A2 processor. Its functionality is demonstrated on a computationally intensive kernel of a neuroscience application, where an increase in GFlop/s performance by a factor of 6.72 over the original implementation is observed using the Clang compiler.
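
To illustrate the first technique named in the abstract, the following is a minimal sketch of expression templates in plain, scalar C++. It is not Cyme's API: the names `Vec`, `Mul`, `Add`, `ExprTag`, and `run_demo` are hypothetical and chosen only for this example. The sketch shows how `a * b + c` can be captured as a type at compile time, evaluated in a single loop without a temporary container, and rewritten to one fused multiply-add per element through a partial specialization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical names throughout; this is a sketch, not the Cyme interface.
struct ExprTag {};  // marker base so the operators below only match our types

template <class L, class R>
using EnableIfExpr = typename std::enable_if<
    std::is_base_of<ExprTag, L>::value && std::is_base_of<ExprTag, R>::value>::type;

// Leaf node of the expression tree: owns the data.
struct Vec : ExprTag {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }

    // Assigning from an expression evaluates the whole tree in one loop,
    // element by element, without building an intermediate Vec.
    template <class E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Node for l * r; holds references only, no computation until indexed.
template <class L, class R>
struct Mul : ExprTag {
    const L& l;
    const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] * r[i]; }
};

// Generic node for l + r.
template <class L, class R>
struct Add : ExprTag {
    static constexpr bool fused = false;
    const L& l;
    const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Tree rewrite: (a * b) + c is matched at compile time and evaluated with
// std::fma, which computes a*b+c with a single rounding. Whether this maps
// to a hardware FMA instruction depends on the compiler and the target.
template <class A, class B, class C>
struct Add<Mul<A, B>, C> : ExprTag {
    static constexpr bool fused = true;
    const Mul<A, B>& m;
    const C& c;
    Add(const Mul<A, B>& m_, const C& c_) : m(m_), c(c_) {}
    double operator[](std::size_t i) const { return std::fma(m.l[i], m.r[i], c[i]); }
};

template <class L, class R, class = EnableIfExpr<L, R>>
Mul<L, R> operator*(const L& l, const R& r) { return Mul<L, R>(l, r); }

template <class L, class R, class = EnableIfExpr<L, R>>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>(l, r); }

// Usage: the right-hand side has type Add<Mul<Vec, Vec>, Vec>, so the
// fused specialization is selected and no temporary Vec is created.
inline std::vector<double> run_demo() {
    Vec a(4, 2.0), b(4, 3.0), c(4, 1.0), out(4);
    out = a * b + c;
    return out.data;
}
```

A SIMD library would presumably replace the scalar `double` per element with a register-wide type and the `std::fma` call with the target's FMA intrinsic; the sketch keeps everything scalar so that it stays portable and makes no claim about how Cyme does this internally.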

Kunkel, Julian Martin
Ludwig, Thomas
Meuer, Hans Werner
Published in:
Supercomputing, 8488, 440-449
Presented at:
29th International Conference, ISC 2014, Leipzig, Germany, June 22-26, 2014
Cham, Springer International Publishing

 Record created 2016-02-09, last modified 2018-09-13
