To enable state-of-the-art gyrokinetic particle-in-cell (PIC) codes to benefit from the performance of recent multithreaded devices, we developed an application derived from a platform called the "PIC-engine" [1, 2, 3], which embeds simplified basic features of the PIC method. The application solves the gyrokinetic equations in a sheared plasma slab, using B-spline finite elements of up to fourth order to represent the self-consistent electrostatic field. Preliminary studies of the so-called Particle-In-Fourier (PIF) approach, which uses Fourier modes as basis functions in the periodic dimensions of the system instead of the real-space grid, show that this method can be faster than PIC for simulations with a small number of Fourier modes. As in the PIC-engine, multiple levels of parallelism have been implemented using MPI+OpenMP [2] and MPI+OpenACC [1], the latter exploiting the computational power of GPUs without requiring a complete rewrite of the code. Sorting particles [3] is shown to improve performance by increasing data locality and vectorizing grid memory access. Weak scalability tests have been run successfully on the GPU-equipped Cray XC30 Piz Daint (at CSCS) on up to 4,096 nodes. The reduced time-to-solution will enable more realistic, and thus more computationally intensive, simulations of turbulent transport in magnetic fusion devices.
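
To illustrate why particle sorting improves data locality, the following is a minimal sketch of one common way to do it: a counting sort that orders particles by the index of the grid cell they occupy. The abstract does not state which sorting algorithm is used, so the choice of counting sort, the one-dimensional periodic domain, and the names `sort_particles_by_cell`, `positions`, `length`, and `n_cells` are all assumptions made for illustration only.

```python
def sort_particles_by_cell(positions, length, n_cells):
    """Return (order, cells) for a 1D periodic domain of size `length`.

    `order` lists particle indices grouped by grid cell (stable counting
    sort); `cells` holds the cell index of each particle.
    Hypothetical helper: an illustrative sketch, not the paper's code.
    """
    dx = length / n_cells

    # Cell index of each particle; positions are wrapped periodically and
    # clamped so that floating-point rounding cannot yield index n_cells.
    cells = [min(int((x % length) / dx), n_cells - 1) for x in positions]

    # Histogram: number of particles in each cell.
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1

    # Exclusive prefix sum: first output slot belonging to each cell.
    offsets = [0] * n_cells
    total = 0
    for c in range(n_cells):
        offsets[c] = total
        total += counts[c]

    # Scatter particle indices into their cell's slot range.
    order = [0] * len(positions)
    cursor = list(offsets)
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1

    return order, cells


def apply_order(values, order):
    """Reorder any per-particle array (position, velocity, weight, ...)."""
    return [values[i] for i in order]
```

After reordering every per-particle array with `apply_order`, particles that deposit charge onto (or gather the field from) the same grid points sit next to each other in memory, so consecutive loop iterations touch the same few grid entries. The cost is linear in the number of particles plus the number of cells, which is why this kind of sort is often cheap enough to repeat every few time steps.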