With the aim of enabling state-of-the-art gyrokinetic PIC codes to benefit from the performance of recent multithreaded devices, we developed an application from a platform called the "PIC-engine" [1, 2, 3] embedding simplified basic features of the PIC method. The application solves the gyrokinetic equations in a sheared plasma slab using B-spline finite elements up to fourth order to represent the self-consistent electrostatic field. Preliminary studies of the so-called Particle-In-Fourier (PIF) approach, which uses Fourier modes as basis functions in the periodic dimensions of the system instead of the real-space grid, show that this method can be faster than PIC for simulations with a small number of Fourier modes. Similarly to the PIC-engine, multiple levels of parallelism have been implemented using MPI+OpenMP  and MPI+OpenACC , the latter exploiting the computational power of GPUs without requiring complete code rewriting. It is shown that sorting particles  can lead to performance improvement by increasing data locality and vectorizing grid memory access. Weak scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (at CSCS) up to 4,096 nodes. The reduced time-to-solution will enable more realistic and thus more computationally intensive simulations of turbulent transport in magnetic fusion devices.