Numerical simulation of Pelton turbine hydrodynamics is helpful to identify the energy loss mechanisms in the runner and minimize their effect. However, it is a challenging task that involves handling the unsteady free surface flow and moving boundaries requiring dynamic mesh approach, as well as run-time local grid refinements at the interphase. Unlike the mesh-based methods, the Lagrangian particle-based methods are robust in handling free surface problems with moving boundaries such as Pelton turbine flow. Within the framework of the present research, the 3-D Finite Volume Particle Method (FVPM) has been developed and accelerated on Graphics Processing Unit (GPU). FVPM is locally conservative and consistent, employing an Arbitrary Lagrangian-Eulerian (ALE) approach for particle motion to achieve a reasonably uniform particle distribution. The method is based on spherical-support top-hat kernels in which the particle interaction vectors are computed and used to weigh the conservative flux exchange. To capture the turbulence, the standard and realizable k-¿ as well as k-¿ Shear Stress Transport (SST) turbulence models have been implemented and integrated into ALE based FVPM. The wall function approach has been used for near-wall turbulence computations. The solver is called GPU-SPHEROS and has been implemented from scratch in the CUDA C++ parallel computing platform. All the parallel algorithms and data structures have been designed speci¿cally for the GPU many-core architecture. The roofline analysis method has been utilized to assess the performance of the CUDA kernels and define the appropriate optimization strategies. In particular, the neighbor search algorithm, accounting for almost a third of the overall computation time, features an e¿cient Space-Filling Curve (SFC) as well as an optimized octree construction and traverse procedure. The memory-bound interaction vector computation, accounting for almost two-thirds of the overall computation time, features ¿xed-size memory pre-allocation, and an e¿cient data ordering to reduce memory transactions and avoid the cost of dynamic memory operations. A speedup by a factor of almost six times has been achieved for a single NVIDIA® TeslaTM P100 16GB GPU with GP100 Pascal architecture vs. a dual-setup Broadwell Intel® Xeon® E5-2690 v4 CPU node with 28 total physical cores. Once GPU-SPHEROS validated, it is used for jet interference investigation in a six-jet Pelton turbine as an industrial-size practical application. The numerical simulations have been performed at eight operating points ranging from N / N_BEP = 89% to N / N_BEP = 131%, where N is the runner rotational speed, and BEP is the Best Efficiency Point. It is shown by the numerical results that a significant torque and efficiency drop occurs at high speed factors due to jet interference, whereas large load fluctuations caused by jet disturbance can occur at about any N not equal to N_BEP. Compared to the available experimental data provided by Hitachi-Mitsubishi Hydro Corporation, the torque and efficiency trends, as well as the range of the specific speed in which the jets interfere, are well-predicted, which provides confidence in the use of the GPU-SPHEROS for the design optimization of Pelton turbines. All the multi-jet Pelton turbine computations have been performed on Piz Daint, a GPU-powered supercomputer with 5¿704 GPU nodes, developed and operated by Swiss National Supercomputing Centre - CSCS.