Coherent multi-bunch interactions can severely affect the beams in circular colliders. To understand the dynamics of such interactions, the accelerator physics community relies on high-performance tracking codes. In simulations that include both intra-beam and inter-beam interactions between the bunches of the beams, ensuring causality creates a severe bottleneck. COMBI was developed to study such interactions, but its parallel algorithm becomes highly inefficient when the number of bunches exceeds the number of separate calculations per turn, or when the calculations vary in computational complexity. A new parallel algorithm, COMBIp, addresses these challenges with improved partitioning of the calculations and asynchronous communication between the bunches. The unavoidable bottleneck now limits the number of compute nodes that can be used efficiently, rather than the physics that can be simulated efficiently. The modifications yield a large speedup over the old parallel algorithm, reaching up to a factor equal to the number of bunches per beam. (C) 2019 The Author(s). Published by Elsevier B.V.
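As a generic illustration of why uneven calculation costs hurt a synchronized per-turn step, the sketch below computes parallel efficiency as useful work divided by (number of processes × slowest process). It contrasts a one-calculation-per-process assignment with a greedy longest-processing-time-first packing onto fewer processes. This sketch makes several assumptions. The cost values are hypothetical, and the functions `parallel_efficiency` and `greedy_partition` are illustrative names. The sketch does not describe how COMBI or COMBIp actually partition their calculations.

```python
import heapq


def parallel_efficiency(loads):
    """Efficiency of one synchronized step: useful work / (processes * slowest process)."""
    return sum(loads) / (len(loads) * max(loads))


def greedy_partition(costs, n_procs):
    """Hypothetical load balancer (LPT heuristic): place each calculation,
    largest first, on the currently least-loaded process."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, process id)
    heapq.heapify(heap)
    loads = [0.0] * n_procs
    for cost in sorted(costs, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + cost
        heapq.heappush(heap, (loads[p], p))
    return loads


# Hypothetical per-turn costs: one expensive calculation and several cheap ones.
costs = [8, 2, 1, 1, 1, 1, 1, 1]

# One process per calculation: most processes idle while the slowest finishes.
print(parallel_efficiency(costs))          # 0.25

# Packing the same work onto two processes balances the load.
balanced = greedy_partition(costs, 2)
print(balanced, parallel_efficiency(balanced))  # [8.0, 8.0] 1.0
```

In this toy case, giving the expensive calculation its own process and packing the cheap ones together uses two processes at full efficiency, instead of eight processes at 25%.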