Parendi: Thousand-Way Parallel RTL Simulation
Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core performance. Parendi is an RTL simulator that addresses this challenge by exploiting the abundant fine-grained parallelism inherent in RTL simulation and efficiently mapping it onto the massively parallel Graphcore IPU (Intelligence Processing Unit) architecture. Parendi scales up to 5888 cores on 4 Graphcore IPU sockets. It allows us to run large RTL designs up to 4× faster than the most powerful state-of-the-art x64 multicore systems. To achieve this performance, we developed new partitioning and compilation techniques and carefully quantified the synchronization, communication, and computation costs of parallel RTL simulation: The paper comprehensively analyzes these factors and details the strategies that Parendi uses to optimize them.
3676641.3716010.pdf
Main Document
Published version
openaccess
CC BY
1.54 MB
Adobe PDF
4151d3b447b5964b22fc6873fb8cbfd9