conference paper

Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism

Emami, Mahyar • Kashani, Sahand • Kamahori, Keisuke • Pourghannad, Mohammad Sepehr • Raj, Ritik • Larus, James R.
Editors: Aamodt, TM • Jerger, NE • Swift, M
January 1, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023, Vol 4
28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. The fastest software RTL simulators can simulate designs at 1-1000 kHz, i.e., more than three orders of magnitude slower than hardware. Improved simulators can increase designers' productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to exploit low-level parallelism, as RTL expresses considerable fine-grain concurrency. Unfortunately, state-of-the-art RTL simulators often perform best on a single core since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore: a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate fine-grain synchronization overhead. It relies entirely on a compiler to schedule resources and communication, which is feasible since RTL code contains few divergent execution paths. With static scheduling, communication and synchronization no longer incur runtime overhead, making fine-grain parallelism practical. Moreover, static scheduling dramatically simplifies processor implementation, significantly increasing the number of cores that fit on a chip. Our 225-core FPGA implementation running at 475 MHz outperforms a state-of-the-art RTL simulator running on desktop and server computers in 8 out of 9 benchmarks.
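
The static BSP idea in the abstract can be illustrated with a toy sketch in Python. This is not Manticore's compiler, instruction set, or hardware; the three-register design, the `PARTITION` mapping, and every function name below are hypothetical and chosen only for illustration. Each simulated clock cycle is one superstep: every core first computes the next values of the registers it owns from its local copies, then a message schedule fixed before simulation starts delivers the values that other cores need. Because the schedule never changes at run time, no dynamic synchronization is modeled.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
NextFn = Callable[[State], int]

# Hypothetical design: register name -> (registers it reads, next-state function).
DESIGN: Dict[str, Tuple[List[str], NextFn]] = {
    "count": (["count"], lambda s: (s["count"] + 1) & 0xF),  # 4-bit counter
    "delay": (["count"], lambda s: s["count"]),               # count, one cycle late
    "sum": (["count", "delay"], lambda s: (s["count"] + s["delay"]) & 0xF),
}
INIT: State = {"count": 0, "delay": 0, "sum": 0}
PARTITION: Dict[str, int] = {"count": 0, "delay": 0, "sum": 1}  # register -> core


def reference_sim(init: State, cycles: int) -> State:
    """Sequential cycle-by-cycle simulation, used as the golden model."""
    state = dict(init)
    for _ in range(cycles):
        state = {reg: fn(state) for reg, (_, fn) in DESIGN.items()}
    return state


def compile_schedule(partition: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Derive, once and ahead of time, the (register, src, dst) messages per cycle."""
    sends = set()
    for reg, (deps, _) in DESIGN.items():
        for dep in deps:
            if partition[dep] != partition[reg]:
                sends.add((dep, partition[dep], partition[reg]))
    return sorted(sends)


def bsp_sim(init: State, cycles: int, partition: Dict[str, int], n_cores: int) -> State:
    schedule = compile_schedule(partition)
    # Each core keeps local copies of the registers it owns or reads remotely.
    local: List[State] = [{} for _ in range(n_cores)]
    for reg, core in partition.items():
        local[core][reg] = init[reg]
    for reg, _, dst in schedule:
        local[dst][reg] = init[reg]

    for _ in range(cycles):
        # Compute phase: every core evaluates its own registers from local state only.
        updates = [
            {reg: DESIGN[reg][1](local[core])
             for reg, owner in partition.items() if owner == core}
            for core in range(n_cores)
        ]
        # Communication phase: commit locally, then replay the fixed schedule.
        for core in range(n_cores):
            local[core].update(updates[core])
        for reg, src, dst in schedule:
            local[dst][reg] = updates[src][reg]
        # The end of this loop body plays the role of the superstep barrier.
    return {reg: local[core][reg] for reg, core in partition.items()}
```

Calling `bsp_sim(INIT, 5, PARTITION, 2)` yields the same register values as `reference_sim(INIT, 5)`. In this sketch the schedule is recomputed only when the partition changes, which mirrors the abstract's point that scheduling cost moves from run time to compile time.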

Type
conference paper
DOI
10.1145/3623278.3624750
Web of Science ID

WOS:001161547900014

Author(s)
Emami, Mahyar  
•
Kashani, Sahand  
•
Kamahori, Keisuke
•
Pourghannad, Mohammad Sepehr
•
Raj, Ritik
•
Larus, James R.  
Editors
Aamodt, TM
•
Jerger, NE
•
Swift, M
Date Issued

2023-01-01

Publisher

Association for Computing Machinery

Publisher place

New York

Published in
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023, Vol 4
ISBN of the book

979-8-4007-0394-2

Start page

219

End page

237

Subjects

Technology

•

Gate-Level Simulation

Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
DCSL  
Event name

28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

Event place

Vancouver, Canada

Event date

March 25-29, 2023

Available on Infoscience
March 18, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/206385