

FEMU: An Open-Source RISC-V Emulation Platform for the Exploration of Accelerator-based Edge Applications





Simone Machetti<sup>1</sup>, Miguel Peón-Quirós<sup>2</sup>, Deniz Kasap<sup>1</sup>, Juan Sapriza<sup>1</sup>, Rubén Rodríguez<sup>1</sup>, José Miranda<sup>1</sup>, Pasquale Davide Schiavone<sup>1</sup>, David Atienza<sup>1,2</sup>

<sup>1</sup>Embedded Systems Laboratory, <sup>2</sup>EcoCloud — École Polytechnique Fédérale de Lausanne

# Architecture and Features...

[Figure: FEMU block diagram on the PYNQ-Z2 board (Xilinx Zynq-7020), divided into processing system and programmable logic. Labels: Linux, ARM Cortex, board components, DDR controller, DRAM, SD controller, SD card, X-HEEP RTL, CPU, coprocessor unit, memory banks, OBI bus, OBI-to-AXI, peripheral-to-AXI, JTAG debug, GPIO, always-on peripheral domain, peripheral domain, accelerator, SPI/I2C, PLIC, power manager, flash, SRAM, other peripherals, performance counters.]

- **1.** Implemented on the Xilinx Zynq-7020 chip on the PYNQ-Z2 board
- **2.** Configurable RISC-V CPU:
  - CV32E20
  - CV32E40P
  - CV32E40X
- **3.** Configurable number and size of memory banks
- **4.** Configurable peripherals
- **5a.** Configurable bus topology:
  - One-at-a-time
  - Fully-connected
- **5b.** Configurable bus addressing:
  - Contiguous
  - Interleaved
- **6.** Configurable power modes:
  - Clock-gating
  - Power-gating
  - RAM retention
- **7.** XIF: configurable interface to plug coprocessors
- **8.** XAIF: configurable interface to plug accelerators
- **9.** JTAG virtualization on Linux
- **10.** RAM virtualization on Linux: used to expand the X-HEEP RAM size using the board DRAM
- **11.** Peripherals virtualization on Linux: used to perform virtual ADC acquisitions (using SPI, I2C, etc.) from the DRAM or SD card memories
- **12.** Profiling counters: to measure performance and estimate energy
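The two addressing modes of feature 5b differ in how a byte address is mapped onto the memory banks. The sketch below illustrates the general idea only; it is not taken from the X-HEEP RTL. The bank count, bank size, and 4-byte word granularity are assumed values, and `bank_contiguous` and `bank_interleaved` are hypothetical helper names.

```python
# Assumed parameters for illustration; X-HEEP makes these configurable.
WORD_BYTES = 4           # assumed 32-bit word granularity
NUM_BANKS = 4            # assumed number of memory banks
BANK_BYTES = 32 * 1024   # assumed size of each bank in bytes


def bank_contiguous(addr):
    """Contiguous mode: each bank covers one consecutive address range."""
    return addr // BANK_BYTES, addr % BANK_BYTES


def bank_interleaved(addr):
    """Interleaved mode: consecutive words rotate across the banks."""
    word = addr // WORD_BYTES
    bank = word % NUM_BANKS
    # Offset inside the bank: word index within the bank plus byte within the word.
    offset = (word // NUM_BANKS) * WORD_BYTES + addr % WORD_BYTES
    return bank, offset


if __name__ == "__main__":
    for addr in (0x0, 0x4, 0x8, 0x10, 0x8000):
        print(hex(addr), bank_contiguous(addr), bank_interleaved(addr))
```

Under these assumptions, sequential word accesses stay in one bank in contiguous mode and spread over all banks in interleaved mode.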

# Open-source...



- **X-HEEP:** https://github.com/esl-epfl/x-heep
- **FEMU-HW:** https://github.com/esl-epfl/x-heep-femu
- **FEMU-SW:** https://github.com/esl-epfl/x-heep-femu-sdk

# Performance and Energy Estimation...

### Performance estimation:

Profiling counters measure the time spent by each component of the architecture in each power state during application execution.
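As a rough illustration of how such counter readings could be turned into timing figures, the sketch below converts per-state cycle counts into seconds and into the share of time spent in each power state. The data layout (a dictionary keyed by component and power state), the state names, the 20 MHz clock, and the function names are all assumptions made for the example, not the FEMU SDK interface.

```python
CLOCK_HZ = 20_000_000  # assumed emulation clock frequency


def cycles_to_seconds(counters, clock_hz=CLOCK_HZ):
    """Convert {component: {state: cycles}} into {component: {state: seconds}}."""
    return {
        comp: {state: cycles / clock_hz for state, cycles in states.items()}
        for comp, states in counters.items()
    }


def state_share(times):
    """Fraction of each component's total time spent in each power state."""
    shares = {}
    for comp, states in times.items():
        total = sum(states.values())
        shares[comp] = {
            state: (t / total if total else 0.0) for state, t in states.items()
        }
    return shares


if __name__ == "__main__":
    # Hypothetical counter readout after running an application.
    counters = {
        "cpu": {"active": 15_000_000, "clock_gated": 5_000_000},
        "bank0": {"active": 4_000_000, "retention": 16_000_000},
    }
    times = cycles_to_seconds(counters)
    print(times)
    print(state_share(times))
```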



### Application evaluation and optimization:

Applications are evaluated and optimized based on the performance and energy estimates.

### Energy estimation:

Profiling counter values are combined with post-silicon power values from our HEEPocrates chip to estimate the energy consumption of the executed application.
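One natural reading of this combination is a weighted sum, E = Σ P(c, s) · t(c, s), over components c and power states s. The sketch below is a minimal version under stated assumptions: the power numbers are placeholders invented for the example (they are not HEEPocrates measurements), and the data layout and the name `estimate_energy` are hypothetical.

```python
def estimate_energy(times_s, power_w):
    """Sum power (W) times time (s) over every component and power state.

    Both arguments use the assumed layout {component: {state: value}}.
    Returns the estimated energy in joules.
    """
    energy_j = 0.0
    for comp, states in times_s.items():
        for state, seconds in states.items():
            energy_j += power_w[comp][state] * seconds
    return energy_j


if __name__ == "__main__":
    # Time per component and power state, e.g. derived from profiling counters.
    times_s = {
        "cpu": {"active": 0.75, "clock_gated": 0.25},
        "bank0": {"active": 0.2, "retention": 0.8},
    }
    # Placeholder power values in watts, invented for this example.
    power_w = {
        "cpu": {"active": 2e-3, "clock_gated": 5e-4},
        "bank0": {"active": 1e-3, "retention": 1e-5},
    }
    print(f"Estimated energy: {estimate_energy(times_s, power_w):.6e} J")
```

Keeping the power table separate from the timing data means the same counter readout can be re-evaluated against a different set of power values.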

# HEEPocrates...

**Architecture:** X-HEEP + coarse-grained reconfigurable array (CGRA) accelerator + in-memory computing (IMC) accelerator

**Technology:** TSMC 65 nm CMOS

[Figure: HEEPocrates chip, with a labeled dimension of 3 mm]