## QEMU AND SOC SIMULATION

# FRANÇOIS-FRÉDÉRIC OZOG

A non-exclusive, irrevocable, royalty-free copyright permission is granted by Shokubai.tech to use this material in developing all future revisions and editions of the resulting draft and approved Accellera Systems Initiative SystemC standard, and in derivative works based on the standard.

### QEMU: USE CASES

#### Emulated Computer

(x86, Arm64, Arm, Power, S390...)

Physical Computer (x86, Arm64, Power, S390...)

- Develop for 32-bit Arm devices on Intel x86
- Use Windows x86 on an Apple Mac Pro M1
- Cloud PCs
- Simulation


### QEMU SOFTWARE PROJECT ASSESSMENT

- Total Physical Source Lines of Code (SLOC) = 2,287,077
  - Aarch64 80K, Intel 50K, IBM S390 20K, RISC-V 20K
  - Devices 600K
  - Infrastructure 200K, test 200K
- Development Effort Estimate, Person-Years = 673.40
  - Reality: over 1,300 contributors
- Schedule Estimate, Years = 6.36
- Total Estimated Cost to Develop = \$90,967,540 (average salary = \$56,286/year, overhead = 2.40)

Generated using David A. Wheeler's 'SLOCCount'.
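The effort, schedule, and cost figures on this slide are consistent with the Basic COCOMO model (organic mode) that SLOCCount applies by default. The sketch below recomputes them from the total SLOC, salary, and overhead quoted on the slide. The function and key names are illustrative and are not part of SLOCCount.

```python
def cocomo_basic(sloc, salary_per_year=56_286.0, overhead=2.40):
    """Basic COCOMO, organic mode, as used by SLOCCount's default report."""
    ksloc = sloc / 1000.0
    effort_person_months = 2.4 * ksloc ** 1.05           # development effort
    schedule_months = 2.5 * effort_person_months ** 0.38  # elapsed schedule
    effort_person_years = effort_person_months / 12.0
    cost_usd = effort_person_years * salary_per_year * overhead
    return {
        "effort_person_years": effort_person_years,
        "schedule_years": schedule_months / 12.0,
        "cost_usd": cost_usd,
    }


estimate = cocomo_basic(2_287_077)
print(f"Effort:   {estimate['effort_person_years']:.2f} person-years")
print(f"Schedule: {estimate['schedule_years']:.2f} years")
print(f"Cost:     ${estimate['cost_usd']:,.0f}")
```

The results should land within rounding of the slide's values. The estimate says nothing about how the work was actually distributed across the project's contributors.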

### QEMU PROJECT FACT SHEET

- Over 1,300 contributors



### OPEN SOURCE AND GOVERNANCE

- Open source, closed governance
  - FreeRTOS, Microsoft Azure RTOS
- Strategic governance
  - SystemC
  - DPDK
  - OpenAMP
- Industry initiatives
  - SOAFEE
    - OEMs and Tier 1s, hypervisor vendors, and SoC vendors addressing hypervisor portability, starting with device assignment
- QEMU/KVM
  - Evolving towards more "strategic" governance (latest KVM Forum interactions)

## VEHICLE VALIDATION THROUGH SIMULATION

Figure labels (image not recovered):

- World physics accuracy
- Car physics accuracy: up to full (tires, Cx...)
- Car computing accuracy: emulated hardware in the loop, hardware in the loop
- Annotation: "that can be realistically operated"

## SYSTEM UNIT TESTING VS SIMULATION

### System Unit Testing

- Existence of a simulation framework
- Abstract time (can be controlled by the simulator framework)
- Runs most of the stack, but needs some shim layers to deal with the simulator framework
  - For instance, time control
  - Supplemental testing code can be large

### Simulation

- No simulation framework per se (digital car twin in a digital world)
- Runs the exact stack
- Physical time (meaning no enforced timing, 1 s = 1 s; not meaning time constraints)
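The difference between abstract time and physical time can be sketched with a small clock shim. This is a minimal illustration, not taken from any particular simulation framework: `PhysicalClock`, `AbstractClock`, and `run_periodic_task` are hypothetical names, and the periodic task stands in for stack code that only sees time through the shim.

```python
import time


class PhysicalClock:
    """Wall-clock time: 1 s of execution is 1 s of real time."""

    def now(self):
        return time.monotonic()

    def sleep(self, seconds):
        time.sleep(seconds)


class AbstractClock:
    """Framework-controlled time: advances only when told to."""

    def __init__(self):
        self._now = 0.0

    def now(self):
        return self._now

    def sleep(self, seconds):
        # Returns immediately; the framework just moves time forward.
        self._now += seconds


def run_periodic_task(clock, period_s, iterations):
    """Hypothetical stack code that reads time only through the shim."""
    timestamps = []
    for _ in range(iterations):
        timestamps.append(clock.now())
        clock.sleep(period_s)
    return timestamps


# With the abstract clock, five 100 ms periods complete instantly.
abstract_ticks = run_periodic_task(AbstractClock(), period_s=0.1, iterations=5)
print(abstract_ticks)
```

Passing a `PhysicalClock()` instead runs the same task against real time, which is the situation of the unmodified stack in the simulation case.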

### SIMULATION GOALS

#### Goals

- What are the right SoC parameters (number of performance cores, number of low-power cores, accelerators...)?
- Hybrid zonal architecture analysis
- See how the system behaves
- Ideas based on DVCon
  - "Elaboration" phase at a higher level (LIDAR pieces in the digital world)

### VECU LEVEL 4 SIMULATION TECHNOLOGIES

- Arm Fast Models: 100% accurate but really slow!
- Arm Virtual Hardware: in the cloud, derived from Corellium
  - Expects to have Arm AI accelerators integrated
  - Unclear how to simulate accelerators and heterogeneous platforms
- Corellium: high performance, centered on mobile device emulation
- QEMU with SystemC/TLM: broad range of capabilities
- Siemens Veloce Hycon
  - Can leverage QEMU with SystemC/TLM and Arm Fast Models

For automotive, interfaces with digital worlds such as CARLA are needed.

## SIMULATION: HYBRIDIZATION QEMU + PROCESSOR SUPPORT

|                                                              | Processor virt | Enhanced processor virt                | QEMU                                   |
|--------------------------------------------------------------|----------------|----------------------------------------|----------------------------------------|
| Processor architecture                                       | Limited to CPU | Limited to CPU                         | Emulate any architecture               |
| Processor generation/feature (MPAM for instance)             | Limited to CPU | Emulate any feature (can be high cost) | Emulate any feature (can be high cost) |
| Devices                                                      | Flexible       | Flexible                               | Flexible                               |
| Performance                                                  | High           | High                                   | Low to High                            |
| Heterogeneous simulation (multiple domains, processor types) | No             | Architecturally built-in QEMU, SystemC | Possible (Xilinx proprietary)          |

### QEMU + SYSTEMC FOR SIMULATION

Figure labels (image not recovered): SystemC, Accelerator, Other computer

Concretely, it is not possible to emulate both processor features and speed on the Cortex-A side.

Not out of the box (the Linaro Heterogeneous Platform Project may address that in the future).



### ACCELERATED HETEROGENEOUS QEMU + SYSTEMC

Figure labels (image not recovered): SystemC, Accelerator, Other computer, FPGA

### EXPLORATIONS

- Linux KVM implements too much in-kernel, not exposing enough control
  - Nested virtualization on hold
  - Lacks fine-grained VMM control of instructions
- macOS HVF implements very little, not exposing enough control (based on a VMM implementation (7.4 KSLOC) capable of booting single-processor Linux)
  - Lacks fine-grained VMM control of instructions
  - Unknown nested virtualization support
  - Impossible to change in-kernel behavior
  - 1,300 lines of code to implement a GICv3/v4
    - but cannot support direct IRQ injection into the VM, as that requires in-kernel support
  - Lacks control of SMMU and MMU
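To make "fine-grained VMM control" concrete, the toy model below contrasts a hypervisor that handles a class of guest events in-kernel with one that hands each of them to the user-space VMM as an exit. It is a sketch under stated assumptions: `ToyHypervisor`, `Exit`, and `vmm_loop` are hypothetical and do not correspond to the KVM or Hypervisor.framework APIs.

```python
from enum import Enum, auto


class Exit(Enum):
    MMIO = auto()
    SYSREG = auto()
    HALT = auto()


class ToyHypervisor:
    """Stand-in for the in-kernel part: runs the guest until an event
    that it is configured to forward to user space."""

    def __init__(self, guest_events, trap_sysreg):
        self._events = iter(guest_events)
        self._trap_sysreg = trap_sysreg
        self.handled_in_kernel = []

    def run(self):
        for event in self._events:
            if event is Exit.SYSREG and not self._trap_sysreg:
                # Handled in-kernel: the VMM never gets to see it.
                self.handled_in_kernel.append(event)
                continue
            return event
        return Exit.HALT


def vmm_loop(hypervisor):
    """User-space VMM: record every exit it is given until the guest halts."""
    seen = []
    while True:
        reason = hypervisor.run()
        seen.append(reason)
        if reason is Exit.HALT:
            return seen


events = [Exit.MMIO, Exit.SYSREG, Exit.MMIO, Exit.HALT]
coarse = vmm_loop(ToyHypervisor(events, trap_sysreg=False))
fine = vmm_loop(ToyHypervisor(events, trap_sysreg=True))
print([e.name for e in coarse])
print([e.name for e in fine])
```

In the coarse configuration the system-register access is invisible to the VMM, so it cannot be emulated differently from the host. The fine-grained configuration exposes it, at the cost of an extra round trip to user space.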

## NEXT STEPS

- Implement an HVF-like API on top of KVM on Arm (no x86 work)
  - KVM in-kernel extensions
    - Fine-grained "exits"
    - big.LITTLE phase 1 (A72 2 GHz, A72 1 GHz)
    - S-EL1 emulation
  - KVM "raw mode" (no upstream work)
    - big.LITTLE phase 2: CPU (A72, A57)
    - S-EL1 in userland
    - S-EL3
  - KVM user enhancements phase 2
    - Heterogeneous phase 2: CPU (A72, R5, M7)
    - KVM enhancement: secure world support
    - KVM enhancement: partial MPAM feature implementation
    - KVM enhancement: simulated GICv4 on a physical GICv3 (no legacy)
- Planning
  - Around three months of workload over the next 18 months



THANK YOU www.shokubai.tech



### QEMU: EMULATION

#### Emulated Computer

(x86, Arm64, Arm, Power, S390...)

Physical Computer (x86, Arm64, Power, S390...)

- Processing
  - Processor model: just-in-time cross-compiler (TCG)
  - Speed is simulated by software, with a huge performance tax
  - Emulated modes: Arm secure mode, Intel SMM...
  - Emulated MMU, IOMMU, interrupt controller
- Devices
  - Emulated devices (x86 disk controller, PL011 UART, TPM...)
  - Paravirtualized devices (Virtio)
- "Context"
  - Firmware (normal, secure, others)
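The idea behind a just-in-time cross-compiler can be illustrated with a toy translation cache: guest instructions are translated once per block into host-callable code, then reused on every later visit. This is not QEMU's TCG code. The three-instruction guest ISA (`addi`, `jnz`, `halt`) and all function names are invented for the example.

```python
def make_addi(reg, imm):
    """Host code for the guest instruction 'addi reg, imm'."""
    def host_op(regs):
        regs[reg] += imm
    return host_op


def translate_block(program, pc):
    """Translate guest instructions from pc up to the next branch or halt."""
    ops = []
    while True:
        op, *args = program[pc]
        if op == "addi":
            ops.append(make_addi(*args))
            pc += 1
        elif op == "jnz":
            reg, target = args
            fallthrough = pc + 1
            next_pc = lambda regs: target if regs[reg] != 0 else fallthrough
            break
        elif op == "halt":
            next_pc = lambda regs: None
            break
        else:
            raise ValueError(f"unknown guest instruction: {op}")

    def run(regs):
        for host_op in ops:
            host_op(regs)
        return next_pc(regs)

    return run


def execute(program, regs):
    """Run the guest program, translating each block only on first use."""
    cache = {}
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = translate_block(program, pc)
            cache[pc] = block
            translations += 1
        pc = block(regs)
    return translations


program = [
    ("addi", "acc", 5),     # 0: acc += 5
    ("addi", "count", -1),  # 1: count -= 1
    ("jnz", "count", 0),    # 2: loop while count != 0
    ("halt",),              # 3
]
regs = {"acc": 0, "count": 3}
translations = execute(program, regs)
print(regs, translations)
```

The loop body runs three times but is translated only once, which is where a translating emulator recovers part of the cost of interpreting every guest instruction.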

## QEMU ACCELERATION BUILDING ON VIRTUALIZATION



Physical Computer (Arm64)

- Processing
  - Methods: KVM on Linux, HVF on macOS...
  - Processor model is the same as the host
  - Speed can be hardware controlled with SCMI at no tax
  - Just normal processing mode
  - Accelerated MMU, IOMMU, interrupt controller
- "Context"
  - Firmware (normal only)

## DIGITAL "THING" TWIN IN DIGITAL "WORLD" TWIN

CARLA, Nvidia Omniverse...



