I am encountering a critical non-determinism issue while running network benchmarks in gem5-dpdk Full System (FS) mode with the Intel DPDK framework. Despite using a detailed out-of-order CPU model, a stable checkpoint, and a fixed hardware configuration, the results vary between identical runs, which contradicts the determinism I expect from the simulator.
System Configuration & Environment
- Gem5 Version: gem5-dpdk (ISPASS’24), for DPDK
- CPU Model: O3_ARM_v7a_3 for the measured simulation; AtomicSimpleCPU for creating the checkpoint
- Environment: gem5 FS mode, restored from a checkpoint; KVM is NOT used; the application contains no random logic.
- Application: DPDK MACSWAP (Network benchmark).
The Critical Bug: Non-Deterministic TXPackets
The received packet count (RXPackets) is identical across runs, but the transmitted packet count (TXPackets) varies significantly. This suggests a timing-dependent race condition that affects the completion of the TX process near the end of the simulation.
The table below shows the results from two consecutive runs with identical configurations (e.g., 128-byte packet size, 233 Gbps rate), where the TX packet count differs substantially.
| Metric (packets) | Run 1    | Run 2    |
|------------------|----------|----------|
| RX               | 13924017 | 13924017 |
| TX               | 11448637 | 13922306 |
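
To narrow down where the two runs start to differ, it may help to compare the full statistics dumps rather than only the packet counters. The following is a minimal sketch, assuming each run's statistics are available as gem5-style text (one `name value # description` entry per line). The stat names `system.nic.rxPackets` and `system.nic.txPackets` are hypothetical placeholders, not the actual names in my configuration; the values are the ones from the table above.

```python
def parse_stats(text):
    """Parse a gem5-style stats dump into {stat_name: value_string}."""
    stats = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing description
        if not line or line.startswith("-"):  # skip blanks and Begin/End banners
            continue
        parts = line.split()
        if len(parts) >= 2:
            stats[parts[0]] = parts[1]
    return stats


def diff_stats(a, b):
    """Return {name: (value_in_a, value_in_b)} for every stat that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Hypothetical stat names; counts copied from the two runs reported above.
RUN1 = """
---------- Begin Simulation Statistics ----------
system.nic.rxPackets    13924017    # packets received (placeholder name)
system.nic.txPackets    11448637    # packets transmitted (placeholder name)
---------- End Simulation Statistics ----------
"""
RUN2 = """
---------- Begin Simulation Statistics ----------
system.nic.rxPackets    13924017    # packets received (placeholder name)
system.nic.txPackets    13922306    # packets transmitted (placeholder name)
---------- End Simulation Statistics ----------
"""

run1, run2 = parse_stats(RUN1), parse_stats(RUN2)
differences = diff_stats(run1, run2)

# Packets received but not transmitted by the end of each run.
shortfall1 = int(run1["system.nic.rxPackets"]) - int(run1["system.nic.txPackets"])
shortfall2 = int(run2["system.nic.rxPackets"]) - int(run2["system.nic.txPackets"])

for name, (v1, v2) in differences.items():
    print(f"{name}: run1={v1} run2={v2}")
print(f"RX-TX shortfall: run1={shortfall1} run2={shortfall2}")
```

In practice the two strings would be replaced by the contents of each run's `stats.txt`. Sorting the differing stats by component could show whether the divergence first appears in the NIC, the DMA path, or the CPU.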
My questions are:
- Is this non-deterministic result a genuine problem? Specifically, should experiments executed in Gem5 FS mode with identical configurations yield perfectly deterministic results?
- Have others encountered non-deterministic results under similar conditions (FS mode, NIC I/O experiments) despite using a stable configuration? If so, what was the root cause of the problem (e.g., a specific flaw in the NIC model, DMA timing issue, or event queue race), and how was it resolved (e.g., a code patch or specific config fix)?
Best regards,
Sungwook.