gem5-users@gem5.org

The gem5 Users mailing list


Different simulation results on different computers with the same configuration

Tue, Sep 20, 2022 2:49 AM

Thanks for answering my question, but I don't think it's due to random number libraries.
I run the command with an increasing injection rate, in steps of 0.02.
Running the same script on my new computer, I get the output below.

average_packet_latency = 25.337632                       (Unspecified)
average_packet_latency = 25.437901                       (Unspecified)
average_packet_latency = 21.756217                       (Unspecified)
average_packet_latency = 22.113257                       (Unspecified)
average_packet_latency = 23.345574                       (Unspecified)
average_packet_latency = 23.661025                       (Unspecified)
average_packet_latency = 24.320524                       (Unspecified)
average_packet_latency = 24.179487                       (Unspecified)
average_packet_latency = 25.514403                       (Unspecified)
average_packet_latency = 26.073759                       (Unspecified)
average_packet_latency = 26.263281                       (Unspecified)
average_packet_latency = 26.547242                       (Unspecified)
average_packet_latency = 27.091405                       (Unspecified)
average_packet_latency = 26.297297                       (Unspecified)
average_packet_latency = 27.140832                       (Unspecified)
average_packet_latency = 27.789159                       (Unspecified)
average_packet_latency = 26.496507                       (Unspecified)
average_packet_latency = 27.239526                       (Unspecified)
average_packet_latency = 27.588123                       (Unspecified)
average_packet_latency = 27.579588                       (Unspecified)
average_packet_latency = 27.810236                       (Unspecified)
average_packet_latency = 28.613595                       (Unspecified)
average_packet_latency = 28.414169                       (Unspecified)
average_packet_latency = 28.499775                       (Unspecified)
average_packet_latency = 28.954376                       (Unspecified)

and my old computer's result is

average_packet_latency = 15.510408                       (Unspecified)
average_packet_latency = 15.528615                       (Unspecified)
average_packet_latency = 15.682214                       (Unspecified)
average_packet_latency = 15.695504                       (Unspecified)
average_packet_latency = 15.769957                       (Unspecified)
average_packet_latency = 15.821728                       (Unspecified)
average_packet_latency = 15.912262                       (Unspecified)
average_packet_latency = 16.051925                       (Unspecified)
average_packet_latency = 16.167249                       (Unspecified)
average_packet_latency = 16.319634                       (Unspecified)
average_packet_latency = 16.479105                       (Unspecified)
average_packet_latency = 16.725313                       (Unspecified)
average_packet_latency = 17.055812                       (Unspecified)
average_packet_latency = 17.588959                       (Unspecified)
average_packet_latency = 18.500431                       (Unspecified)
average_packet_latency = 21.669417                       (Unspecified)
average_packet_latency = 103.241365                       (Unspecified)
average_packet_latency = 273.002675                       (Unspecified)
average_packet_latency = 430.695013                       (Unspecified)
average_packet_latency = 596.634683                       (Unspecified)
average_packet_latency = 732.220679                       (Unspecified)
average_packet_latency = 854.214438                       (Unspecified)
average_packet_latency = 969.032975                       (Unspecified)
average_packet_latency = 1087.468352                       (Unspecified)
average_packet_latency = 1207.588344                       (Unspecified)

Obviously, the old computer's result is the reasonable one.
I really don't understand why there is such a stark difference.
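
The exact command was not included in the thread, but the sweep can be reproduced with a loop along these lines (a minimal sketch assuming gem5's stock garnet synthetic-traffic config and a Garnet_standalone build; the topology, sizes, and cycle count are illustrative placeholders, not the original settings):

    #!/bin/bash
    # Illustrative sweep only: injection rate from 0.02 to 0.50 in steps of 0.02.
    for rate in $(seq 0.02 0.02 0.50); do
        ./build/NULL/gem5.opt configs/example/garnet_synth_traffic.py \
            --network=garnet --num-cpus=16 --num-dirs=16 \
            --topology=Mesh_XY --mesh-rows=4 \
            --synthetic=uniform_random --sim-cycles=10000 \
            --injectionrate=$rate
        # The latency figures quoted above come from the generated stats file.
        grep average_packet_latency m5out/stats.txt
    done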

gabriel.busnot@arteris.com
Tue, Sep 20, 2022 9:16 AM

Hi,

That’s indeed a large difference that looks more like a bug than anything else.

To (try to) reproduce the bug, can you please provide us with the exact commit, build command, and run command you are using?

A few possible causes of divergence are:

  1. The random number generator, as suggested in the previous answer. I don’t think this is the case, as the main gem5 random number generator is based on std::mt19937_64, which is deterministic for a given seed, and the seed itself is picked deterministically in gem5 by default (it has been serving me just fine for a couple of years).

  2. Relying on addresses to define order. This can occur, for instance, when storing pointers in a std::[unordered_]{map,set} with default comparison or hash functions: those operate on the pointer value, which itself depends on the memory allocator, which in turn gets its memory ranges from the OS (a minimal illustration follows this list).

  3. Undefined behavior. This is the curse of unsafe languages like C and C++: a program containing undefined behavior can produce, well… undefined results. Most hidden undefined behaviors have benign consequences (that’s why they remain hidden), but one possible consequence is slightly different program behavior depending on, among other things, the compiler and the optimization level. Compilers may assume that undefined behavior never happens in order to make clever deductions about the state of the program and apply aggressive optimizations. Depending on how smart your compiler is (the one on the recent computer likely being smarter than the one on the older computer), you may get odd results.
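
To make point 2 concrete, here is a minimal, self-contained sketch (plain C++, not gem5 code) of how ordering by pointer value leaks the allocator's and the OS's behavior into program output:

    // Iterating a std::set of raw pointers visits elements in address order.
    // Addresses depend on the allocator and on ASLR, so the "first" element
    // can differ between hosts or runs even though the logic is identical.
    #include <iostream>
    #include <set>

    struct Packet { int id; };

    int main() {
        std::set<Packet*> ready;          // ordered by pointer value by default
        for (int i = 0; i < 4; ++i)
            ready.insert(new Packet{i});

        // If a simulator "arbitrates" by picking *ready.begin(), the winner
        // depends on which allocation happened to land at the lowest address.
        for (Packet* p : ready)
            std::cout << p->id << ' ';
        std::cout << '\n';

        for (Packet* p : ready)
            delete p;
        return 0;
    }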

Bug 2 can be found by running the same program with the same input several times with different optimization levels, compilers, or hosts. It is also a good idea to load the host while running the program under test, to increase system entropy and reduce the odds of the system always handing out the same addresses.

For bug 3, it will likely suffice to run the program with and without optimizations enabled (i.e., gem5.debug and gem5.opt).

In all cases, you should run the simulation with as many debug flags enabled as possible and compare the outputs. I recommend using

colordiff -U 1000 <(simulation command 1) <(simulation command 2) | less -R
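
For example, to compare a debug and an optimized build of the same commit on the same input (the paths, config, and debug flags below are illustrative; pick the flags relevant to your setup, e.g. RubyNetwork for garnet):

    # Same commit, same options; only the build differs.
    CMD="configs/example/garnet_synth_traffic.py --network=garnet --injectionrate=0.02"
    colordiff -U 1000 \
        <(./build/NULL/gem5.debug --debug-flags=RubyNetwork $CMD 2>&1) \
        <(./build/NULL/gem5.opt   --debug-flags=RubyNetwork $CMD 2>&1) \
        | less -R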

Best,

Gabriel
