Thanks for answering.
But I don't think any of the 3 possible causes is the reason.
I have now found that I get different results when I run the same script on the same computer with the same version of gem5 built in different folders.
That is to say, I used scons to build two copies of gem5 at the same version, and only one of the two gives the correct result.
The new gem5 I built on my old computer reports the same incorrect result as the one on my new computer.
I also found that the new build is missing some folders:
"pycache" and "cpu_tests"
I think that is the reason, but I don't know how to fix it.
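A quick way to list exactly which folders exist in one build tree but not in the other is sketched below. This is only an illustration: the helper names `relative_dirs` and `missing_dirs` and the example paths are hypothetical and not part of gem5.

```python
import os

def relative_dirs(root):
    """Return the set of directory paths found under root, relative to root."""
    found = set()
    for current, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            found.add(os.path.relpath(os.path.join(current, name), root))
    return found

def missing_dirs(reference_root, other_root):
    """Directories present under reference_root but absent under other_root."""
    return sorted(relative_dirs(reference_root) - relative_dirs(other_root))

# Example call (placeholder paths for the working and the failing checkout):
# for path in missing_dirs("/path/to/good/gem5", "/path/to/bad/gem5"):
#     print(path)
```

Calling it in both directions shows folders that are unique to either tree.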
------------------ Original Message ------------------
From: "The gem5 Users mailing list" <gabriel.busnot@arteris.com>;
Sent: Tuesday, September 20, 2022, 5:16 PM
To: "gem5-users"<gem5-users@gem5.org>;
Subject: [gem5-users] Re: Different simulation results on different computers with the same configuration
Hi,
That’s indeed a large difference that looks more like a bug than anything else.
To (try to) reproduce the bug, can you please provide us with the exact commit, build command and run command you are using?
A few possible causes of divergence are:
1. The random number generator, as suggested in the previous answer. I don’t think this is the case, as the main gem5 random number generator is based on std::mt19937_64, which is deterministic for a given seed, and the seed is picked deterministically in gem5 by default (it has been serving me just fine for a couple of years).
2. Relying on addresses to define order. This can occur, for instance, when storing pointers in a std::[unordered_]{map,set} with default comparison or hash functions. These rely on the pointer value, which itself depends on the memory allocator, which in turn gets its memory ranges from the OS.
3. Undefined behavior. This is the cursed benediction of unsafe languages like C and C++. A program with undefined behavior in it can lead to, well… undefined results. Most hidden undefined behaviors have benign consequences (that’s why they remain hidden), but one of the consequences is slightly different behavior of a program depending on, among other things, the compiler and the optimization level. Compilers can assume that undefined behavior never happens in order to make clever deductions about the state of the program and perform killer optimizations. Depending on the smartness of the compiler you use (the one on the recent computer likely being smarter than the one on the older computer), you will get odd results.
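To make the address-ordering cause concrete, here is a small Python analogue of the C++ situation. It is a sketch only: `Packet`, `address_order` and `stable_order` are made-up names, and it relies on the CPython detail that `id()` returns the object's memory address.

```python
class Packet:
    """Stand-in for any simulator object; the name is illustrative only."""
    def __init__(self, seq):
        self.seq = seq

def address_order(packets):
    # Sorting by id() mimics a C++ std::set<Packet*> with the default
    # comparison: in CPython id() is the object's memory address, so the
    # order depends on where the allocator happened to place each object.
    return [p.seq for p in sorted(packets, key=id)]

def stable_order(packets):
    # Sorting by a field the program controls gives the same order
    # on every run and on every host.
    return [p.seq for p in sorted(packets, key=lambda p: p.seq)]

packets = [Packet(seq) for seq in (3, 1, 2)]
print(stable_order(packets))   # always [1, 2, 3]
print(address_order(packets))  # may differ between runs or machines
```

Any code that iterates in the first order and lets that order influence the simulation can produce run-to-run differences; ordering by an explicit key removes the dependence on the allocator.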
Bug 2 can be found by running the same program with the same input several times with different optimization levels, compilers or hosts. It is also a good idea to load the host while running the program under test to increase system entropy and reduce the odds of always having the same addresses provided by the system.
For bug 3, it will likely suffice to run the program with and without optimization enabled (i.e., gem5.debug and gem5.opt).
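The repeated-run check can be automated along the following lines. This is a minimal sketch, assuming the output you care about is written to standard output; `output_digest` and `is_deterministic` are hypothetical helper names, and the demo command at the end merely stands in for a real simulation command.

```python
import hashlib
import subprocess
import sys

def output_digest(command):
    """Run command (a list of arguments) and return the SHA-256 of its stdout."""
    result = subprocess.run(command, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()

def is_deterministic(command, runs=5):
    """True if every run of command produced byte-identical stdout."""
    digests = {output_digest(command) for _ in range(runs)}
    return len(digests) == 1

# Demo with a trivially deterministic command; replace the list with your
# own simulation command, one list element per argument.
print(is_deterministic([sys.executable, "-c", "print('hello')"], runs=3))  # True
```

Running the same check once per binary (for example the debug and the optimized build) and comparing the digests across binaries covers the second suggestion as well.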
In all cases, you should run the simulation with as many debug flags enabled as possible and compare the outputs. I recommend using
colordiff -U 1000 <(simulation command 1) <(simulation command 2) | less -R
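When the traces are very large, it can help to locate only the first line at which two saved outputs diverge. A rough sketch follows; the function name `first_divergence` and the log file names in the usage comment are placeholders.

```python
from itertools import zip_longest

def first_divergence(lines_a, lines_b):
    """Return (line_number, a, b) for the first differing line, or None if identical.

    A side that ran out of lines is reported as None.
    """
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return number, a, b
    return None

# Usage with two saved traces (placeholder file names):
# with open("run1.log") as f1, open("run2.log") as f2:
#     print(first_divergence(f1, f2))
```

Because file objects are iterated lazily, this reads both files line by line and stops at the first mismatch.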
Best,
Gabriel
I'm not sure I understand what you mean by "correct result".
Can you please provide us with:
gem5 version (commit SHA)
build command
run command
Can you please also double-check that you are using supported versions of python and gcc/clang. Also check that config.ini is the same in all cases.
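Checking that two config.ini files match can be scripted with the standard library. The sketch below treats them as plain text; `config_diff` is a hypothetical helper name and the paths in the usage comment are placeholders for your two output directories.

```python
import difflib

def config_diff(path_a, path_b):
    """Return unified-diff lines between two text files (empty list if identical)."""
    with open(path_a) as file_a, open(path_b) as file_b:
        lines_a, lines_b = file_a.readlines(), file_b.readlines()
    return list(difflib.unified_diff(lines_a, lines_b, fromfile=path_a, tofile=path_b))

# Usage (placeholder paths):
# for line in config_diff("run_a/config.ini", "run_b/config.ini"):
#     print(line, end="")
```

An empty result means the two files are identical line for line.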
Gabriel
On 9/20/2022 7:11 AM, 2497597 wrote:
Thanks for answering.
But I don't think any of the 3 possible causes is the reason.
I have now found that I get different results when I run the same script on the same computer
with the same version of gem5 built in different folders.
That is to say, I used scons to build two copies of gem5 at the same version, and only one of
the two gives the correct result.
The new gem5 I built on my old computer reports the same incorrect result as the one on my
new computer.
I also found that the new build is missing some folders:
"pycache" and "cpu_tests"
I think that is the reason, but I don't know how to fix it.
Maybe ... but just by having different lengths of path names, where things are
allocated in memory can vary. Then reason 2 (or even reason 3) can still apply.
This is because various strings will be allocated based on those path names,
leading to different addresses for things, and even different stack alignment.
I have seen this in C / C++ programs before and it can be nasty to ferret out.
Best wishes - Eliot Moss