gem5-users@gem5.org

The gem5 Users mailing list


Reply: Re: Different simulation results on different computers with the same configuration

2497597
Tue, Sep 20, 2022 11:11 AM

Thanks for answering.
But I don't think any of the three possible causes explains it.
I have now found that running the same script on the same computer, with the same gem5 version built in different folders, gives different results.
That is, I used scons to build gem5 twice from the same version, and only one of the two builds gives the correct result.
The new gem5 I built on my old computer reports the same incorrect result as my new computer does.
I also found that the new build is missing some folders:
"__pycache__" and "cpu_tests"
I think that is the reason, but I don't know how to fix it.

------------------ Original message ------------------
From: "The gem5 Users mailing list" <gabriel.busnot@arteris.com>;
Sent: Tuesday, September 20, 2022, 5:16 PM
To: "gem5-users" <gem5-users@gem5.org>;

Subject: [gem5-users] Re: Different simulation results on different computers with the same configuration

Hi,

That’s indeed a large difference that looks more like a bug than anything else.

To (try to) reproduce the bug, can you please provide us with the exact commit, build command and run command you are using?

A few possible causes of divergence are:

1. The random number generator, as suggested in the previous answer. I don't think that is the case: the main gem5 random number generator is based on std::mt19937_64, which is deterministic for a given seed, and gem5 picks the seed deterministically by default (it has been serving me just fine for a couple of years).

2. Relying on addresses to define order. This can occur, for instance, when storing pointers in a std::[unordered_]{map,set} with the default comparison or hash function, which relies on the pointer value, which in turn depends on the memory allocator that gets its memory ranges from the OS.

3. Undefined behavior. This is the cursed benediction of unsafe languages like C and C++. A program with undefined behavior in it can lead to, well… undefined results. Most hidden undefined behaviors have benign consequences (that's why they remain hidden), but one possible consequence is slightly different program behavior depending on, among other things, the compiler and the optimization level. Compilers are allowed to assume that undefined behavior never happens, and they use that assumption to make clever deductions about the program state and perform killer optimizations. Depending on how smart your compiler is (the one on the recent computer is likely smarter than the one on the older computer), you may get odd results.
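To illustrate why cause 1 is unlikely, here is a minimal sketch (not gem5's own Random class; `drawSequence` is a hypothetical helper). std::mt19937_64 is fully specified by the C++ standard, so the same seed produces the same raw sequence on any conforming compiler and host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draw `n` raw values from a std::mt19937_64 seeded
// with `seed`. The engine's algorithm is fixed by the standard, so the
// result depends only on `seed` and `n`, not on the host or compiler.
std::vector<std::uint64_t> drawSequence(std::uint64_t seed, std::size_t n)
{
    std::mt19937_64 gen(seed);
    std::vector<std::uint64_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(gen()); // raw engine output, no distribution adaptor
    return out;
}
```

The standard even pins down a check value: the 10000th output of a default-seeded (seed 5489) std::mt19937_64 must be 9981545732273789042. Note that the std::*_distribution adaptors are not specified that precisely, so values passed through, e.g., std::uniform_int_distribution may differ between standard library implementations.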

Bug 2 can be found by running the same program with the same input several times with different optimization levels, compilers or hosts. It is also a good idea to load the host while running the program under test, to increase system entropy and reduce the odds of always getting the same addresses from the system.
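A minimal sketch of cause 2 and the usual fix, assuming a hypothetical `SimObj` type with a stable `id` assigned at creation (none of these names are gem5 classes). A std::set of raw pointers is ordered by address, so its iteration order can change from run to run; ordering by a program-defined key removes that dependence.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical simulation object. `id` is assigned in creation order and
// does not depend on where the allocator places the object in memory.
struct SimObj
{
    std::uint64_t id;
    std::string name;
};

// Fragile: orders by pointer value (std::less<SimObj*>), i.e. by heap
// address, which may vary with the host, allocator, ASLR, or earlier
// allocations. std::unordered_set<SimObj*> has the same problem, since
// std::hash<SimObj*> is based on the pointer value.
using ByAddressSet = std::set<SimObj*>;

// Robust: order by the stable id instead of the address.
struct ById
{
    bool operator()(const SimObj* a, const SimObj* b) const
    {
        return a->id < b->id;
    }
};
using ByIdSet = std::set<SimObj*, ById>;

// Visit order of the robust container: always ascending id,
// regardless of insertion order or object addresses.
std::vector<std::uint64_t> visitOrder(const ByIdSet& s)
{
    std::vector<std::uint64_t> ids;
    for (const SimObj* o : s)
        ids.push_back(o->id);
    return ids;
}
```

Iterating a ByAddressSet to schedule or process objects is exactly the kind of code that can produce different results on different machines, while a ByIdSet visits objects in the same order everywhere.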

For bug 3, it will likely suffice to run the program with and without optimization enabled (i.e., gem5.debug and gem5.opt).
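A minimal, hypothetical example (not taken from gem5) of the kind of undefined behavior that can make an unoptimized and an optimized build disagree: signed integer overflow. The function names are illustrative only.

```cpp
#include <cassert>
#include <climits>

// Undefined behavior when x == INT_MAX: `x + 1` overflows a signed int.
// Because the compiler may assume signed overflow never happens, an
// optimized build is allowed to fold this to `return true;`, while an
// unoptimized build may compute a wrapped value and return false.
// Only call it with x < INT_MAX.
bool nextIsLargerUB(int x)
{
    return x + 1 > x;
}

// Well-defined alternative: compare against the limit instead of
// performing arithmetic that might overflow.
bool nextIsLargerSafe(int x)
{
    return x != INT_MAX;
}
```

With GCC or Clang, compiling with -fsanitize=undefined reports signed overflows like this one at run time, which can help locate this class of bug once a divergence between debug and optimized builds has been observed.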

In all cases, you should run the simulation with as many debug flags enabled as possible and compare the outputs. I recommend using

colordiff -U 1000 <(simulation command 1) <(simulation command 2) | less -R
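When the two traces are very long, it can help to first locate where they diverge. Below is a minimal sketch of such a tool; `firstDivergence` is a hypothetical helper, and it assumes each trace has already been saved to a text file.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Return the 1-based number of the first line at which two text files
// differ, or 0 if they are identical. A file that ends early counts as
// differing at the first line it is missing. A file that cannot be
// opened is treated as empty.
std::size_t firstDivergence(const std::string& pathA, const std::string& pathB)
{
    std::ifstream a(pathA), b(pathB);
    std::string la, lb;
    for (std::size_t line = 1;; ++line) {
        const bool gotA = static_cast<bool>(std::getline(a, la));
        const bool gotB = static_cast<bool>(std::getline(b, lb));
        if (!gotA && !gotB)
            return 0;    // both files ended together: identical
        if (gotA != gotB || la != lb)
            return line; // one file is shorter, or the contents differ
    }
}
```

The returned line number tells you where to start reading in the colordiff output; the lines just before it usually show which object or event first behaved differently.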

Best,

Gabriel

gabriel.busnot@arteris.com
Tue, Sep 20, 2022 11:40 AM

I'm not sure I understand what you mean by "correct result".

Can you please provide us with:

  1. gem5 version (commit SHA)

  2. build command

  3. run command

Can you please also double-check that you are using supported versions of Python and GCC/Clang? Also check that config.ini is the same in all cases.

Gabriel

Eliot Moss
Tue, Sep 20, 2022 12:23 PM

On 9/20/2022 7:11 AM, 2497597 wrote:

> Thanks for answering.
> But I don't think any of the three possible causes explains it.
> I have now found that running the same script on the same computer, with the same gem5 version
> built in different folders, gives different results.
> That is, I used scons to build gem5 twice from the same version, and only one of the two builds gives the correct result.
> The new gem5 I built on my old computer reports the same incorrect result as my
> new computer does.
> I also found that the new build is missing some folders:
> "__pycache__" and "cpu_tests"
> I think that is the reason, but I don't know how to fix it.

Maybe ... but just having path names of different lengths can change where things
are allocated in memory. Then reason 2 (or even reason 3) can still apply.
This is because various strings are allocated based on those path names,
leading to different addresses for things, and even different stack alignment.
I have seen this in C / C++ programs before, and it can be nasty to ferret out.

Best wishes - Eliot Moss

gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org
