gem5-users@gem5.org

The gem5 Users mailing list

Reproducible/deterministic runs

Hossam ElAtali
Thu, Sep 26, 2024 11:04 PM

Hi,

I wanted to ask if there is a way to make full-system runs deterministic. I tried running with faketime set to a fixed date and time (in case any RNG seeds are derived from the clock), but that doesn't help.

I'm asking because of the following situation. I have a modified gem5, and when I run the memtest.py config with gem5.opt, I get a write/read mismatch at a certain point. When I rerun with debug flags on, the mismatch doesn't occur (or occurs earlier or later). I also tried the following:

Let's say the mismatch is caught at tick 1000 (for demonstrative purposes). If I run...

./build/X86/gem5.opt -re --silent-redirect -d "test" configs/example/memtest.py -f 0 -c 2:1:1:1 -t 2:1:1:1:3

...or...

./build/X86/gem5.opt -re --silent-redirect -d "test" --debug-flags=CacheAll,MemTest --debug-file=dbg --debug-start=5000 configs/example/memtest.py -f 0 -c 2:1:1:1 -t 2:1:1:1:3

...I get a mismatch at tick 1000 (before tracing starts in the second case). If I run...

./build/X86/gem5.opt -re --silent-redirect -d "test" --debug-flags=CacheAll,MemTest --debug-file=dbg --debug-start=600 configs/example/memtest.py -f 0 -c 2:1:1:1 -t 2:1:1:1:3

...I don't get a mismatch at tick 1000 and the simulation continues past that point (and might find a different mismatch afterwards).

So is there any way to make a run completely deterministic so these differences don't occur? That way I can run gem5 quickly without tracing until I hit a mismatch, then rerun with tracing enabled right before that point. Thanks a lot for your help.

Best,
Hossam

LUIS BERTRAN ALVAREZ
Fri, Sep 27, 2024 6:39 AM

Hi Hossam,

From what I understand, you modified memtest.py to run in full-system mode.
Which CPU models are you using to boot the OS? KVM CPU models are not
deterministic. If you still want to bypass your cache system to avoid
long boot times, you can use the non-caching simple CPU model
(NonCachingSimpleCPU), which is deterministic.

Best,
Luis

Hossam ElAtali
Fri, Sep 27, 2024 3:25 PM

Hi Luis,

I think I misspoke when I said full-system, since there are no CPUs in the memtest.py configuration, only the MemTest objects that generate reads and writes. Are you saying the only source of randomness is whatever generates the reads/writes? So the caches, memory controller, DRAM model, etc. do not introduce any randomness?

For runs with CPU models, I understand why the KVM models are non-deterministic, but is there a way to make other models (e.g., O3) deterministic as well (or are they deterministic by default)?

Best,
Hossam


Hossam ElAtali
Fri, Sep 27, 2024 4:52 PM

I wanted to check whether the MemTest objects were the source of randomness, but I see that memtest.cc uses the gem5::Random class, which uses a default-seeded std::mt19937_64 generator. So it should be deterministic, assuming no race conditions between them.

If the randomness is not coming from the MemTest objects themselves, is it possible that some randomness is introduced in the event loop when multiple events are scheduled at the same tick? For example, if multiple MemTest objects are scheduled to send a request at the same tick, could the order in which their tick() functions get called differ?

Best,
Hossam


Srikant Bharadwaj
Fri, Sep 27, 2024 4:53 PM

Hi,
I have faced this issue in non-FS mode in the past and it always eventually
boiled down to two things:

  1. If you are running your jobs on a cluster, it is possible that the
    ld.so.cache is different on each host. This means that your simulations
    will be slightly different because of the libraries loaded by gem5. If you
    are not running your simulations on job clusters, ignore this point.
  2. There is a memory error (e.g., a leak or an invalid read): it gets
    triggered one way when you don't have debug flags on and another way when
    you do. The solution is to run your simulation under a tool like valgrind
    to detect it. It's often easier to find the code line that triggers it by
    running valgrind on a debug gem5 build.

Thanks,
Srikant

Srikant Bharadwaj
Fri, Sep 27, 2024 4:54 PM

Hi,
I have faced this issue in non-FS mode in the past and it always eventually
boiled down to two things:

  1. If you are running your jobs on a cluster, it is possible that the
    ld.so.cache is different on each host. This means that your simulations
    will be slightly different because of the libraries loaded by gem5. If you
    are not running your simulations on job clusters, ignore this point.
  2. There is a memory error (e.g., a leak or an invalid read): it gets
    triggered one way when you don't have debug flags on and another way when
    you do. The solution is to run your simulation under a tool like valgrind
    to detect it. It's often easier to find the code line that triggers it by
    running valgrind on a debug gem5 build.

Thanks,
Srikant

On Fri, Sep 27, 2024 at 8:28 AM Hossam ElAtali via gem5-users <
gem5-users@gem5.org> wrote:

Hi Luis,

I think I misspoke when I said full-system, since there are no cpus in the
memtest.py configuration. Only the MemTest objects that generate reads and
writes. Are you saying the only source of randomness is whatever generates
the reads/writes? So the caches, mem controller, DRAM model, etc. do not
introduce any randomness?

For runs with CPU models, I understand how KVM models are
non-deterministic, but is there a way to make other models (e.g. o3)
deterministic as well (or are they deterministic by default)?

Best,
Hossam

From: LUIS BERTRAN ALVAREZ luis.bertran-alvarez@lirmm.fr
Sent: Friday, September 27, 2024 2:39:41 AM
To: The gem5 Users mailing list gem5-users@gem5.org
Cc: Hossam ElAtali hossam.elatali@uwaterloo.ca
Subject: Re: [gem5-users] Reproducible/deterministic runs

Hi Hossam,

From what I understand you modified the memtest.py to run full-system.
Which CPU models are you using to boot the OS? KVM CPU models are not
deterministic. If you still want to bypass your cache system to avoid
longer booting times you can use the non-caching simple CPU model
(deterministic).

Best,
Luis

Le 27/09/2024 à 01:04, Hossam ElAtali via gem5-users a écrit :



gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



Hi,

I have faced this issue in non-FS mode in the past, and it always eventually
boiled down to one of two things:

1. If you are running your jobs on a cluster, the ld.so.cache may differ
   between hosts. gem5 can then load different libraries on each host, so
   your simulations will differ slightly. If you are not running your
   simulations on a job cluster, you can ignore this point.
2. There is a memory error (for example a leak, a read of uninitialized
   memory, or an out-of-bounds access) that is triggered one way when debug
   flags are off and another way when they are on. The solution is to run
   your simulation under a tool like valgrind to detect it. It is often
   easier to find the offending line of code by running valgrind on a debug
   gem5 build.

Thanks,
Srikant

On Fri, Sep 27, 2024 at 8:28 AM Hossam ElAtali via gem5-users <gem5-users@gem5.org> wrote:
> Hi Luis,
>
> I think I misspoke when I said full-system, since there are no cpus in the
> memtest.py configuration. Only the MemTest objects that generate reads and
> writes. Are you saying the only source of randomness is whatever generates
> the reads/writes? So the caches, mem controller, DRAM model, etc. do not
> introduce any randomness?
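Srikant's two checks can be sketched as a pair of POSIX shell helpers. This is a minimal sketch under stated assumptions: a Linux host with `ldd`, `sha256sum`, and Valgrind installed; the helper names `lib_fingerprint` and `run_memcheck` are hypothetical; and the suppressions file `util/valgrind-suppressions` is passed only if it exists in your gem5 tree. The fingerprint hashes the library files a binary actually resolves to, rather than `/etc/ld.so.cache` itself, because that is what determines what gem5 loads.

```shell
#!/bin/sh
# Sketch of the two checks suggested above. Helper names are hypothetical.

# 1. Fingerprint the shared libraries a binary resolves to on this host.
#    Run it on every cluster node; different output means gem5 would load
#    different libraries there. Only absolute paths from ldd's output are
#    used, so the per-run load addresses "(0x...)" are ignored.
lib_fingerprint() {
    ldd "$1" 2>/dev/null |
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }' |
        sort -u |
        xargs -r sha256sum |
        sha256sum |
        cut -d' ' -f1
}

# 2. Run a (preferably debug) gem5 build under Valgrind's memcheck tool to
#    catch uninitialized reads and invalid accesses. Findings go to
#    valgrind.log in the current directory.
run_memcheck() {
    supp=""
    if [ -f util/valgrind-suppressions ]; then
        supp="--suppressions=util/valgrind-suppressions"
    fi
    # $supp is intentionally unquoted so an empty value adds no argument.
    valgrind --tool=memcheck --track-origins=yes --log-file=valgrind.log $supp "$@"
}
```

For example, run `lib_fingerprint ./build/X86/gem5.opt` on each node and compare the hashes. From the gem5 root, run `run_memcheck ./build/X86/gem5.debug -d vg-test configs/example/memtest.py -f 0 -c 2:1:1:1 -t 2:1:1:1:3` (after `scons build/X86/gem5.debug`), expecting it to run far slower than gem5.opt.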