gem5-users@gem5.org

The gem5 Users mailing list

Segmentation fault when restoring checkpoint with DerivO3CPU

Đức Anh
Mon, Oct 4, 2021 8:46 PM

Dear all,

I am trying to use the checkpoint feature to skip the long and tedious Linux
boot in an ARM FS simulation. However, gem5 crashes with a segmentation
fault when I try to restore the checkpoint. It works fine with
AtomicSimpleCPU, though.

Here is how I take the checkpoint:

  • run ./m5 checkpoint from the connected terminal
  • in the Python script, call m5.checkpoint("m5out/cpt.%d") and then
    m5.simulate() again.

Then I restore the checkpoint by:
m5.instantiate(<checkpoint_dir>)
m5.simulate()
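
For readers following along, the take-and-restore flow described above can be sketched roughly as below. This is only an illustration under stated assumptions: the function names are hypothetical, and the m5 module is passed in as a parameter purely so the control flow can be exercised outside gem5 (inside a real config script you would simply `import m5`). It relies on m5.simulate(), m5.checkpoint(), m5.curTick(), m5.instantiate() and the exit event's getCause(), with "checkpoint" being the exit cause reported after ./m5 checkpoint.

```python
def simulate_with_checkpoints(m5, cpt_pattern="m5out/cpt.%d"):
    """Simulate until an exit event other than "checkpoint" occurs.

    Assumes m5.instantiate() has already been called. Each time the guest
    runs `./m5 checkpoint`, the simulation loop exits with the cause
    "checkpoint"; we then write a checkpoint and resume.
    Returns the list of checkpoint directories that were written.
    """
    saved = []
    event = m5.simulate()
    while event.getCause() == "checkpoint":
        # Name the directory after the current tick, e.g. m5out/cpt.1234
        cpt_dir = cpt_pattern % m5.curTick()
        m5.checkpoint(cpt_dir)
        saved.append(cpt_dir)
        event = m5.simulate()
    return saved


def restore_and_run(m5, checkpoint_dir):
    """Instantiate the system from a checkpoint directory, then simulate."""
    m5.instantiate(checkpoint_dir)
    return m5.simulate()
```

The restore side only works if the script builds the same system configuration that was used when the checkpoint was taken.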

There is no CPU switching. I use DerivO3CPU.

I also tried the fs.py script with the --checkpoint-restore option, but the
problem persists. Here is the error log:

--- BEGIN LIBC BACKTRACE ---
build/ARM/gem5.opt(_ZN4gem515print_backtraceEv+0x2c)[0x55a2b680abec]
build/ARM/gem5.opt(+0x1c346ef)[0x55a2b68266ef]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7fa8ee493980]
--- END LIBC BACKTRACE ---
Segmentation fault (core dumped)

I read in one of the older emails that fs.py is outdated. Perhaps
checkpoints with DerivO3CPU need to be handled another way?

Best regards,
Duc Anh

Eliot Moss
Tue, Oct 5, 2021 1:43 AM

On 10/4/2021 4:46 PM, Đức Anh via gem5-users wrote:

> Dear all,
>
> I am trying to use the checkpoint feature to skip the long and tedious Linux boot in an ARM
> FS simulation. However, gem5 crashes with a segmentation fault when I try to restore the
> checkpoint. It works fine with AtomicSimpleCPU, though.
>
> Here is how I take the checkpoint:
>
>   • run ./m5 checkpoint from the connected terminal
>   • in the Python script, call m5.checkpoint("m5out/cpt.%d") and then m5.simulate() again.
>
> Then I restore the checkpoint by:
> m5.instantiate(<checkpoint_dir>)
> m5.simulate()
>
> There is no CPU switching. I use DerivO3CPU.
>
> I also tried the fs.py script with the --checkpoint-restore option, but the problem persists.
> Here is the error log:
>
> --- BEGIN LIBC BACKTRACE ---
> build/ARM/gem5.opt(_ZN4gem515print_backtraceEv+0x2c)[0x55a2b680abec]
> build/ARM/gem5.opt(+0x1c346ef)[0x55a2b68266ef]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7fa8ee493980]
> --- END LIBC BACKTRACE ---
> Segmentation fault (core dumped)
>
> I read in one of the older emails that fs.py is outdated. Perhaps checkpoints with
> DerivO3CPU need to be handled another way?

See recent posts - folks are on the trail of a bug in either
the C++ code, C/C++ libraries, or C++ compiler / tool chain.
If it's the same bug then it has to do with the simulator's
memory management somehow (or a pointer gets zapped) - the
result is a bad free.  This may NOT be the same issue.  The
backtrace I get from gem5.opt indicates a bad free with
tcmalloc reporting it.

fs.py has worked for me, maybe with some customizations.
I think that the wording about it is trying to suggest that
it's perhaps a bit rigid or limiting for things a lot of people
may want to do ...

Regards - Eliot Moss

Đức Anh
Tue, Oct 5, 2021 12:01 PM

Hi Eliot,

How did you restore the checkpoint with fs.py? Which command-line options did
you use? I read through fs.py and believed that
m5.instantiate(<checkpoint_dir>) would be enough. I am using:

  • gem5 version [DEVELOP-FOR-V21.2]
  • commit 6811158b28bd293487fb5e4bbbfb4bc2d5c259cb
  • GCC version 7.5.0 and 11.1.0 (I tried both)
  • Ubuntu 18.04

Best regards,
Duc Anh

Eliot Moss
Tue, Oct 5, 2021 2:33 PM

On 10/5/2021 8:01 AM, Đức Anh wrote:

> Hi Eliot,
>
> How did you restore the checkpoint with fs.py? Which command-line options did you use? I read
> through fs.py and believed that m5.instantiate(<checkpoint_dir>) would be enough. I am using:
>
>   • gem5 version [DEVELOP-FOR-V21.2]
>   • commit 6811158b28bd293487fb5e4bbbfb4bc2d5c259cb
>   • GCC version 7.5.0 and 11.1.0 (I tried both)
>   • Ubuntu 18.04

Here's the command I run.  My OS kernel is modified, but not in ways that
would matter.  My gem5 version is 21.0, not 21.2.  My gcc is 10.3.0.  I believe
my fs.py is out-of-the-box.  You probably know this, but some of the arguments
on the command line are to gem5 and others are to fs.py.  It's important to
get them in the right place (before / after fs.py on the command line).

I believe it looks for the checkpoint directory inside the --path directory.
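
The lookup can be pictured with the sketch below. It assumes (this is an assumption, not something confirmed in this thread) that --checkpoint-restore=N picks the N-th cpt.<tick> directory, ordered by tick, inside the checkpoint directory. The function name find_checkpoint_dir is hypothetical and is not part of gem5.

```python
import os
import re


def find_checkpoint_dir(cpt_root, index):
    """Return the index-th (1-based) cpt.<tick> directory under cpt_root.

    Directories are ordered by their numeric tick, so index=1 is the
    earliest checkpoint. Raises ValueError if index is out of range.
    """
    pattern = re.compile(r"cpt\.(\d+)$")
    found = []
    for name in os.listdir(cpt_root):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(cpt_root, name)):
            # Keep (tick, name) so sorting is numeric, not lexicographic
            found.append((int(match.group(1)), name))
    found.sort()
    if not 1 <= index <= len(found):
        raise ValueError(
            "checkpoint %d requested, but only %d found in %s"
            % (index, len(found), cpt_root)
        )
    return os.path.join(cpt_root, found[index - 1][1])
```

Running this against your own output directory is a quick way to check that the checkpoint you expect is actually where the restore will look for it.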

./build/X86/gem5.fast --stats-file=stats_gem5.txt.gz \
    --outdir=/home/moss/Autoclean/work/scenarios/a01/default/m5out \
    --path=/home/moss/Autoclean/work/scenarios/a01/default/m5out --quiet --listener-mode=on \
    ./configs/example/fs.py --cpu-type=DerivO3CPU --num-cpu=1 --cpu-clock=3GHz \
    --script=/home/moss/Autoclean/work/scenarios/a01/default/m5out/do.rcs --caches --l2cache \
    --l1i_size=32kB --l1d_size=64kB --l2_size=4MB --mem-type=LPDDR5_6400_1x16_BG_BL32 --mem-size=512MB \
    --cacheline_size=64 --kernel=/home/moss/Autoclean/work/linux_kernel/vmlinux-5.2.3 \
    --disk-image=/home/moss/Autoclean/work/gem5-images/ubuntu-18.04/ubuntu-18.04-base.img \
    --disk-image=/home/moss/Autoclean/AUTOC/../work/gem5-images/extra.img.2430.tmp \
    --checkpoint-restore=1 --restore-with-cpu=DerivO3CP
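
To make the before/after-fs.py split explicit, here is a small sketch that assembles such a command from two separate option lists. The helper name build_gem5_command and the shortened m5out path are illustrative assumptions; only the ordering rule comes from the command above.

```python
import shlex


def build_gem5_command(gem5_binary, gem5_options, config_script, script_options):
    """Assemble an argv list with simulator options before the config
    script and script options after it."""
    return [gem5_binary, *gem5_options, config_script, *script_options]


cmd = build_gem5_command(
    "./build/X86/gem5.fast",
    # Options consumed by the gem5 binary itself
    ["--outdir=m5out", "--quiet"],
    "./configs/example/fs.py",
    # Options consumed by fs.py
    ["--cpu-type=DerivO3CPU", "--checkpoint-restore=1"],
)

# Print a shell-quoted form that can be pasted into a terminal
print(shlex.join(cmd))
```

Keeping the two lists apart makes it harder to put an fs.py option in front of the script by accident.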

Best wishes - EM
