gem5-users@gem5.org

The gem5 Users mailing list


Simulation problem on restoring simpoint checkpoint

lsh125184@163.com
Tue, Apr 30, 2024 7:50 AM

Hi

I am currently using SimPoint to create checkpoints for SPEC2006 applications. Following some guidance online, my steps are as follows:

  1. Run the benchmark on gem5 in SE mode with the options “--simpoint-profile --simpoint-interval 100000” to generate the simpoint.bb.gz file.

  2. Run SimPoint with the following options to generate test.simpoint and test.weight: “-loadFVFile simpoint.bb.gz -maxK 30 -saveSimpoints test.simpoint -saveSimpointWeights test.weight -inputVectorsGzipped”

  3. Rerun gem5 with the option “--take-simpoint-checkpoint=./445.gobmk-simpoint/test.simpoint,./445.gobmk-simpoint/test.weight,1000000,0” to generate checkpoints such as “cpt.simpoint_00_inst_1000000_weight_0.193939_interval_1000000_warmup_0”.

  4. Restore from a checkpoint generated in step 3 using the options “--restore-simpoint-checkpoint -r 2 --checkpoint-dir ./445.gobmk-simpoint \”

Steps 1-3 run successfully and the checkpoints for the benchmark are generated.
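
The four steps above can be sketched as a small Python helper that only assembles the command lines, without running anything. This is a minimal sketch, not the poster's actual script: the gem5 binary and se.py paths are the ones quoted later in this thread, the executable name `simpoint` is an assumption, and `--cmd=./gobmk_base` is a hypothetical placeholder for the real benchmark arguments. The interval values are the ones quoted in steps 1 and 3.

```python
import shlex

# Paths quoted in the thread; adjust for your own tree.
GEM5 = "build/RISCV/gem5.fast"
SE_PY = "configs/deprecated/example/se.py"
CKPT_DIR = "./445.gobmk-simpoint"
SIMPOINT = "simpoint"  # assumed name of the SimPoint 3.2 executable


def build_workflow(bench_args, profile_interval=100000,
                   ckpt_interval=1000000, warmup=0, restore_index=2):
    """Return the four command lines (as argv lists) for steps 1-4."""
    # Step 1: profile basic-block vectors into simpoint.bb.gz.
    profile = [GEM5, SE_PY, "--simpoint-profile",
               "--simpoint-interval", str(profile_interval), *bench_args]
    # Step 2: cluster the vectors with SimPoint.
    cluster = [SIMPOINT, "-loadFVFile", "simpoint.bb.gz", "-maxK", "30",
               "-saveSimpoints", "test.simpoint",
               "-saveSimpointWeights", "test.weight",
               "-inputVectorsGzipped"]
    # Step 3: take one checkpoint per selected SimPoint.
    spec = ",".join([f"{CKPT_DIR}/test.simpoint", f"{CKPT_DIR}/test.weight",
                     str(ckpt_interval), str(warmup)])
    take = [GEM5, SE_PY, f"--take-simpoint-checkpoint={spec}", *bench_args]
    # Step 4: restore from the checkpoint selected by -r.
    restore = [GEM5, SE_PY, "--restore-simpoint-checkpoint",
               "-r", str(restore_index), "--checkpoint-dir", CKPT_DIR,
               *bench_args]
    return [profile, cluster, take, restore]


if __name__ == "__main__":
    # "--cmd=./gobmk_base" is a hypothetical placeholder for the benchmark.
    for step, argv in enumerate(build_workflow(["--cmd=./gobmk_base"]), 1):
        print(f"step {step}: {shlex.join(argv)}")
```

Printing the commands before running them makes it easy to confirm that steps 3 and 4 point at the same checkpoint directory.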

However, when I run step 4 with any of the checkpoints, the simulation terminates after executing ONLY 1 INSTRUCTION. Here is the output:

Resuming from ./445.gobmk-simpoint/cpt.simpoint_04_inst_90000000_weight_0.018182_interval_1000000_warmup_0
Resuming from SimPoint #4, start_inst:90000000, weight:0.018182, interval:1000000, warmup:0
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
src/arch/riscv/isa.cc:275: info: RVV enabled, VLEN = 256 bits, ELEN = 64 bits
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7003
src/sim/process.cc:396: warn: Checkpoints for pipes, device drivers and sockets do not work.
Switch at curTick count:10000
src/sim/simulate.cc:199: info: Entering event queue @ 58886401000.  Starting simulation...
Switched CPUS @ tick 58886401500
switching cpus
src/sim/simulate.cc:199: info: Entering event queue @ 58886401500.  Starting simulation...
src/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
src/sim/simulate.cc:199: info: Entering event queue @ 58886402000.  Starting simulation...
Warmed up! Dumping and resetting stats!
src/sim/simulate.cc:199: info: Entering event queue @ 58886402500.  Starting simulation...

Exiting @ tick 58886411000 because simulate() limit reached

As shown above, the simulation ENCOUNTERS AN EXIT EVENT immediately after entering the event queue, and stats.txt shows that only 1 INSTRUCTION was executed.

My question is: is there something wrong with my procedure, or did I do something else wrong?

I am running gem5 v23.1 and SimPoint 3.2 on the RISC-V architecture.
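
The checkpoint directory name in the "Resuming from" line of the log carries the same fields that the next log line prints (SimPoint index, start instruction, weight, interval, warmup). As a minimal sketch for checking which checkpoint `-r` actually picked, the name can be parsed as follows. The regular expression is written for this example from the directory names quoted in the thread; it is not copied from gem5's own scripts.

```python
import re
from typing import NamedTuple


class SimpointCheckpoint(NamedTuple):
    index: int
    start_inst: int
    weight: float
    interval: int
    warmup: int


# Pattern derived from the names quoted in this thread (an assumption,
# not gem5's own parsing code).
_NAME = re.compile(
    r"cpt\.simpoint_(\d+)_inst_(\d+)_weight_([\d.eE+-]+)"
    r"_interval_(\d+)_warmup_(\d+)$"
)


def parse_checkpoint_name(path: str) -> SimpointCheckpoint:
    """Extract the SimPoint fields from a checkpoint directory path."""
    match = _NAME.search(path)
    if match is None:
        raise ValueError(f"not a SimPoint checkpoint name: {path!r}")
    index, inst, weight, interval, warmup = match.groups()
    return SimpointCheckpoint(int(index), int(inst), float(weight),
                              int(interval), int(warmup))


if __name__ == "__main__":
    print(parse_checkpoint_name(
        "./445.gobmk-simpoint/cpt.simpoint_04_inst_90000000"
        "_weight_0.018182_interval_1000000_warmup_0"))
```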

lsh125184@163.com
Tue, Apr 30, 2024 8:06 AM

I did some pdb debugging in configs/common/Simulation.py and found:

  1. restoreSimpointCheckpoint() is executed.

  2. I can call m5.simulate() in pdb to make the simulation continue past this exit:


src/sim/simulate.cc:199: info: Entering event queue @ 58886402500.  Starting simulation...

Exiting @ tick 58886411000 because simulate() limit reached

Run m5.simulate() in PDB

(Pdb)  m5.simulate()

src/sim/simulate.cc:199: info: Entering event queue @ 58886449000.  Starting simulation...

<_m5.event.GlobalSimLoopExitEvent object at 0x7fba289f8af0>

(Pdb)  m5.simulate()

src/sim/simulate.cc:199: info: Entering event queue @ 64395564000.  Starting simulation...


I don’t know whether this is the correct behavior when restoring a checkpoint, because I would expect the simulation to continue from 58886449000 rather than 64395564000. Also, no stats are generated even though I call m5.stats.dump() or m5.stats.reset() in pdb.
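
Each "Entering event queue @ N" line reports the tick at which that call to m5.simulate() starts, so one reading of the session above is that the first pdb call ran from 58886449000 until 64395564000 and the second call simply resumed from where the first one stopped. A minimal sketch for pulling the entry ticks out of a log and measuring how far each call advanced (the helper names are made up for this example; the log lines are copied from the post):

```python
import re

ENTER = re.compile(r"Entering event queue @ (\d+)")


def entry_ticks(log_text):
    """Return the tick of every 'Entering event queue' line, in order."""
    return [int(tick) for tick in ENTER.findall(log_text)]


def gaps(ticks):
    """Return the distance in ticks between consecutive entries."""
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


LOG = """\
src/sim/simulate.cc:199: info: Entering event queue @ 58886402500.  Starting simulation...
Exiting @ tick 58886411000 because simulate() limit reached
src/sim/simulate.cc:199: info: Entering event queue @ 58886449000.  Starting simulation...
src/sim/simulate.cc:199: info: Entering event queue @ 64395564000.  Starting simulation...
"""

if __name__ == "__main__":
    ticks = entry_ticks(LOG)
    for tick, gap in zip(ticks[1:], gaps(ticks)):
        print(f"entered at {tick}, {gap} ticks after the previous entry")
```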

Any help would be appreciated!

Thanks.

Ananda Biswas
Tue, Apr 30, 2024 2:27 PM

Can you share your full command used in Step-4?

On Tue, Apr 30, 2024 at 3:07 AM shuhao ling via gem5-users <gem5-users@gem5.org> wrote:

> I do some PDB debug on configs/common/Simulation.py and found:
>
>   1. restoreSimpointCheckpoint() is executed
>
>   2. I can run m5.simulate() in pdb to enable simulation go beyond this Exit :
>
> src/sim/simulate.cc:199: info: Entering event queue @ 58886402500.  Starting simulation...
>
> Exiting @ tick 58886411000 because simulate() limit reached
>
> Run m5.simulate() in PDB
>
> (Pdb) m5.simulate()
>
> src/sim/simulate.cc:199: info: Entering event queue @ 58886449000.  Starting simulation...
>
> <_m5.event.GlobalSimLoopExitEvent object at 0x7fba289f8af0>
>
> (Pdb) m5.simulate()
>
> src/sim/simulate.cc:199: info: Entering event queue @ 64395564000.  Starting simulation...
>
> I don’t know if this is correct for restoring checkpoint because the
> simulation should continute to exceed on 58886449000 instead of
> 64395564000. And no stats generated although I use m5.stats.dump() or
> m5.stats.reset() on PDB.
>
> Any help would be appreciated!
>
> Thanks.
>
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-leave@gem5.org

lsh125184@163.com
Mon, May 6, 2024 1:35 AM

Hi

Very sorry for the late reply. I am using the command line below. I tried restoring with MinorCPU with `--caches` and with AtomicSimpleCPU without `--caches`, but both resulted in the same situation.

build/RISCV/gem5.fast \
    --outdir=/home/Documents/gem5/spec/445.gobmk-simpoint \
    configs/deprecated/example/se.py --num-cpus=1 --cpu-type=MinorCPU --caches --mem-size=512MB \
    --restore-simpoint-checkpoint -r 2 --checkpoint-dir ./spec/445.gobmk-simpoint \
    --cmd=/home/Documents/gem5/spec/run/445.gobmk/riscv-445.gobmk-input1-240423-162455/run/gobmk_base.gcc43-64bit '--options=--quiet --mode gtp' \
    --output=/home/Documents/gem5/spec/run/445.gobmk/riscv-445.gobmk-input1-240423-162455/run/13x13.out \
    --input=/home/Documents/gem5/spec/run/445.gobmk/riscv-445.gobmk-input1-240423-162455/run/13x13.tst \
    --errout=/home/Documents/gem5/spec/run/445.gobmk/riscv-445.gobmk-input1-240423-162455/run/13x13.err

I also did some debugging of the event queue and found that the CPU was actually executing its TickEvents one by one before the ExitEvent was triggered. The ExitEvent was triggered when its scheduled tick was reached.

Thanks for your help

Regards

Ling

lsh125184@163.com
Mon, May 6, 2024 3:04 AM

Hi

I think I found where this ExitEvent is inserted, in configs/common/Simulation.py:751:

        else:
            print(f"Switch at curTick count:{str(10000)}")
            exit_event = m5.simulate(10000)  

This line returns an ExitEvent 10000 ticks after the checkpoint restore. In my case the ExitEvent is triggered at tick 13073282500, 10000 ticks after my checkpoint's curTick=13073272500. I changed it to m5.simulate(), and the ExitEvent is now scheduled at a much later tick (MaxTick, I think). With that change, the checkpoint restores successfully.

Why is m5.simulate(10000) there by default? Am I restoring correctly?
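
The tick arithmetic can be checked against the numbers quoted in this thread. The sketch below is plain Python, not gem5 code, and the helper names are made up for the example. It adds the 10000-tick limit to the tick at which the restored simulation starts and converts the limit to simulated time using the global frequency printed in the log. It only confirms that the reported exit ticks are consistent with a 10000-tick limit; it does not say whether that limit is intended.

```python
# "Global frequency set at 1000000000000 ticks per second" (from the log).
TICKS_PER_SECOND = 1_000_000_000_000


def limit_tick(start_tick, num_ticks):
    """Tick at which a limit of num_ticks, counted from start_tick, is hit."""
    return start_tick + num_ticks


def ticks_to_ns(ticks):
    """Convert a tick count to simulated nanoseconds."""
    return ticks * 1e9 / TICKS_PER_SECOND


if __name__ == "__main__":
    # Checkpoint from this message: curTick=13073272500.
    print(limit_tick(13073272500, 10000))   # 13073282500
    # First log in the thread: entered at 58886401000, exited at 58886411000.
    print(limit_tick(58886401000, 10000))   # 58886411000
    # 10000 ticks is only 10 ns of simulated time at this frequency.
    print(ticks_to_ns(10000))               # 10.0
```

Since 10000 ticks is 10 ns of simulated time, a limit of that size leaves room for very few instructions, which is consistent with the stats.txt observation in the first message.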

Any help would be appreciated!

Regards

Ling
