Hi,
I am currently working with the GPU model in gem5. While running the AlexNet
benchmark from the DNNMark suite on v24.0, I first got an error saying that
the memory size was too small, so I increased the memory size to 8GB. After
increasing the memory size I am getting a different error which I am not able
to resolve. I am attaching the error and the command I used.
Error:
src/arch/x86/faults.cc:167: panic: Tried to write unmapped address
0x7fffffffdf80.
PC: (0x7ffff8009475=>0x7ffff8009478).(0=>1), Instr: MOV_M_R : st edx,
SS:[t0 + rsp]
Memory Usage: 20324256 KBytes
Program aborted at tick 190131659500
--- BEGIN LIBC BACKTRACE ---
build/VEGA_X86/gem5.opt(_ZN4gem515print_backtraceEv+0x30)[0x5601f24103f0]
build/VEGA_X86/gem5.opt(_ZN4gem512abortHandlerEi+0x4c)[0x5601f243b8ac]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f94cc539420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f94cb70300b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f94cb6e2859]
build/VEGA_X86/gem5.opt(+0xdebc05)[0x5601f224dc05]
build/VEGA_X86/gem5.opt(_ZN4gem56X86ISA9PageFault6invokeEPNS_13ThreadContextERKNS_14RefCountingPtrINS_10StaticInstEEE+0x1b6)[0x5601f3222856]
build/VEGA_X86/gem5.opt(_ZN4gem513BaseSimpleCPU9advancePCERKSt10shared_ptrINS_9FaultBaseEE+0xde)[0x5601f3a07e1e]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU11advanceInstERKSt10shared_ptrINS_9FaultBaseEE+0xc6)[0x5601f39ff026]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU17finishTranslationEPNS_21WholeTranslationStateE+0x111)[0x5601f3a02c81]
build/VEGA_X86/gem5.opt(_ZN4gem515DataTranslationIPNS_15TimingSimpleCPUEE6finishERKSt10shared_ptrINS_9FaultBaseEERKS4_INS_7RequestEEPNS_13ThreadContextENS_7BaseMMU4ModeE+0xe3)[0x5601f3a060f3]
build/VEGA_X86/gem5.opt(_ZN4gem56X86ISA3TLB15translateTimingERKSt10shared_ptrINS_7RequestEEPNS_13ThreadContextEPNS_7BaseMMU11TranslationENS9_4ModeE+0xd2)[0x5601f3263e82]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU8writeMemEPhjmNS_5FlagsImEEPmRKSt6vectorIbSaIbEE+0x758)[0x5601f3a04888]
build/VEGA_X86/gem5.opt(_ZN4gem517SimpleExecContext8writeMemEPhjmNS_5FlagsImEEPmRKSt6vectorIbSaIbEE+0x57)[0x5601f3a099a7]
build/VEGA_X86/gem5.opt(+0x22559e6)[0x5601f36b79e6]
build/VEGA_X86/gem5.opt(_ZNK4gem510X86ISAInst2St11initiateAccEPNS_11ExecContextEPNS_5trace10InstRecordE+0x168)[0x5601f36f46f8]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU14completeIfetchEPNS_6PacketE+0x16d)[0x5601f39ffefd]
build/VEGA_X86/gem5.opt(_ZN4gem510EventQueue10serviceOneEv+0x175)[0x5601f2427eb5]
build/VEGA_X86/gem5.opt(_ZN4gem59doSimLoopEPNS_10EventQueueE+0x70)[0x5601f245cb30]
build/VEGA_X86/gem5.opt(_ZN4gem58simulateEm+0x28b)[0x5601f245d1bb]
build/VEGA_X86/gem5.opt(+0x2a12620)[0x5601f3e74620]
build/VEGA_X86/gem5.opt(+0xdcc388)[0x5601f222e388]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f94cc7f0748]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f94cc5c5f48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f94cc712e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f94cc7f0124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f94cc5bcd6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f94cc5c4ef6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f94cc712e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f94cc7131d2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f94cc7135bf]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfc01)[0x7f94cc717c01]
--- END LIBC BACKTRACE ---
For more info on how to address this issue, please visit
https://www.gem5.org/documentation/general_docs/common-errors/
Command:
docker run --rm -v ${PWD}:${PWD} \
    -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 \
    -w ${PWD} ghcr.io/gem5/gcn-gpu:v24-0 \
    build/VEGA_X86/gem5.opt configs/example/apu_se.py -n3 \
    --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet \
    -c dnnmark_test_alexnet \
    --options="-config gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin" \
    --mem-size=8GB
I am also getting a similar error for test_fwd_conv and VGG.
Regards,
Ravikant
Hi Ravikant,
From the details you shared, it appears you are using gem5's GPU SE (syscall
emulation) mode support. In this version, I don't believe we ever officially
got AlexNet or VGG working. For fwd_conv, there was a prior message on this
mailing list about some of its issues, but I'm having a hard time finding it
on my phone. Maybe check the gem5 mailing list archive?
Regarding AlexNet, I spent a bunch of time on it, but every time I fixed a
bug there was another one a layer or two later. We have made a number of
fixes since I last tried, but it seems the state is the same, sadly. The
failure you are running into is a common one, roughly gem5's equivalent of a
segfault: the error message is telling you that the program is writing to a
memory address it shouldn't be. Unfortunately, since many different bugs can
lead to this error, there isn't a single place I can point you to. But if you
are willing to help us debug, here are some ideas:
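For example, one place to start (a rough sketch, assuming the standard gem5
debug-flag machinery; adjust the flags and start tick to what you want to
see) is to rerun the same command with instruction and fault tracing enabled
shortly before the failing tick:

# same command as before, with gem5 debug options added before the config script
docker run --rm -v ${PWD}:${PWD} \
    -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 \
    -w ${PWD} ghcr.io/gem5/gcn-gpu:v24-0 \
    build/VEGA_X86/gem5.opt \
    --debug-flags=Exec,Faults --debug-start=190131000000 --debug-file=unmapped_write.trace \
    configs/example/apu_se.py -n3 \
    --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet \
    -c dnnmark_test_alexnet \
    --options="-config gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin" \
    --mem-size=8GB

The tail of that trace shows which instructions ran right before the panic,
which usually helps tell whether the bad write comes from a corrupted
pointer, a stack growing past its mapping, or an unimplemented syscall.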
Ultimately, if you are willing to help debug this issue, we can try to
provide some guidance on how to fix it (and then hopefully you will consider
contributing the bug fix back!). Let us know what happens with the above and
we can go from there.
Hope this helps,
Matt
Hi Matt,
Thanks for the reply. I am currently looking into running CPU and GPU
workloads together on gem5, for example running the square application on
the GPU and a hello-world application on the CPU. I tried this by adding the
following to apu_se.py:

subprocess.call(["--cmd=./tests/test-progs/hello/bin/x86/linux/hello",
                 "--cpu-type=DerivO3CPU", "--l1d_size=32kB", "--l1i_size=32kB",
                 "--l2_size=256kB", "--caches", "--l2cache", "--l3cache",
                 "--l3_size=8MB", "--num-cpus=8", "--mem-size=8192MB"])

but it is not reading the file given in the command. Is there a way to run
CPU and GPU workloads in parallel, e.g., first offloading the GPU workload
from the CPU and then running CPU workloads on the other cores? If that is
possible, please guide me. I was also trying to see whether a shared LLC
between the CPU and GPU can be added in gem5; if there is a way to add that,
it would also be good.
Regards,
Ravikant
Hi Ravikant,
If I understand your request, you are trying to run multiple processes (one
on the CPU, one on the CPU+GPU) simultaneously. I have never tried doing
this, so I do not know whether it can be made to work. My guess, though, is
that gem5 does not support running multiple concurrent processes. Sorry I
don't have better news; maybe someone else who knows more about this can
help.
In terms of a shared LLC: that is effectively already in place via the
directory between the CPU and GPU in apu_se.py. By default its size is 0B
(i.e., it acts purely as a directory, not as a cache), but you can change the
size.
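As a rough sketch of the kind of change described above (assuming the
directory/L3 size is exposed through the common --l3_size style option that
also appears earlier in this thread; the exact option name can differ between
gem5 versions, so check the help output of configs/example/apu_se.py), you
could rerun the earlier command with a non-zero size appended:

# hedged example: give the shared CPU/GPU directory a real cache capacity
docker run --rm -v ${PWD}:${PWD} \
    -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 \
    -w ${PWD} ghcr.io/gem5/gcn-gpu:v24-0 \
    build/VEGA_X86/gem5.opt configs/example/apu_se.py -n3 \
    --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet \
    -c dnnmark_test_alexnet \
    --options="-config gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin" \
    --mem-size=8GB --l3_size=4MB

If that option is wired through to the Ruby directory controller in your
version, the extra capacity then behaves like a last-level cache shared by
the CPU and the GPU.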
Thanks,
Matt