gem5-users@gem5.org

The gem5 Users mailing list


Gem5 gpu

Ravikant Bhardwaj
Thu, Aug 8, 2024 2:07 PM

Hi,

I am currently working with the GPU model in gem5. While running the AlexNet
benchmark from the DNNMark suite on v24-0, I first got an error saying that my
memory size was too small, so I increased the memory size to 8GB. After
increasing the memory size I am getting a different error which I am not able
to resolve. I am attaching the error and the command I used.

Error--

src/arch/x86/faults.cc:167: panic: Tried to write unmapped address
0x7fffffffdf80.
PC: (0x7ffff8009475=>0x7ffff8009478).(0=>1), Instr:  MOV_M_R : st  edx,
SS:[t0 + rsp]
Memory Usage: 20324256 KBytes
Program aborted at tick 190131659500
--- BEGIN LIBC BACKTRACE ---
build/VEGA_X86/gem5.opt(_ZN4gem515print_backtraceEv+0x30)[0x5601f24103f0]
build/VEGA_X86/gem5.opt(_ZN4gem512abortHandlerEi+0x4c)[0x5601f243b8ac]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f94cc539420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f94cb70300b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f94cb6e2859]
build/VEGA_X86/gem5.opt(+0xdebc05)[0x5601f224dc05]
build/VEGA_X86/gem5.opt(_ZN4gem56X86ISA9PageFault6invokeEPNS_13ThreadContextERKNS_14RefCountingPtrINS_10StaticInstEEE+0x1b6)[0x5601f3222856]
build/VEGA_X86/gem5.opt(_ZN4gem513BaseSimpleCPU9advancePCERKSt10shared_ptrINS_9FaultBaseEE+0xde)[0x5601f3a07e1e]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU11advanceInstERKSt10shared_ptrINS_9FaultBaseEE+0xc6)[0x5601f39ff026]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU17finishTranslationEPNS_21WholeTranslationStateE+0x111)[0x5601f3a02c81]
build/VEGA_X86/gem5.opt(_ZN4gem515DataTranslationIPNS_15TimingSimpleCPUEE6finishERKSt10shared_ptrINS_9FaultBaseEERKS4_INS_7RequestEEPNS_13ThreadContextENS_7BaseMMU4ModeE+0xe3)[0x5601f3a060f3]
build/VEGA_X86/gem5.opt(_ZN4gem56X86ISA3TLB15translateTimingERKSt10shared_ptrINS_7RequestEEPNS_13ThreadContextEPNS_7BaseMMU11TranslationENS9_4ModeE+0xd2)[0x5601f3263e82]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU8writeMemEPhjmNS_5FlagsImEEPmRKSt6vectorIbSaIbEE+0x758)[0x5601f3a04888]
build/VEGA_X86/gem5.opt(_ZN4gem517SimpleExecContext8writeMemEPhjmNS_5FlagsImEEPmRKSt6vectorIbSaIbEE+0x57)[0x5601f3a099a7]
build/VEGA_X86/gem5.opt(+0x22559e6)[0x5601f36b79e6]
build/VEGA_X86/gem5.opt(_ZNK4gem510X86ISAInst2St11initiateAccEPNS_11ExecContextEPNS_5trace10InstRecordE+0x168)[0x5601f36f46f8]
build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU14completeIfetchEPNS_6PacketE+0x16d)[0x5601f39ffefd]
build/VEGA_X86/gem5.opt(_ZN4gem510EventQueue10serviceOneEv+0x175)[0x5601f2427eb5]
build/VEGA_X86/gem5.opt(_ZN4gem59doSimLoopEPNS_10EventQueueE+0x70)[0x5601f245cb30]
build/VEGA_X86/gem5.opt(_ZN4gem58simulateEm+0x28b)[0x5601f245d1bb]
build/VEGA_X86/gem5.opt(+0x2a12620)[0x5601f3e74620]
build/VEGA_X86/gem5.opt(+0xdcc388)[0x5601f222e388]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f94cc7f0748]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f94cc5c5f48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f94cc712e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f94cc7f0124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f94cc5bcd6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f94cc5c4ef6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f94cc712e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f94cc7131d2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f94cc7135bf]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfc01)[0x7f94cc717c01]
--- END LIBC BACKTRACE ---
For more info on how to address this issue, please visit
https://www.gem5.org/documentation/general_docs/common-errors/

Command-

docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} ghcr.io/gem5/gcn-gpu:v24-0 build/VEGA_X86/gem5.opt
configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet
-c dnnmark_test_alexnet --options="-config
gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin" --mem-size=8GB

I am also getting a similar error for test_fwd_conv and VGG.
Regards,
Ravikant

Matt Sinclair
Fri, Aug 9, 2024 9:15 PM

Hi Ravikant,

From looking at the details below, it appears you are using the GPUSE gem5
support.  In this version, I don’t believe we ever officially got AlexNet
or VGG working.  For fwd_conv there was a prior message on this mailing
list about some of the issues with it, but I’m having a hard time finding
it on my phone. Maybe check the gem5 message archive?

Regarding AlexNet: I spent a bunch of time on it, but every time I fixed a
bug there was another one a layer or two later.  We have made a number of
fixes since I last tried, but it seems the state is the same, sadly.  The
failure you are running into is a common one, sort of like a “segfault”
error message but in gem5.  Basically, the error message is telling you that
the program is writing to some memory address it shouldn’t be.
Unfortunately, since many different bugs can lead to this error, there isn’t
a single place I can point you to.  But if you are willing to help us debug,
here are some ideas:

  • If you run DNNMark with its DEBUG flag set, it will print more
    information about what it was trying to do around where the failure
    occurred, which might help us give more useful advice.
  • Are you running the stable or develop branch?  If you are using stable, I
    recommend trying develop; we push bug fixes there often.
  • I have not tried DNNMark with GPUFS yet, but if you are willing to give
    it a try (and/or don’t need GPUSE for your research), I would recommend
    trying DNNMark with GPUFS.  We have been focusing most of our effort on
    supporting GPUFS in recent months because it is much easier to support
    newer ROCm versions there.
  • If none of the above solves your problem, you would need to get a trace
    to identify where this unmapped address is coming from and fix that (see
    the example command right after this list).
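
For the trace, gem5's built-in --debug-flags mechanism is usually the easiest
starting point.  The exact flags are a judgment call (ExecAll in particular is
extremely verbose, and gem5.opt --debug-help lists everything available), but
as a rough example you could add something like this to the gem5.opt part of
your existing docker command, starting the trace shortly before the panic tick
(190131659500) from your output:

build/VEGA_X86/gem5.opt \
    --debug-flags=ExecAll,Faults \
    --debug-start=190000000000 \
    --debug-file=trace.out.gz \
    configs/example/apu_se.py <your existing apu_se.py options>

The trace ends up in m5out/trace.out.gz, and the last stretch of it before the
abort usually shows which instructions and addresses led up to the fault.  The
mangled frames in the backtrace can also be piped through c++filt to get
readable function names.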

Ultimately if you are willing to help debug this issue we can try to
provide some guidance on how to fix it (and then hopefully you will
consider contributing the bug fix back!).  Let us know what happens with
the above and we can go from there.

Hope this helps,
Matt

On Thu, Aug 8, 2024 at 10:19 AM Ravikant Bhardwaj via gem5-users <gem5-users@gem5.org> wrote:


Ravikant Bhardwaj
Sat, Sep 7, 2024 1:30 PM

Hi Matt,

Thanks for the reply. I am currently looking at running CPU and GPU workloads
together on gem5, for example running the square application on the GPU and a
hello-world program on the CPU. For that I tried adding

subprocess.call(["--cmd=./tests/test-progs/hello/bin/x86/linux/hello",
'--cpu-type=DerivO3CPU', "--l1d_size=32kB", "--l1i_size=32kB",
"--l2_size=256kB", "--caches", "--l2cache", "--l3cache", "--l3_size=8MB",
"--num-cpus=8", "--mem-size=8192MB"])

to apu_se.py, but it is not reading the file given in the command. Is there a
way to run CPU and GPU workloads in parallel, e.g. first offloading the GPU
workload from one CPU and then running CPU workloads on the other cores? If
that is possible, please guide me. I was also trying to see whether a shared
LLC between the CPU and GPU can be added in gem5; if there is a way to add
that, it would also be good.
Regards,
Ravikant

Matt Sinclair
Thu, Sep 12, 2024 1:54 PM

Hi Ravikant,

If I understand your request, you are trying to run multiple processes (one
on the CPU, one on the CPU+GPU) simultaneously.  I have never tried doing
this, and thus do not know whether it can be made to work.  My guess, though,
is that gem5 does not support running multiple concurrent processes.  Sorry I
don't have better news; maybe someone else who knows more about this can help.

In terms of a shared LLC: that is already done, effectively, with the
directory between the CPU and GPU in apu_se.py.  By default its size is 0B
(i.e., it just acts as a directory, not a cache) but you can change the size;
a rough sketch follows.
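
As an illustration only (the loop variable and attribute names below are
assumptions and may not match what apu_se.py and the VIPER Ruby config
actually call them), the idea is simply to give the directory controller's
cache a non-zero size in the Ruby configuration:

# Hypothetical sketch: names are assumed, not verified against the config.
# The goal is to give the CPU/GPU directory an actual data array so it acts
# like a shared last-level cache rather than a 0-byte directory.
for dir_cntrl in dir_cntrl_nodes:
    dir_cntrl.L3CacheMemory.size = "8MB"
    dir_cntrl.L3CacheMemory.assoc = 16

If the change takes effect, the cache statistics for that controller in
stats.txt should start showing hits and misses, which is a quick sanity check.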

Thanks,
Matt

On Sat, Sep 7, 2024 at 8:31 AM Ravikant Bhardwaj <ravikant7031@gmail.com> wrote:

