Empathy List Archives


Anoop Mysore

Tue, Aug 15, 2023 7:00 PM

I am trying to port the CHAI benchmarks
(https://github.com/chai-benchmarks/chai) in the same way as
gem5-resources/src/gpu/pannotia
(https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia). I
was able to HIPify all the code files (using the Perl script plus some
manual changes) and ran the BFS program. I see the following error message
at the point where the CPU threads are launched, here (in my fork of
HIPified CHAI):
https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273
I do not see any of the prints from the CPU threads, which leads me to
believe that the threads are not being launched, or that the error is
related to their launch.

(This looks related; I have already applied its suggestion of linking with
-pthread: https://stackoverflow.com/a/6485728)

The stderr log is below; any help is appreciated.

....
AM: Launching CPU
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
occurred: fault (General-Protection) detected @ PC
(0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
Memory Usage: 19704072 KBytes

Program aborted at tick 441590522500
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!
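The "terminate called after throwing an instance of 'std::system_error' / what(): Resource temporarily unavailable" lines indicate that a thread launch threw (most likely with EAGAIN) and nothing caught the exception, so std::terminate() aborted the program. Below is a minimal sketch of how to make that failure visible at the launch site. It assumes the CPU threads are created as std::thread objects (worth confirming in main.cpp of the fork). `launch_cpu_workers` and `join_all` are hypothetical helpers, not part of CHAI.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <functional>
#include <system_error>
#include <thread>
#include <vector>

// Hypothetical helper: try to start `requested` CPU worker threads running
// work(id). A failed std::thread launch throws std::system_error; if that
// escapes, std::terminate() calls abort(). Here it is caught, the error code
// is reported, and the number of threads actually started is returned.
int launch_cpu_workers(int requested, const std::function<void(int)>& work,
                       std::vector<std::thread>& pool) {
    assert(requested >= 0);
    int started = 0;
    for (int id = 0; id < requested; ++id) {
        try {
            pool.emplace_back(work, id);  // pool is unchanged if this throws
            ++started;
        } catch (const std::system_error& e) {
            // On Linux, EAGAIN prints as "Resource temporarily unavailable".
            std::fprintf(stderr, "CPU thread %d not launched: %s (errno %d%s)\n",
                         id, e.what(), e.code().value(),
                         e.code().value() == EAGAIN ? ", EAGAIN" : "");
            break;
        }
    }
    return started;
}

// Join every thread that did start, so no joinable std::thread is destroyed.
void join_all(std::vector<std::thread>& pool) {
    for (std::thread& t : pool) t.join();
    pool.clear();
}
```

With this wrapper, the benchmark can compare the returned count with the number of threads it asked for and stop with a clear message, instead of dying in abort().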



Anoop Mysore

Wed, Aug 16, 2023 2:48 PM

Curiously, running the gem5.debug executable under gdb within Docker
results in:
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit
(I did not type the quit command; gdb exits on its own.) Does gdb work
with gem5 GCN3 in Docker?

I ran gem5.opt with the ExecAll and SyscallAll debug flags; the tail of the
debug output and the simerr log are attached.
I don't see anything peculiar other than a tgkill syscall that sends
SIGABRT to a thread, after which execution halts within a few instructions.
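On Linux, glibc's abort() raises SIGABRT through raise(), which glibc implements with the tgkill syscall. A tgkill/SIGABRT right before the halt is therefore consistent with the uncaught std::system_error in the original log, rather than pointing to a separate problem. If you want more detail at that point, a terminate handler can report the errno carried by the exception before aborting. The sketch below assumes the handler is installed with std::set_terminate(verbose_terminate) at the start of main(). `describe_active_exception` and `verbose_terminate` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>
#include <system_error>

// Describe the exception currently being handled (if any), including the
// error code value carried by std::system_error.
std::string describe_active_exception() {
    std::exception_ptr ep = std::current_exception();
    if (!ep) return "no active exception";
    try {
        std::rethrow_exception(ep);
    } catch (const std::system_error& e) {
        return std::string("std::system_error: ") + e.what() +
               " (errno " + std::to_string(e.code().value()) + ")";
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (...) {
        return "non-standard exception";
    }
}

// Terminate handler: print the active exception, then abort(). abort()
// raises SIGABRT, which is the tgkill(..., SIGABRT) a SyscallAll trace shows.
[[noreturn]] void verbose_terminate() {
    std::fprintf(stderr, "terminate: %s\n", describe_active_exception().c_str());
    std::abort();
}
```

This adds only the numeric error code to what libstdc++'s default handler already prints. Its main use is to confirm whether the failing launch really returned EAGAIN.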




Matt Sinclair

Wed, Aug 16, 2023 3:03 PM

Hi Anoop,

A few things here:

- Regarding the original failure (at least the !FullSystem panic), this
  normally happens either because the GPU target ISA in your Makefile
  (e.g., gfx900) is not supported, or because you didn't properly specify
  which GPU ISA you are using when running the program. So, what command
  line are you using to run this application, and what ISA are you
  specifying in your Makefile?
- If the "what()" is the real source of the error, then I think this could
  be related to the number of CPU thread contexts you are running with in
  gem5. What did you set "-n" to?
- Regarding gdb, @Matt P: did you remove gdb from what is installed in the
  Docker image a while back? If so, I think Anoop would need to add it back
  and build a local Docker image, or something like that.
- Setting aside the above, it would be wonderful if you could contribute
  the CHAI benchmarks to gem5-resources once you get them working! Please
  let us know if we can do anything to help with that.

Thanks,
Matt
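On the "-n" question: in gem5's syscall-emulation mode, each guest thread needs its own thread context. As far as I can tell, gem5's clone() emulation returns EAGAIN when no context is free, and the application would see that as exactly this std::system_error. Below is a hedged sketch of sizing the CPU worker pool to the contexts that remain. It assumes the number of contexts equals the value passed to -n. It also assumes `reserved` (the main thread plus any runtime helper threads) is a number you measure for your own setup. `workers_that_fit` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: how many CPU worker threads can be spawned when every
// thread, main thread included, needs its own simulated thread context.
//   sim_contexts : total thread contexts (assumed to equal gem5's -n value)
//   reserved     : contexts already used by the main thread and any runtime
//                  helper threads (an assumption; measure it for your setup)
//   requested    : CPU worker threads the benchmark asks for
int workers_that_fit(int sim_contexts, int reserved, int requested) {
    assert(sim_contexts >= 0 && reserved >= 0 && requested >= 0);
    int free_contexts = std::max(0, sim_contexts - reserved);
    return std::min(requested, free_contexts);
}
```

For example, with 3 contexts and 1 reserved, a request for 4 workers is capped at 2. Alternatively, raise -n until every requested thread fits.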

On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users
<gem5-users@gem5.org> wrote:

Curiously, running the gem5.debug executable with gdb within docker results
in:
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit
(the quit wasn't a command I provided, it just quits automatically). Is
gdb working with gem5 GCN3 in Docker?