gem5-users@gem5.org

The gem5 Users mailing list


Error in an application running on gem5 GCN3 (with apu_se.py)

Anoop Mysore
Tue, Aug 15, 2023 7:00 PM

I am trying to port the CHAI benchmarks
(https://github.com/chai-benchmarks/chai) in the same way as
gem5-resources/src/gpu/pannotia
(https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia). I
was able to HIPify all the code files (using the perl script plus some
manual changes) and ran the BFS program. The error below appears at the
point where the CPU threads are launched, here:
https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273
(my fork of the HIPified CHAI). I do not see any of the prints from the CPU
threads, which leads me to believe that the threads are not being launched,
or that something related is going wrong.

(This looks related, and I have already incorporated its suggestion of
linking with -pthread: https://stackoverflow.com/a/6485728)

The stderr log is below; any help is appreciated.


....
AM: Launching CPU
terminate called after throwing an instance of 'std::system_error'
what():  Resource temporarily unavailable
build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem occurred: fault (General-Protection) detected @ PC (0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
Memory Usage: 19704072 KBytes

Program aborted at tick 441590522500
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!
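
The "terminate called after throwing an instance of 'std::system_error'" line in the log means the exception escaped uncaught. A minimal sketch of catching it at the launch site, so that the failing thread index and error code are printed before the program gives up, might look like the following. `launch_and_join` is a hypothetical helper, not part of CHAI, and the actual launch code in main.cpp may be structured differently.

```cpp
#include <functional>
#include <iostream>
#include <system_error>
#include <thread>
#include <vector>

// Hypothetical helper (not part of CHAI): start up to `count` threads running
// `work(i)`, and report the first launch failure instead of letting the
// exception escape and reach std::terminate.
// Returns the number of threads that were actually started and joined.
int launch_and_join(int count, const std::function<void(int)>& work) {
    std::vector<std::thread> threads;
    threads.reserve(count > 0 ? count : 0);  // avoid reallocation mid-launch
    for (int i = 0; i < count; ++i) {
        try {
            threads.emplace_back(work, i);
        } catch (const std::system_error& e) {
            // e.code().value() carries the errno-style code behind what().
            std::cerr << "thread " << i << " failed to start: " << e.what()
                      << " (error code " << e.code().value() << ")\n";
            break;
        }
    }
    for (auto& t : threads) {
        t.join();
    }
    return static_cast<int>(threads.size());
}
```

If the returned count is smaller than the requested one, the printed index shows how many threads the environment accepted before refusing the next.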


Anoop Mysore
Wed, Aug 16, 2023 2:48 PM

Curiously, running the gem5.debug executable under gdb within Docker results
in:
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit
(I did not type the quit command; gdb exits on its own.) Does gdb work with
gem5 GCN3 in Docker?

I ran gem5.opt with the ExecAll and SyscallAll debug flags; the tail of the
debug output and the simerr logs are attached.
I don't see anything peculiar other than a tgkill syscall sending SIGABRT to
a thread, which then halts within a few instructions.

Matt Sinclair
Wed, Aug 16, 2023 3:03 PM

Hi Anoop,

A few things here:

  • Regarding the original failure (at least the !FS part), this normally
    happens either because the GPU target ISA (e.g., gfx900) you used in
    your Makefile is not supported, or because you didn't properly
    specify which GPU ISA you are using when running the program.  So, what is
    your command line for running this application, and what ISA are you
    specifying in your Makefile?
  • If the "what()" is the real source of the error, then I think this could
    be related to the number of CPU thread contexts you are running gem5 with.
    What did you set "-n" to?
  • Regarding gdb, @Matt P: did you remove gdb from what is installed in the
    Docker image a while back?  If so, I think Anoop would need to add it back
    and build a local Docker image or something like that.
  • Setting aside the above, it would be wonderful if you contributed the CHAI
    benchmarks to gem5-resources once you get them working!  Please let us know
    if we can do anything to help with that.

Thanks,
Matt
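
The point about CPU thread contexts can be sketched as a guard in the application. This assumes, as a working hypothesis rather than something confirmed in this thread, that every software thread needs its own simulated CPU thread context and that the main thread occupies one of them. `worker_budget` and `simulated_cpus` are hypothetical names; the value would have to be supplied by the user to match what was passed to gem5 with "-n".

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: cap the number of worker threads so that the workers
// plus the main thread never exceed the simulated CPU thread contexts.
// `simulated_cpus` is assumed to mirror the value given to gem5 via "-n".
int worker_budget(int requested_workers, int simulated_cpus) {
    assert(requested_workers >= 0 && simulated_cpus >= 1);
    // Reserve one context for the main thread.
    const int free_contexts = simulated_cpus - 1;
    return std::min(requested_workers, free_contexts);
}
```

With 3 simulated CPUs this leaves room for only 2 workers. Runtime libraries may create threads of their own, so the usable budget can be smaller than this estimate.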

Anoop Mysore
Thu, Aug 17, 2023 4:40 AM

Thank you, Matt. Having 10 CPUs (up from the previous 3) in the simulated
system seems to make it work! (At least, I no longer see that error at that
point.) Is "resource temporarily unavailable" commonly due to the CPU
count? I'm curious how you made that connection.

Re gdb: I am indeed using a local Docker build
(gem5/util/dockerfiles/gcn-gpu) with an added gdb installation. Is that
what you meant?

I will send a PR to the repo as soon as I'm done :)
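
On the "resource temporarily unavailable" question: that text is the message glibc associates with EAGAIN, which corresponds to std::errc::resource_unavailable_try_again in C++. A likely explanation, stated here as an assumption to verify against your gem5 version and not something confirmed in this thread, is that thread creation in the simulated system fails with EAGAIN when no free CPU thread context is left. A minimal sketch for recognising that condition in a caught exception follows; `is_thread_resource_error` is a hypothetical name.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical helper: true if `e` carries the EAGAIN condition, which is
// what "Resource temporarily unavailable" stands for.
bool is_thread_resource_error(const std::system_error& e) {
    return e.code() == std::errc::resource_unavailable_try_again;
}
```

A handler around the thread launch could use this check to print a hint about raising the simulated CPU count instead of the bare what() text.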

On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair mattdsinclair.wisc@gmail.com
wrote:

Hi Anoop,

A few things here:

  • Regarding the original failure (at least the !FS part), this is normally
    happening either because of the GPU Target ISA (e.g., gfx900) you used in
    your Makefile (e.g., it is not supported) or because you didn't properly
    specify what GPU ISA you are using when running the program.  So, what is
    your command line for running this application and what ISA are you
    specifying in your Makefile?
  • If the "what()" is the real source of the error, then I think this could
    be related to the number of CPU thread contexts you are running with gem5.
    What did you set "-n" to?
  • Regarding gdb, @Matt P: did you remove gdb from what is installed in the
    Docker a while back?  If so, I think Anoop would need to add it back and
    create a local docker or something like that.
  • Setting aside the above, it would be wonderful if you contribute the
    CHAI benchmarks to gem5-resources once you get them working!  Please let us
    know if we can do anything to help with that.

Thanks,
Matt

On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <
gem5-users@gem5.org> wrote:

Curiously, running the gem5.debug executable with gdb within docker results
in:
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit
(the quit wasn't a command I provided, it just quits automatically). Is
gdb working with gem5 GCN3 in Docker?

I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail
and the simerr logs are attached.
I don't see anything peculiar other than a tgkill syscall with a SIGABRT
sent to a thread thereafter halting within a few instructions.

On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore mysanoop@gmail.com wrote:

I am trying to port CHAI benchmarks
https://github.com/chai-benchmarks/chaisimilarly to
gem5-resources/src/gpu/pannotia
https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia.
I was able to HIPify (through the perl script + some manual changes) all
the code files, and ran the BFS program. I see the following error message
at the point of launching the CPU threads here
https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273 (fork
of HIPified CHAI). I do not see any of the prints from the CPU threads
which leads me to believe the error is to do with the threads not being
launched or a related error.

(This looks related; incorporated the suggestion of linking against
-pthread: https://stackoverflow.com/a/6485728)

The stderr log is below; any help is appreciated.


....
AM: Launching CPU
terminate called after throwing an instance of 'std::system_error'
what():  Resource temporarily unavailable
build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
occurred: fault (General-Protection) detected @ PC
(0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
Memory Usage: 19704072 KBytes

Program aborted at tick 441590522500
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!



gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org

MS
Matt Sinclair
Thu, Aug 17, 2023 3:13 PM

Hi Anoop,

I'm glad that increasing -n helped.  It's hard to say what exactly the
problem is without digging in further, but often the ROCm stack will launch
additional processes to do a variety of things (e.g., check which version
of LLVM is being used).  In gem5, each of these requires a separate CPU
thread context -- which increasing -n handles in SE mode.  So if I had to
guess, I would say that this is what is happening.
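
A toy sketch of why running out of thread contexts could surface as
"Resource temporarily unavailable" (the strerror text for EAGAIN on Linux).
It assumes, as described above, that each thread or process needs its own
CPU thread context in SE mode. The ThreadContextPool class, the try_launch
helper, and the figure of 5 threads are hypothetical illustrations, not
gem5 APIs or measured values.

```python
import errno
import os


class ThreadContextPool:
    """Toy stand-in for a fixed set of CPU thread contexts (not a gem5 API)."""

    def __init__(self, num_cpus):
        # One context per simulated CPU, i.e. the value passed as -n.
        self.free = num_cpus

    def spawn(self):
        # With no free context left, fail the way a thread launch would:
        # EAGAIN, whose message is "Resource temporarily unavailable".
        if self.free == 0:
            raise OSError(errno.EAGAIN, os.strerror(errno.EAGAIN))
        self.free -= 1


def try_launch(num_cpus, threads_needed):
    """Return 0 if every thread gets a context, else the errno of the failure."""
    pool = ThreadContextPool(num_cpus)
    try:
        for _ in range(threads_needed):
            pool.spawn()
    except OSError as err:
        return err.errno
    return 0


if __name__ == "__main__":
    # 3 and 10 are the -n values mentioned in this thread; 5 is assumed.
    for n in (3, 10):
        print(f"-n {n}: result {try_launch(n, 5)}")
```

In this model, 3 contexts are exhausted on the fourth launch, while 10
contexts leave room for all five threads.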

If you added gdb locally to your docker, and you built the docker properly,
then I would expect gdb to work with gem5.

Thanks,
Matt


AM
Anoop Mysore
Fri, Sep 8, 2023 2:32 PM

Hi Matt,
I'm facing a few other problems:

  1. panic: panic condition (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd) occurred: WG with 1 WFs and 29285 VGPRs per WI can not be allocated to CU that has 8192 VGPRs
    The corresponding line of code in gem5:
    https://github.com/gem5/gem5/blob/f29bfc0640c88a79eb7f94454ce31b3237ec0066/src/gpu-compute/compute_unit.cc#L565
    One of the variables (vregDemandPerWI) is ultimately derived from reading
    the executable for the kernel code. Is it possible to reduce this VGPR
    demand somehow, or is increasing the number of VGPRs (to what seems like
    an unrealistically high value) the only solution? I see a similar error
    for SGPRs as well.
  2. Some kernels (compiled for gfx801/3) have instructions such as ds_add_u32
    (Data Share instruction, page 12-161 of
    https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
    and s_sendmsg (send message to host CPU), which do not have their
    relevant decoding code available:
    https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929
    Is this intentional, or was this just punted for later? Is there anything
    to keep in mind when coding for these?
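
The allocation check in the panic message from item 1 can be restated as a
short sketch. The numbers 1, 29285 and 8192 come from the message; splitting
8192 into 4 vector ALUs with 2048 registers each is an assumption for
illustration, and fits_in_cu is a hypothetical helper, not gem5 code.

```python
def fits_in_cu(num_wfs, vreg_demand_per_wi, num_vector_alus, num_vec_regs_per_simd):
    """Mirror the inequality in the panic: demand must not exceed CU capacity."""
    demand = num_wfs * vreg_demand_per_wi
    capacity = num_vector_alus * num_vec_regs_per_simd
    return demand <= capacity


if __name__ == "__main__":
    # Assumed split of the 8192 VGPRs reported for the CU: 4 SIMDs x 2048.
    num_vector_alus, num_vec_regs_per_simd = 4, 2048

    # Values from the panic message: 1 wavefront, 29285 VGPRs per work-item.
    ok = fits_in_cu(1, 29285, num_vector_alus, num_vec_regs_per_simd)
    print("29285 VGPRs per WI fits:", ok)

    # A demand of 256 is used here only as a contrasting, hypothetical value.
    ok = fits_in_cu(1, 256, num_vector_alus, num_vec_regs_per_simd)
    print("256 VGPRs per WI fits:", ok)
```

A demand of 29285 is more than three times the whole register file of the
CU, which suggests checking how the register count is read from the kernel
binary before enlarging the register file.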


PM
Poremba, Matthew
Fri, Sep 8, 2023 2:50 PM

Hi Anoop,

Based on that register count, I am going to guess you built the application with -O0 or some other debugging flags?  If you do this, the compiler requests a very large number of registers. I assume that is so a real GPU will not run any other applications simultaneously.

Similarly, if you are seeing s_sendmsg, I am going to guess there is a printf() in your GPU kernel.  Kernel printfs aren't currently supported in gem5, although they would be very nice to have.

If so, you will need to remove any printfs and compile with at least -O1 to run in gem5.

-Matt

From: Anoop Mysore mysanoop@gmail.com
Sent: Friday, September 8, 2023 7:33 AM
To: Matt Sinclair mattdsinclair.wisc@gmail.com
Cc: The gem5 Users mailing list gem5-users@gem5.org; Poremba, Matthew Matthew.Poremba@amd.com
Subject: Re: [gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)

Hi Matt,
I'm facing a few other problems:

  1. panic: panic condition (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd) occurred: WG with 1 WFs and 29285 VGPRs per WI can not be allocated to CU that has 8192 VGPRs
    The corresponding line of the code in gem5: https://github.com/gem5/gem5/blob/f29bfc0640c88a79eb7f94454ce31b3237ec0066/src/gpu-compute/compute_unit.cc#L565
    One of the variables (vregDemandPerWI) is ultimately derived from reading the executable for the kernel code. Is it possible to reduce this VGPR demand somehow, or is increasing the number of VGPRs (to what seems like an unrealistically high value) the only solution? I see a similar error for SGPRs as well.
  2. Some kernels (compiled for gfx801/3) have instructions such as ds_add_u32 <https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf> (Data Store instruction, page 12-161) and s_sendmsg (send message to host CPU), which do not have their relevant decoding code available <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929>. Is this intentional, or was this just punted for later? Is there anything to keep in mind when coding for these?
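
For reference, the check behind the panic in item 1 can be restated in a few lines of Python. This is only a sketch of the inequality quoted in the panic message, not gem5 code: the function name is made up for illustration, and splitting the CU's 8192 VGPRs into 4 SIMDs of 2048 registers each is an assumption.

```python
def vgpr_demand_fits(num_wfs, vreg_demand_per_wi,
                     num_vector_alus, num_vec_regs_per_simd):
    """Return True when a workgroup's VGPR demand fits in one CU.

    Mirrors the panic condition from the message above:
    (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd)
    """
    demand = num_wfs * vreg_demand_per_wi
    capacity = num_vector_alus * num_vec_regs_per_simd
    return demand <= capacity


# Values from the panic message; the 4 x 2048 split is an assumption
# that merely multiplies out to the reported 8192 VGPRs.
reported = vgpr_demand_fits(1, 29285, 4, 2048)
```

Under these numbers the demand (29285) is more than three times the capacity (8192), so the kernel's register demand has to come down by a large factor before the check can pass.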

On Thu, Aug 17, 2023 at 5:13 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:
Hi Anoop,

I'm glad that increasing -n helped.  It's hard to say what exactly the problem is without digging in further, but often the ROCm stack will launch additional processes to do a variety of things (e.g., check which version of LLVM is being used).  In gem5, each of these requires a separate CPU thread context -- which increasing -n provides in SE mode.  So if I had to guess, I would say that this is what is happening.
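
One way to picture why a too-small -n shows up as "Resource temporarily unavailable": on Linux that string is the message for errno EAGAIN, which is what a failed thread or process creation reports. The toy model below assumes that SE mode hands out one thread context per new thread or process and fails with EAGAIN when none is free. The class and function names are hypothetical, and this is not gem5's implementation.

```python
import errno


class ThreadContextPool:
    """Toy model: a fixed number of CPU thread contexts (the -n value)."""

    def __init__(self, num_cpus):
        self.free_contexts = num_cpus

    def clone(self):
        """Return 0 on success, or -EAGAIN when no context is free."""
        if self.free_contexts == 0:
            return -errno.EAGAIN
        self.free_contexts -= 1
        return 0


def spawn_all(pool, count):
    """Try to start `count` threads or processes; collect return codes."""
    return [pool.clone() for _ in range(count)]
```

In this model, asking a pool of 3 contexts for 4 threads fails on the fourth request, while a pool of 10 serves all of them. The model ignores the context already held by the main thread.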

If you added gdb locally to your docker, and you built the docker properly, then I would expect gdb to work with gem5.

Thanks,
Matt

On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore <mysanoop@gmail.com> wrote:
Thank you, Matt, having 10 CPUs (up from previous 3) in the simulated system seems to make it work! (At least, I don't see that error at that point anymore). Is "resource temporarily unavailable" commonly due to CPU count? Curious to know how you made that connection.

Re gdb: I am indeed using a local docker build (gem5/util/dockerfiles/gcn-gpu) with an added gdb installation -- is that what you meant?

Will send in a PR to the repo as soon as I'm done :)

On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:
Hi Anoop,

A few things here:

  • Regarding the original failure (at least the !FS part), this normally happens either because the GPU target ISA (e.g., gfx900) you used in your Makefile is not supported or because you didn't properly specify which GPU ISA you are using when running the program.  So, what is your command line for running this application and what ISA are you specifying in your Makefile?
  • If the "what()" is the real source of the error, then I think this could be related to the number of CPU thread contexts you are running with gem5.  What did you set "-n" to?
  • Regarding gdb, @Matt P: did you remove gdb from what is installed in the Docker a while back?  If so, I think Anoop would need to add it back and create a local docker or something like that.
  • Setting aside the above, it would be wonderful if you contribute the CHAI benchmarks to gem5-resources once you get them working!  Please let us know if we can do anything to help with that.

Thanks,
Matt

On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <gem5-users@gem5.org> wrote:
Curiously, running the gem5.debug executable with gdb within docker results in:
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit
(the quit wasn't a command I provided, it just quits automatically). Is gdb working with gem5 GCN3 in Docker?

I ran gem5.opt with the ExecAll and SyscallAll debug flags; the debug tail and the simerr logs are attached.
I don't see anything peculiar other than a tgkill syscall with a SIGABRT sent to a thread, which then halts within a few instructions.


AM
Anoop Mysore
Mon, Sep 11, 2023 5:32 PM

Thanks, Matt. Yes, the printfs in the GPU kernel code were the issue for
s_sendmsg.
However, the ds_add_u32 instruction is still an issue. I am already
compiling with -O1 like so:
/opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803 \
  main.cpp kernel.cu kernel.cpp \
  -o ./bin/hsto.gem5 \
  -I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include \
  -lz -lm -lc -lpthread -O1 \
  -L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out \
  -lm5

The exact error is:
src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction: ds_add_u32 v7, v8 is of unknown type

The corresponding line in the simulator is
https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/gpu-compute/scoreboard_check_stage.cc#L158,
and the relevant decoder section is
https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929.
Because the LDS/GDS is involved, I'm unsure how to implement this
-- any help would be appreciated.
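
As a starting point, here is a functional sketch of what ds_add_u32 computes as described in the GCN3 ISA manual: each active lane adds its 32-bit data operand to the LDS dword at its address. It is written in plain Python rather than against gem5's instruction classes. The function name, the argument layout, and the list-of-dwords LDS model are all assumptions made for illustration, and timing and memory-pipeline behavior are not modeled.

```python
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits (unsigned add)


def ds_add_u32(lds_words, addrs, data, exec_mask, offset=0):
    """Functional model of ds_add_u32 over one wavefront.

    lds_words: list of 32-bit words standing in for the LDS (modified in place)
    addrs:     per-lane byte addresses (the first operand, e.g. v7)
    data:      per-lane 32-bit values to add (the second operand, e.g. v8)
    exec_mask: bit i set means lane i is active
    offset:    byte offset from the instruction encoding (assumed 0 here)
    """
    for lane, (addr, value) in enumerate(zip(addrs, data)):
        if not (exec_mask >> lane) & 1:
            continue  # inactive lanes do not touch the LDS
        index = (addr + offset) // 4  # dword index; assumes 4-byte alignment
        # Lanes are applied one after another, so lanes that target the
        # same dword accumulate, as an atomic add would.
        lds_words[index] = (lds_words[index] + value) & MASK32
    return lds_words
```

For example, with lanes 0 and 1 both pointing at byte address 8 and adding 1 and 2 to an initial value of 5, the dword at index 2 ends up as 8.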

Also, GDB still doesn't seem to be working with my gem5, and without prints
in the kernel, it's cumbersome to get any useful insight into failing
programs.
I added this line to the Dockerfile: RUN apt install -y gdb
I am invoking gdb with:
docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new gdb \
  --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py \
  --cpu-type=DerivO3CPU --num-cpus=4 --mem-size=1GB --ruby \
  --mem-type=SimpleMemory -c \
  gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5

Log:
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit

PS: the quit was entered automatically; I did not type it.
Am I doing anything wrong here?


PM
Poremba, Matthew
Mon, Sep 11, 2023 5:56 PM

[Public]

Hi Anoop,

That instruction was recently added to gem5, but for Vega ISA only:  https://gem5-review.googlesource.com/c/public/gem5/+/67072 .  It could be ported to GCN3 probably by copying the code exactly into the corresponding GCN3 files.  You’ll notice however in that relation chain there are many more instructions implemented for Vega only, so there will be similar issues to this.  Alternately, I think there is a Vega APU working (gfx902?).  MattS would know more about the status of that.  I am not sure of your use case but if you can use a dGPU, Vega with gfx900 version or full system mode is another option to use Vega ISA.

For the docker automatically quitting, you will have to do docker run -it … to start an interactive session.
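
A sketch of what that looks like for the gdb invocation quoted below, written as a small Python helper that assembles the argv. Without -i, Docker does not keep stdin open, so gdb reads end-of-file at its first prompt and exits, which matches the automatic quit in the log. The helper name gdb_in_docker_argv and the /work directory are made up for illustration; the image tag gem5:new and the gem5 paths are taken from the command in this thread.

```python
import os


def gdb_in_docker_argv(image, gem5_args, workdir=None):
    """Build a `docker run` argv that gives gdb an interactive terminal.

    Hypothetical helper for illustration; it only assembles the argument
    list and does not start Docker.
    """
    workdir = workdir or os.getcwd()
    return [
        "docker", "run",
        "-it",  # -i keeps stdin open, -t allocates a pseudo-TTY
        "-u", f"{os.getuid()}:{os.getgid()}",
        "--volume", f"{workdir}:{workdir}",
        "-w", workdir,
        image,
        "gdb", "--args", *gem5_args,
    ]


argv = gdb_in_docker_argv(
    "gem5:new",
    ["gem5/build/GCN3_X86/gem5.debug", "gem5/configs/example/apu_se.py"],
    workdir="/work",  # assumed mount point, replace with your checkout
)
print(" ".join(argv))
```

The list can be passed to subprocess.run(argv) from a real terminal; the only change relative to the original command is the added -it.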

-Matt

From: Anoop Mysore mysanoop@gmail.com
Sent: Monday, September 11, 2023 10:33 AM
To: Poremba, Matthew Matthew.Poremba@amd.com
Cc: Matt Sinclair mattdsinclair.wisc@gmail.com; The gem5 Users mailing list gem5-users@gem5.org
Subject: Re: [gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.

Thanks, Matt. Yes, the printfs in the GPU kernel code were the issue for s_sendmsg.
However, the ds_add_u32 instruction is still an issue. I am already compiling with -O1 like so:
/opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803
main.cpp kernel.cu kernel.cpp
-o ./bin/hsto.gem5
-I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include
-lz -lm -lc -lpthread -O1
-L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out -lm5

The exact error is:
src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction: ds_add_u32 v7, v8 is of unknown type

The corresponding line in the simulator (https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/gpu-compute/scoreboard_check_stage.cc#L158), and the decoder section of it (https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929). Because of the involvement of the LDS/GDS, I'm unsure how to implement this -- any help would be appreciated.

Also, GDB still doesn't seem to be working with my gem5. And without prints in the kernel, it's cumbersome to get any useful insight on failing programs.
I added within the Dockerfile: RUN apt install -y gdb
I am invoking gdb with:
docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new gdb --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py --cpu-type=DerivO3CPU --num-cpus=4 --mem-size=1GB --ruby --mem-type=SimpleMemory -c gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5

Log:
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit

PS: quit was automatically taken in.
Is there anything wrong I'm doing here?

On Fri, Sep 8, 2023 at 4:50 PM Poremba, Matthew <Matthew.Poremba@amd.com> wrote:

[Public]

Hi Anoop,

Based on that register count, I am going to guess you built the application with -O0 or some other debugging flags?  If you do this, the compiler makes some super large number of registers. I assume that is so a real GPU will not run any other applications simultaneously.

Similarly, if you are seeing s_sendmsg I am going to guess there is a printf() in your GPU kernel.  These aren’t currently supported in gem5, but something that would be very nice to have.

If these are true you will need to remove any printfs and compile with at least -O1 to run in gem5.

-Matt

From: Anoop Mysore <mysanoop@gmail.com>
Sent: Friday, September 8, 2023 7:33 AM
To: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Cc: The gem5 Users mailing list <gem5-users@gem5.org>; Poremba, Matthew <Matthew.Poremba@amd.com>
Subject: Re: [gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.

Hi Matt,
I'm facing a few other problems:

  1. panic: panic condition (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd) occurred: WG with 1 WFs and 29285 VGPRs per WI can not be allocated to CU that has 8192 VGPRs
    The corresponding line of the code in gem5: https://github.com/gem5/gem5/blob/f29bfc0640c88a79eb7f94454ce31b3237ec0066/src/gpu-compute/compute_unit.cc#L565
    One of the variables (vregDemandPerWI) is ultimately derived from reading the executable for the kernel code. Is it possible to reduce this VGPR demand somehow, or is increasing the VGPRs (to what seems like an unrealistically high value) the only solution? There is a similar error for SGPRs as well.
  2. Some kernels (compiled for gfx801/3) have instructions such as ds_add_u32 (https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf, Data Store instruction, page 12-161) and s_sendmsg (send message to host CPU), which do not have their relevant decoding code available (https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929). Is this intentional or was this just punted for later -- anything to keep in mind when coding for these?
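
The check behind the panic in item 1 can be sketched in Python as below. The condition is copied from the panic message. The split of the CU's 8192 VGPRs into 4 vector ALUs with 2048 registers each is an assumption (only the product appears in the message), and 256 is a hypothetical per-work-item demand used only for contrast.

```python
def vgpr_allocation_fails(num_wfs, vreg_demand_per_wi,
                          num_vector_alus, num_vec_regs_per_simd):
    """Mirror the panic condition quoted above.

    Returns True when the workgroup's vector-register demand exceeds what
    the compute unit provides.
    """
    demand = num_wfs * vreg_demand_per_wi
    capacity = num_vector_alus * num_vec_regs_per_simd
    return demand > capacity


# Values from the panic message: 1 WF, 29285 VGPRs per WI, 8192 VGPRs per CU.
# 4 x 2048 is an assumed way of reaching 8192.
fails_as_reported = vgpr_allocation_fails(1, 29285, 4, 2048)

# Hypothetical demand after rebuilding the kernel with optimization enabled.
fails_with_small_demand = vgpr_allocation_fails(1, 256, 4, 2048)

print(fails_as_reported, fails_with_small_demand)  # True False
```

Under these assumptions, lowering the demand reported by the kernel binary clears the check without touching the CU's register file size.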

On Thu, Aug 17, 2023 at 5:13 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:
Hi Anoop,

I'm glad that increasing -n helped.  It's hard to say what exactly the problem is without digging in further, but often the ROCm stack will launch additional processes to do a variety of things (e.g., check which version of LLVM is being used).  In gem5, each of these require a separate CPU thread context -- which increasing -n handles in SE mode.  So if I had to guess, I would say that this is what is happening.
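
For reference, the what() text in the original log is the C library's message for errno EAGAIN, which is the code a failed thread creation reports when the system cannot provide another thread. That would be consistent with the guess above about running out of CPU thread contexts, though it does not prove it. A quick way to check the mapping on a Linux host (the exact text depends on the C library):

```python
import errno
import os

# Message the C library associates with EAGAIN. On Linux with glibc this is
# the same string that appears in the what() line of the log.
message = os.strerror(errno.EAGAIN)
print("EAGAIN ->", message)
```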

If you added gdb locally to your docker, and you built the docker properly, then I would expect gdb to work with gem5.

Thanks,
Matt

On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore <mysanoop@gmail.com> wrote:
Thank you, Matt, having 10 CPUs (up from previous 3) in the simulated system seems to make it work! (At least, I don't see that error at that point anymore). Is "resource temporarily unavailable" commonly due to CPU count? Curious to know how you made that connection.

Re gdb: I am indeed using a local docker build (gem5/util/dockerfiles/gcn-gpu) with an added gdb installation -- is that what you meant?

Will send in a PR to the repo as soon as I'm done :)

On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:
Hi Anoop,

A few things here:

  • Regarding the original failure (at least the !FS part), this is normally happening either because of the GPU Target ISA (e.g., gfx900) you used in your Makefile (e.g., it is not supported) or because you didn't properly specify what GPU ISA you are using when running the program.  So, what is your command line for running this application and what ISA are you specifying in your Makefile?
  • If the "what()" is the real source of the error, then I think this could be related to the number of CPU thread contexts you are running with gem5.  What did you set "-n" to?
  • Regarding gdb, @Matt P: did you remove gdb from what is installed in the Docker a while back?  If so, I think Anoop would need to add it back and create a local docker or something like that.
  • Setting aside the above, it would be wonderful if you contribute the CHAI benchmarks to gem5-resources once you get them working!  Please let us know if we can do anything to help with that.

Thanks,
Matt

On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <gem5-users@gem5.org> wrote:
Curiously, running the gem5.debug executable with gdb within docker results in:
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit
(the quit wasn't a command I provided, it just quits automatically). Is gdb working with gem5 GCN3 in Docker?

I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail and the simerr logs are attached.
I don't see anything peculiar other than a tgkill syscall with a SIGABRT sent to a thread, which then halts within a few instructions.

On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore <mysanoop@gmail.com> wrote:
I am trying to port CHAI benchmarks (https://github.com/chai-benchmarks/chai) similarly to gem5-resources/src/gpu/pannotia (https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia). I was able to HIPify (through the perl script + some manual changes) all the code files, and ran the BFS program. I see the following error message at the point of launching the CPU threads here: https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273 (fork of HIPified CHAI). I do not see any of the prints from the CPU threads, which leads me to believe the error is to do with the threads not being launched or a related error.

(This looks related; incorporated the suggestion of linking against -pthread: https://stackoverflow.com/a/6485728)

The stderr log is below; any help is appreciated.


....
AM: Launching CPU
terminate called after throwing an instance of 'std::system_error'
what():  Resource temporarily unavailable
build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem occurred: fault (General-Protection) detected @ PC (0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
Memory Usage: 19704072 KBytes
Program aborted at tick 441590522500
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!



gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2] /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af] /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1] /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537] /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d] /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d] /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b] --- END LIBC BACKTRACE --- Failed to execute default signal handler! _________ _______________________________________________ gem5-users mailing list -- gem5-users@gem5.org<mailto:gem5-users@gem5.org> To unsubscribe send an email to gem5-users-leave@gem5.org<mailto:gem5-users-leave@gem5.org>
MS
Matt Sinclair
Mon, Sep 11, 2023 7:15 PM

Yeah, I haven't tried CHAI but I believe gfx902 would work with it (if you
need APUs).

Matt S.

On Mon, Sep 11, 2023 at 12:56 PM Poremba, Matthew Matthew.Poremba@amd.com
wrote:

[Public]

Hi Anoop,

That instruction was recently added to gem5, but for Vega ISA only:
https://gem5-review.googlesource.com/c/public/gem5/+/67072 .  It could be
ported to GCN3 probably by copying the code exactly into the corresponding
GCN3 files.  You’ll notice however in that relation chain there are many
more instructions implemented for Vega only, so there will be similar
issues to this.  Alternately, I think there is a Vega APU working
(gfx902?).  MattS would know more about the status of that.  I am not sure
of your use case but if you can use a dGPU, Vega with gfx900 version or
full system mode is another option to use Vega ISA.

For the docker automatically quitting, you will have to do docker run *-it* … to start an interactive session.
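As a rough sketch of the `-it` suggestion above, the snippet below rebuilds the `docker run` invocation quoted later in this thread with `-it` added. Without it, gdb presumably sees end-of-file on stdin and exits, which would match the automatic `quit`. The helper name `gdb_in_docker_cmd` is hypothetical; the image tag `gem5:new` and the paths are copied from the quoted command line. The helper only builds and prints the command, it does not run it.

```python
import os
import shlex


def gdb_in_docker_cmd(image, gem5_args, interactive=True):
    """Build the argv for running gdb on gem5 inside a Docker container.

    Hypothetical helper for illustration only; it builds the command and
    does not execute it.
    """
    cwd = os.getcwd()
    cmd = ["docker", "run"]
    if interactive:
        # -i keeps stdin open and -t allocates a pseudo-terminal.
        cmd.append("-it")
    cmd += [
        "-u", f"{os.getuid()}:{os.getgid()}",
        "--volume", f"{cwd}:{cwd}",
        "-w", cwd,
        image,
        "gdb", "--args",
    ]
    return cmd + list(gem5_args)


# Arguments copied from the command line quoted in this thread.
gem5_args = [
    "gem5/build/GCN3_X86/gem5.debug",
    "gem5/configs/example/apu_se.py",
    "--cpu-type=DerivO3CPU", "--num-cpus=4", "--mem-size=1GB", "--ruby",
    "--mem-type=SimpleMemory",
    "-c", "gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5",
]

cmd = gdb_in_docker_cmd("gem5:new", gem5_args)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell; the only change from the quoted invocation is the `-it` right after `docker run`.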

-Matt

From: Anoop Mysore mysanoop@gmail.com
Sent: Monday, September 11, 2023 10:33 AM
To: Poremba, Matthew Matthew.Poremba@amd.com
Cc: Matt Sinclair mattdsinclair.wisc@gmail.com; The gem5 Users
mailing list gem5-users@gem5.org
Subject: Re: [gem5-users] Re: Error in an application running on gem5
GCN3 (with apu_se.py)

Caution: This message originated from an External Source. Use proper
caution when opening attachments, clicking links, or responding.

Thanks, Matt. Yes, the printfs in the GPU kernel code were the issue for
s_sendmsg.

However, the ds_add_u32 instruction is still an issue. I am already
compiling with -O1 like so:

/opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803 \
  main.cpp kernel.cu kernel.cpp \
  -o ./bin/hsto.gem5 \
  -I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include \
  -lz -lm -lc -lpthread -O1 \
  -L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out \
  -lm5

The exact error is:
src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction:
ds_add_u32 v7, v8 is of unknown type

The corresponding line in the simulator:
https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/gpu-compute/scoreboard_check_stage.cc#L158
and the decoder section for it:
https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929
Because of the involvement of the LDS/GDS, I'm unsure how to implement this
-- any help would be appreciated.

Also, GDB still doesn't seem to be working with my gem5. And without
prints in the kernel, it's cumbersome to get any useful insight on failing
programs.

I added within the Dockerfile: RUN apt install -y gdb

I am invoking gdb with:

docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new gdb
--args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py
--cpu-type=DerivO3CPU --num-cpus=4 --mem-size=1GB --ruby
--mem-type=SimpleMemory -c
gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5

Log:

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit

PS: quit was automatically taken in.

Is there anything I'm doing wrong here?

On Fri, Sep 8, 2023 at 4:50 PM Poremba, Matthew Matthew.Poremba@amd.com
wrote:

[Public]

Hi Anoop,

Based on that register count, I am going to guess you built the
application with -O0 or some other debugging flags?  If you do this, the
compiler makes some super large number of registers. I assume that is so a
real GPU will not run any other applications simultaneously.

Similarly, if you are seeing s_sendmsg I am going to guess there is a
printf() in your GPU kernel.  These aren’t currently supported in gem5, but
would be very nice to have.

If these are true you will need to remove any printfs and compile with at
least -O1 to run in gem5.

-Matt

From: Anoop Mysore mysanoop@gmail.com
Sent: Friday, September 8, 2023 7:33 AM
To: Matt Sinclair mattdsinclair.wisc@gmail.com
Cc: The gem5 Users mailing list gem5-users@gem5.org; Poremba, Matthew
Matthew.Poremba@amd.com
Subject: Re: [gem5-users] Re: Error in an application running on gem5
GCN3 (with apu_se.py)

Hi Matt,
I'm facing a few other problems:

  1. panic: panic condition (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd) occurred: WG with 1 WFs and 29285 VGPRs per WI can not be allocated to CU that has 8192 VGPRs

The corresponding line of the code in gem5:
https://github.com/gem5/gem5/blob/f29bfc0640c88a79eb7f94454ce31b3237ec0066/src/gpu-compute/compute_unit.cc#L565

One of the variables (vregDemandPerWI) is ultimately derived from reading
the executable for the kernel code. Is it possible to reduce this VGPR
demand somehow, or is increasing the VGPRs (to what seems like an
unrealistically high value) the only solution? There is a similar error for
SGPRs as well.

  2. Some kernels (compiled for gfx801/3) have instructions such as
    ds_add_u32 (Data Store instruction, page 12-161 of
    https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
    and s_sendmsg (send message to host CPU), whose decoding code is not
    available in gem5:
    https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929
    Is this intentional or was this just punted for later? Is there anything
    to keep in mind when coding for these?
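The check behind the VGPR panic in the first item above can be sketched as plain arithmetic. This is a minimal illustration of the inequality quoted in the panic message, not the gem5 source. The function name `vgprs_fit` is hypothetical, and splitting the CU's 8192 VGPRs into 4 vector ALUs with 2048 registers each is an assumption chosen only so that the product matches the figure in the message.

```python
def vgprs_fit(num_wfs, vreg_demand_per_wi, num_vector_alus, num_vec_regs_per_simd):
    """Return True when the workgroup's VGPR demand fits in the compute unit.

    Mirrors the inequality in the panic message: the panic fires when
    numWfs * vregDemandPerWI > numVectorALUs * numVecRegsPerSimd.
    """
    demand = num_wfs * vreg_demand_per_wi
    capacity = num_vector_alus * num_vec_regs_per_simd
    return demand <= capacity


# Assumed split of the 8192 VGPRs reported in the panic message.
NUM_VECTOR_ALUS = 4
NUM_VEC_REGS_PER_SIMD = 2048

# Values from the panic: 1 wavefront demanding 29285 VGPRs per work-item.
print(vgprs_fit(1, 29285, NUM_VECTOR_ALUS, NUM_VEC_REGS_PER_SIMD))  # False
# A hypothetical kernel demanding 64 VGPRs per work-item passes the check.
print(vgprs_fit(1, 64, NUM_VECTOR_ALUS, NUM_VEC_REGS_PER_SIMD))  # True
```

With a demand of 29285 against a capacity of 8192, the kernel misses by more than a factor of three, which is why lowering the demand at compile time looks more practical than raising the register count.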

On Thu, Aug 17, 2023 at 5:13 PM Matt Sinclair <
mattdsinclair.wisc@gmail.com> wrote:

Hi Anoop,

I'm glad that increasing -n helped.  It's hard to say what exactly the
problem is without digging in further, but often the ROCm stack will launch
additional processes to do a variety of things (e.g., check which version
of LLVM is being used).  In gem5, each of these requires a separate CPU
thread context -- which increasing -n handles in SE mode.  So if I had to
guess, I would say that this is what is happening.

If you added gdb locally to your docker, and you built the docker
properly, then I would expect gdb to work with gem5.

Thanks,

Matt

On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore mysanoop@gmail.com wrote:

Thank you, Matt, having 10 CPUs (up from previous 3) in the simulated
system seems to make it work! (At least, I don't see that error at that
point anymore). Is "resource temporarily unavailable" commonly due to CPU
count? Curious to know how you made that connection.

Re gdb: I am indeed using a local docker build
(gem5/util/dockerfiles/gcn-gpu) with an added gdb installation -- is that
what you meant?

Will send in a PR to the repo as soon as I'm done :)

On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair mattdsinclair.wisc@gmail.com
wrote:

Hi Anoop,

A few things here:

  • Regarding the original failure (at least the !FS part), this normally
    happens either because of the GPU Target ISA (e.g., gfx900) you used in
    your Makefile (e.g., it is not supported) or because you didn't properly
    specify what GPU ISA you are using when running the program.  So, what is
    your command line for running this application and what ISA are you
    specifying in your Makefile?

  • If the "what()" is the real source of the error, then I think this could
    be related to the number of CPU thread contexts you are running with gem5.
    What did you set "-n" to?

  • Regarding gdb, @Matt P: did you remove gdb from what is installed in the
    Docker a while back?  If so, I think Anoop would need to add it back and
    create a local docker or something like that.

  • Setting aside the above, it would be wonderful if you contribute the
    CHAI benchmarks to gem5-resources once you get them working!  Please let us
    know if we can do anything to help with that.

Thanks,

Matt

On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <
gem5-users@gem5.org> wrote:

Curiously, running the gem5.debug executable with gdb within docker
results in:

Reading symbols from gem5/build/GCN3_X86/gem5.debug...
(gdb) quit
(the quit wasn't a command I provided, it just quits automatically). Is
gdb working with gem5 GCN3 in Docker?

I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail and
the simerr logs are attached.

I don't see anything peculiar other than a tgkill syscall with a SIGABRT
sent to a thread thereafter halting within a few instructions.

On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore mysanoop@gmail.com wrote:

I am trying to port CHAI benchmarks
https://github.com/chai-benchmarks/chai similarly to
gem5-resources/src/gpu/pannotia
https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia. I
was able to HIPify (through the perl script + some manual changes) all the
code files, and ran the BFS program. I see the following error message at
the point of launching the CPU threads here
https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273 (fork
of HIPified CHAI). I do not see any of the prints from the CPU threads
which leads me to believe the error is to do with the threads not being
launched or a related error.

(This looks related; incorporated the suggestion of linking against
-pthread: https://stackoverflow.com/a/6485728)

The stderr log is below; any help is appreciated.


....

AM: Launching CPU

terminate called after throwing an instance of 'std::system_error'

what():  Resource temporarily unavailable

build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
occurred: fault (General-Protection) detected @ PC
(0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
Memory Usage: 19704072 KBytes

Program aborted at tick 441590522500

--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!



gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org
