gem5-users@gem5.org

The gem5 Users mailing list


Reply: Re: Fail to run gpu-fs

关富润
Wed, Dec 20, 2023 3:04 AM

Thanks for the advice. Firstly, I want to express my gratitude for your suggestion to use 'sudo'. It resolved the issue I was facing earlier, and with it I was able to create the disk image using packer, which was a significant breakthrough.

However, during the disk-image creation process, I observed numerous warnings in red from qemu, and I'm uncertain whether these might affect subsequent emulation tasks. I'd appreciate any insights you might have on this.

Additionally, I've encountered a new challenge. Following the steps in the README, I executed the command:

    sudo build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image ../gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel ../gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace ../gem5-resources/src/gpu-fs/vega_mmio.log --app ../gem5-resources/src/gpu/square/bin/square

Unfortunately, this resulted in an error. The specific error message from the system.pc.com_1.device file was:

    [    4.694416] amdgpu 0000:00:08.0: amdgpu: ring page1 uses VM inv eng 5 on hub 1
    [    4.733771] [drm] Initialized amdgpu 3.41.0 20150101 for 0000:00:08.0 on minor 0
    Running ../gem5-resources/src/gpu/square/bin/square
    ./myapp: error while loading shared libraries: libamdhip64.so.5: cannot open shared object file: No such file or directory

This seems to suggest a missing shared library file. I am wondering whether this issue could be a result of the disk-image creation process, and how I might go about resolving it. Any guidance or suggestions you could provide would be immensely helpful.

Thank you once again for your support and looking forward to your advice.
Best regards, Sandy
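
The loader error quoted above can be picked out of the console log automatically. What follows is a minimal sketch, assuming the contents of m5out/system.pc.com_1.device have already been read into a string. The name `missing_libraries` is illustrative and is not part of gem5.

```python
import re

# Matches the dynamic loader's "cannot open shared object file" message,
# in the form quoted in the console output above.
MISSING_LIB = re.compile(
    r"error while loading shared libraries: ([^\s:]+): "
    r"cannot open shared object file"
)


def missing_libraries(console_text):
    """Return the library names the loader could not find, in log order."""
    return MISSING_LIB.findall(console_text)


if __name__ == "__main__":
    sample = (
        "Running ../gem5-resources/src/gpu/square/bin/square\n"
        "./myapp: error while loading shared libraries: libamdhip64.so.5: "
        "cannot open shared object file: No such file or directory\n"
    )
    print(missing_libraries(sample))
```

The version suffix of each reported name is the useful part: it says which library version the application was linked against, which can then be compared with what the disk image provides.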

------------------ Original Message ------------------
From: "The gem5 Users mailing list" <gem5-users@gem5.org>;
Sent: Wednesday, December 20, 2023, 6:54 AM
To: "The gem5 Users mailing list" <gem5-users@gem5.org>;
Cc: "Pau Galindo Figuerola" <pau.galindo.figuerola@estudiantat.upc.edu>;
Subject: [gem5-users] Re: Fail to run gpu-fs

Hi,

It might seem dumb, but I faced a similar issue where vega10_atomic worked and vega10_kvm did not, and the fix was typing 'sudo' at the beginning of the command.

Hope it works!

Regards,
Pau
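
One common reason a KVM-based run works only under 'sudo' is that the current user cannot open /dev/kvm for reading and writing. Whether that was the cause here is an assumption, since the thread does not say. The sketch below checks for that access and otherwise falls back to the atomic config. The helper names are illustrative, and the location of vega10_atomic.py next to vega10_kvm.py is assumed from the file names mentioned in this thread.

```python
import os

KVM_CONFIG = "configs/example/gpufs/vega10_kvm.py"
# Assumed to sit in the same directory as vega10_kvm.py.
ATOMIC_CONFIG = "configs/example/gpufs/vega10_atomic.py"


def can_use_kvm(dev_path="/dev/kvm"):
    """True if the KVM device node exists and is readable and writable."""
    return os.path.exists(dev_path) and os.access(dev_path, os.R_OK | os.W_OK)


def pick_config(dev_path="/dev/kvm"):
    """Choose the KVM config when KVM is usable, else the atomic config."""
    return KVM_CONFIG if can_use_kvm(dev_path) else ATOMIC_CONFIG


if __name__ == "__main__":
    print(pick_config())
```

If the check fails, granting the user access to /dev/kvm (on many distributions, through membership in the group that owns the device) may be an alternative to running the whole simulator as root.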

On Tue, Dec 19, 2023, 18:57, Poremba, Matthew via gem5-users <gem5-users@gem5.org> wrote:

[AMD Official Use Only - General]

Hi Sandy,

Could you share the file “m5out/system.pc.com_1.device” as well?

You could also try using vega10_atomic.py instead of vega10_kvm.py.  Initially it looks to me like a KVM issue.

-Matt

From: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Sent: Tuesday, December 19, 2023 9:28 AM
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: 关富润 <448367413@qq.com>; Poremba, Matthew <Matthew.Poremba@amd.com>; VISHNU RAMADAS <vramadas@wisc.edu>
Subject: Re: [gem5-users] Fail to run gpu-fs

Hi Sandy,

Can you please give us a bit more information about what you were running?  It looks like you were just trying to run square from the README?  Normally that works out of the box, so I'm wondering if you made any changes to your local setup.

(I am not the primary developer for GPUFS, but am trying to help)

Thanks,
Matt

On Tue, Dec 19, 2023 at 5:16 AM  关富润 via gem5-users <gem5-users@gem5.org> wrote:

Dear all,

I've encountered a problem while performing a gpu-fs simulation using the gem5 simulator. Following the instructions outlined in https://github.com/gem5/gem5-resources/blob/stable/src/gpu-fs/README.md and using the disk image obtained from https://www.gem5.org/2023/02/13/moving-to-full-system-gpu.html, I executed the following command:

    build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image ../gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel ../gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace ../gem5-resources/src/gpu-fs/vega_mmio.log --app ../gem5-resources/src/gpu/square/bin/square

During the execution, I encountered multiple warning messages related to unsupported MSR (Model Specific Register) accesses, followed by a panic related to the Intel 8254 timer. The specific warning and error messages were:

    build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0xc001011f) unsupported by gem5. Skipping.
    build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x1fc) unsupported by gem5. Skipping.
    build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x8b) unsupported by gem5. Skipping.
    build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0xc0010015) unsupported by gem5. Skipping.
    build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x4b564d05) unsupported by gem5. Skipping.
    build/VEGA_X86/dev/x86/pc.cc:117: warn: Don't know what interrupt to clear for console.
    build/VEGA_X86/dev/intel_8254_timer.cc:215: panic: PIT mode 0x4 is not implemented:
    Memory Usage: 23051064 KBytes
    Program aborted at tick 2058120564000
    --- BEGIN LIBC BACKTRACE ---
    build/VEGA_X86/gem5.opt(+0x12471b0)[0x5576147331b0]
    build/VEGA_X86/gem5.opt(+0x126b9be)[0x5576147579be]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f634f500420]
    /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f634e6a700b]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f634e686859]
    build/VEGA_X86/gem5.opt(+0x4ec3e5)[0x5576139d83e5]
    build/VEGA_X86/gem5.opt(+0x1b1d38f)[0x55761500938f]
    build/VEGA_X86/gem5.opt(+0x1cca5fa)[0x5576151b65fa]
    build/VEGA_X86/gem5.opt(+0x1b03226)[0x557614fef226]
    build/VEGA_X86/gem5.opt(+0x77b598)[0x557613c67598]
    build/VEGA_X86/gem5.opt(+0x96b627)[0x557613e57627]
    build/VEGA_X86/gem5.opt(+0xfcc34b)[0x5576144b834b]
    build/VEGA_X86/gem5.opt(+0x19d159d)[0x557614ebd59d]
    build/VEGA_X86/gem5.opt(+0xfcccb3)[0x5576144b8cb3]
    build/VEGA_X86/gem5.opt(+0xfcb8b1)[0x5576144b78b1]
    build/VEGA_X86/gem5.opt(+0x125aa22)[0x557614746a22]
    build/VEGA_X86/gem5.opt(+0x1283534)[0x55761476f534]
    build/VEGA_X86/gem5.opt(+0x1283b13)[0x55761476fb13]
    build/VEGA_X86/gem5.opt(+0x665ab2)[0x557613b51ab2]
    build/VEGA_X86/gem5.opt(+0x4ba777)[0x5576139a6777]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f634f7b9748]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f634f58ef48]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f634f6dbe4b]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f634f7b9124]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f634f585d6d]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f634f58def6]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f634f59106b]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f634f585d6d]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f634f58def6]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f634f6dbe4b]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f634f6dc1d2]
    /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f634f6dc5bf]
    --- END LIBC BACKTRACE ---
    Aborted (core dumped)

Additionally, in the m5out/system.pc.com_1.device file, I found multiple entries reporting unchecked MSR access errors:

    [ 0.334614] unchecked MSR access error: RDMSR from 0x1b0 at rIP: 0xffffffff8107688a (native_read_msr+0xa/0x30)
    [ 0.337428] Call Trace:
    [ 0.338158] ? __switch_to_asm+0x34/0x70
    [ 0.338535] intel_epb_restore+0x1f/0x80
    [ 0.339670] intel_epb_online+0x17/0x40
    [ 0.340786] cpuhp_invoke_callback+0x8a/0x580
    [ 0.342045] ? __schedule+0x29a/0x720
    [ 0.342531] cpuhp_thread_fun+0xb8/0x120
    [ 0.343683] smpboot_thread_fn+0xfc/0x170
    [ 0.344851] kthread+0x121/0x140
    [ 0.345784] ? sort_range+0x30/0x30
    [ 0.346531] ? kthread_park+0x90/0x90
    [ 0.347606] ret_from_fork+0x22/0x40
    [ 0.348640] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
    [ 0.350533] unchecked MSR access error: WRMSR to 0x1b0 (tried to write 0x0000000000000006) at rIP: 0xffffffff81076a88 (native_write_msr+0x8/0x30)
    [ 0.354238] Call Trace:
    [ 0.354532] intel_epb_restore+0x4d/0x80
    [ 0.355674] intel_epb_online+0x17/0x40
    [ 0.356785] cpuhp_invoke_callback+0x8a/0x580
    [ 0.358046] ? __schedule+0x29a/0x720
    [ 0.358531] cpuhp_thread_fun+0xb8/0x120
    [ 0.359682] smpboot_thread_fn+0xfc/0x170
    [ 0.360847] kthread+0x121/0x140
    [ 0.361786] ? sort_range+0x30/0x30
    [ 0.362531] ? kthread_park+0x90/0x90
    [ 0.363754] ret_from_fork+0x22/0x40

I am unsure how to proceed with resolving these issues. I would greatly appreciate any guidance or advice you can provide on how to address these errors and successfully run the simulation.

Thank you for your time and assistance. I look forward to your valuable insights.

Best regards,

Sandy.


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to  gem5-users-leave@gem5.org



Poremba, Matthew
Tue, Jan 2, 2024 7:31 PM

[Public]

Hi Sandy,

The docker image to build GPU applications and the disk image were not updated in sync.  Therefore, the docker image will build for ROCm 5 but the disk image is for ROCm 4.  You will need to follow the instructions to create a ROCm 5 disk image here:  https://github.com/gem5/gem5-resources/pull/12 .  That will hopefully be merged soon and become the default.

-Matt
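
The mismatch described above can be checked before booting the simulator. This is a minimal sketch under two assumptions that are not stated in the thread: that the major number in the HIP runtime's file name (the 5 in libamdhip64.so.5) tracks the ROCm major version, and that an image directory named rocm42 denotes ROCm 4.2. The function names are illustrative.

```python
import re


def soname_major(lib_name):
    """Major version from a name like 'libamdhip64.so.5', or None."""
    match = re.search(r"\.so\.(\d+)", lib_name)
    return int(match.group(1)) if match else None


def image_rocm_major(image_path):
    """ROCm major version from a path containing e.g. 'rocm42', or None.

    Assumes the first digit after 'rocm' is the major version.
    """
    match = re.search(r"rocm(\d)\d*", image_path)
    return int(match.group(1)) if match else None


def versions_mismatch(lib_name, image_path):
    """True only when both versions are known and differ."""
    lib, image = soname_major(lib_name), image_rocm_major(image_path)
    return lib is not None and image is not None and lib != image


if __name__ == "__main__":
    image = "../gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42"
    if versions_mismatch("libamdhip64.so.5", image):
        print("Application and disk image target different ROCm major versions")
```

A check like this only flags the problem; the remedy is still the one given above, namely building a disk image whose ROCm version matches the one the application was compiled for.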


[Public] Hi Sandy, The docker image to build GPU applications and the disk image were not updated in sync. Therefore, the docker image will build for ROCm 5 but the disk image is for ROCm 4. You will need to follow the instructions to create a ROCm 5 disk image here: https://github.com/gem5/gem5-resources/pull/12 . That will hopefully be merged soon and become the default. -Matt From: 关富润 via gem5-users <gem5-users@gem5.org> Sent: Tuesday, December 19, 2023 7:04 PM To: The gem5 Users mailing list <gem5-users@gem5.org> Cc: 关富润 <448367413@qq.com> Subject: [gem5-users] 回复:Re: Fail to run gpu-fs Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. Thanks for the ad, Firstly, I want to express my gratitude for your previous advice regarding the use of 'sudo'. It effectively resolved the issue I was facing earlier. After implementing it, I was able to successfully create the disk-image using packer, which was a significant breakthrough. However, during the disk-image creation process, I observed numerous red font warnings from qemu, and I'm uncertain if these might affect subsequent emulation tasks. I'd appreciate any insights you might have on this. Additionally, I've encountered a new challenge. Following the steps in the README, I executed the command: sudo build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image ../gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel ../gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace ../gem5-resources/src/gpu-fs/vega_mmio.log --app ../gem5-resources/src/gpu/square/bin/square Unfortunately, this resulted in an error. 
The specific error message form the system.pc.com_1.device was: [ 4.694416] amdgpu 0000:00:08.0: amdgpu: ring page1 uses VM inv eng 5 on hub 1 [ 4.733771] [drm] Initialized amdgpu 3.41.0 20150101 for 0000:00:08.0 on minor 0 Running ../gem5-resources/src/gpu/square/bin/square ./myapp: error while loading shared libraries: libamdhip64.so.5: cannot open shared object file: No such file or directory This seems to suggest a missing shared library file. I am wondering if this issue could be a result of the disk-image creation process, and how I might go about resolving it. Any guidance or suggestions you could provide would be immensely helpful. Thank you once again for your support and looking forward to your advice. Best regards, Sandy ------------------ 原始邮件 ------------------ 发件人: "The gem5 Users mailing list" <gem5-users@gem5.org<mailto:gem5-users@gem5.org>>; 发送时间: 2023年12月20日(星期三) 上午6:54 收件人: "The gem5 Users mailing list"<gem5-users@gem5.org<mailto:gem5-users@gem5.org>>; 抄送: "Pau Galindo Figuerola"<pau.galindo.figuerola@estudiantat.upc.edu<mailto:pau.galindo.figuerola@estudiantat.upc.edu>>; 主题: [gem5-users] Re: Fail to run gpu-fs Hi, It might seem dumb but I faced a similar issue where vega10_atomic worked and vega10_kvm not and the fix was typing 'sudo' at the beginning of the command. Hope it works! Regards, Pau El mar, 19 dic 2023 18:57, Poremba, Matthew via gem5-users <gem5-users@gem5.org<mailto:gem5-users@gem5.org>> escribió: [AMD Official Use Only - General] Hi Sandy, Could you share the file “m5out/system.pc.com_1.device” as well? You could also try using vega10_atomic.py instead of vega10_kvm.py. Initially it looks to me like a KVM issue. 
-Matt

From: Matt Sinclair <mattdsinclair.wisc@gmail.com>
Sent: Tuesday, December 19, 2023 9:28 AM
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: 关富润 <448367413@qq.com>; Poremba, Matthew <Matthew.Poremba@amd.com>; VISHNU RAMADAS <vramadas@wisc.edu>
Subject: Re: [gem5-users] Fail to run gpu-fs

Hi Sandy,

Can you please give us a bit more information about what you were running? It looks like you were just trying to run square from the README? Normally that works out of the box, so I'm wondering if you made any changes to your local setup.

(I am not the primary developer for GPUFS, but am trying to help.)

Thanks,
Matt

On Tue, Dec 19, 2023 at 5:16 AM 关富润 via gem5-users <gem5-users@gem5.org> wrote:

Dear all,

I've encountered an issue while performing a gpu-fs simulation using the gem5 simulator. Following the instructions outlined in https://github.com/gem5/gem5-resources/blob/stable/src/gpu-fs/README.md, and using the disk image obtained from https://www.gem5.org/2023/02/13/moving-to-full-system-gpu.html, I executed the following command:

build/VEGA_X86/gem5.opt configs/example/gpufs/vega10_kvm.py --disk-image ../gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42 --kernel ../gem5-resources/src/gpu-fs/vmlinux-5.4.0-105-generic --gpu-mmio-trace ../gem5-resources/src/gpu-fs/vega_mmio.log --app ../gem5-resources/src/gpu/square/bin/square

During the execution, I encountered multiple warning messages related to unsupported MSR (Model-Specific Register) accesses, followed by a panic related to the Intel 8254 timer.
The specific warning and error messages were:

build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0xc001011f) unsupported by gem5. Skipping.
build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x1fc) unsupported by gem5. Skipping.
build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x8b) unsupported by gem5. Skipping.
build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0xc0010015) unsupported by gem5. Skipping.
build/VEGA_X86/arch/x86/kvm/x86_cpu.cc:1562: warn: kvm-x86: MSR (0x4b564d05) unsupported by gem5. Skipping.
build/VEGA_X86/dev/x86/pc.cc:117: warn: Don't know what interrupt to clear for console.
build/VEGA_X86/dev/intel_8254_timer.cc:215: panic: PIT mode 0x4 is not implemented:
Memory Usage: 23051064 KBytes
Program aborted at tick 2058120564000
--- BEGIN LIBC BACKTRACE ---
build/VEGA_X86/gem5.opt(+0x12471b0)[0x5576147331b0]
build/VEGA_X86/gem5.opt(+0x126b9be)[0x5576147579be]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f634f500420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f634e6a700b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f634e686859]
build/VEGA_X86/gem5.opt(+0x4ec3e5)[0x5576139d83e5]
build/VEGA_X86/gem5.opt(+0x1b1d38f)[0x55761500938f]
build/VEGA_X86/gem5.opt(+0x1cca5fa)[0x5576151b65fa]
build/VEGA_X86/gem5.opt(+0x1b03226)[0x557614fef226]
build/VEGA_X86/gem5.opt(+0x77b598)[0x557613c67598]
build/VEGA_X86/gem5.opt(+0x96b627)[0x557613e57627]
build/VEGA_X86/gem5.opt(+0xfcc34b)[0x5576144b834b]
build/VEGA_X86/gem5.opt(+0x19d159d)[0x557614ebd59d]
build/VEGA_X86/gem5.opt(+0xfcccb3)[0x5576144b8cb3]
build/VEGA_X86/gem5.opt(+0xfcb8b1)[0x5576144b78b1]
build/VEGA_X86/gem5.opt(+0x125aa22)[0x557614746a22]
build/VEGA_X86/gem5.opt(+0x1283534)[0x55761476f534]
build/VEGA_X86/gem5.opt(+0x1283b13)[0x55761476fb13]
build/VEGA_X86/gem5.opt(+0x665ab2)[0x557613b51ab2]
build/VEGA_X86/gem5.opt(+0x4ba777)[0x5576139a6777]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f634f7b9748]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f634f58ef48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f634f6dbe4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f634f7b9124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f634f585d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f634f58def6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f634f59106b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f634f585d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f634f58def6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f634f6dbe4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f634f6dc1d2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f634f6dc5bf]
--- END LIBC BACKTRACE ---
Aborted (core dumped)

Additionally, in the m5out/system.pc.com_1.device file, I found multiple error entries related to unchecked MSR access errors:

[ 0.334614] unchecked MSR access error: RDMSR from 0x1b0 at rIP: 0xffffffff8107688a (native_read_msr+0xa/0x30)
[ 0.337428] Call Trace:
[ 0.338158] ? __switch_to_asm+0x34/0x70
[ 0.338535] intel_epb_restore+0x1f/0x80
[ 0.339670] intel_epb_online+0x17/0x40
[ 0.340786] cpuhp_invoke_callback+0x8a/0x580
[ 0.342045] ? __schedule+0x29a/0x720
[ 0.342531] cpuhp_thread_fun+0xb8/0x120
[ 0.343683] smpboot_thread_fn+0xfc/0x170
[ 0.344851] kthread+0x121/0x140
[ 0.345784] ? sort_range+0x30/0x30
[ 0.346531] ? kthread_park+0x90/0x90
[ 0.347606] ret_from_fork+0x22/0x40
[ 0.348640] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 0.350533] unchecked MSR access error: WRMSR to 0x1b0 (tried to write 0x0000000000000006) at rIP: 0xffffffff81076a88 (native_write_msr+0x8/0x30)
[ 0.354238] Call Trace:
[ 0.354532] intel_epb_restore+0x4d/0x80
[ 0.355674] intel_epb_online+0x17/0x40
[ 0.356785] cpuhp_invoke_callback+0x8a/0x580
[ 0.358046] ? __schedule+0x29a/0x720
[ 0.358531] cpuhp_thread_fun+0xb8/0x120
[ 0.359682] smpboot_thread_fn+0xfc/0x170
[ 0.360847] kthread+0x121/0x140
[ 0.361786] ? sort_range+0x30/0x30
[ 0.362531] ? kthread_park+0x90/0x90
[ 0.363754] ret_from_fork+0x22/0x40

I am unsure how to proceed with resolving these issues. I would greatly appreciate any guidance or advice you can provide on how to address these errors and successfully run the simulation.

Thank you for your time and assistance. I look forward to your valuable insights.

Best regards,
Sandy.

_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org
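
For anyone hitting the same loader error: the ROCm mismatch explained in the reply at the top of this message can be spotted directly in the guest console log. Below is a minimal sketch, not part of gem5 or gem5-resources, that scans console text (for example the contents of m5out/system.pc.com_1.device) for "error while loading shared libraries" lines and compares the library's major version with the ROCm version implied by the disk-image path. The helper names are hypothetical, and two assumptions are baked in: that a path component such as rocm42 means ROCm 4.2, and that the major number in the libamdhip64 soname follows the ROCm major version.

```python
import re
from typing import List, Optional

# Loader error format as it appears in the console log quoted in this thread.
LOADER_ERROR = re.compile(
    r"error while loading shared libraries: (?P<lib>\S+?): "
    r"cannot open shared object file"
)


def missing_libraries(console_log: str) -> List[str]:
    """Return the sonames the dynamic loader reported as missing, in order."""
    return [m.group("lib") for m in LOADER_ERROR.finditer(console_log)]


def soname_major(soname: str) -> Optional[int]:
    """Extract the major version from a soname such as 'libamdhip64.so.5'."""
    m = re.search(r"\.so\.(\d+)", soname)
    return int(m.group(1)) if m else None


def image_rocm_major(image_path: str) -> Optional[int]:
    """Guess the ROCm major version from a path component like 'rocm42'.

    Assumption: the first digit after 'rocm' is the major version.
    """
    m = re.search(r"rocm(\d)", image_path)
    return int(m.group(1)) if m else None


def diagnose(console_log: str, image_path: str) -> List[str]:
    """Return one human-readable note per missing library."""
    image_major = image_rocm_major(image_path)
    notes = []
    for lib in missing_libraries(console_log):
        lib_major = soname_major(lib)
        is_hip = lib.startswith("libamdhip64")
        known = lib_major is not None and image_major is not None
        if is_hip and known and lib_major != image_major:
            # Assumption: the libamdhip64 soname major tracks the ROCm major.
            notes.append(
                f"{lib}: application expects ROCm {lib_major}, "
                f"disk image looks like ROCm {image_major}"
            )
        else:
            notes.append(f"{lib}: missing from the disk image")
    return notes


if __name__ == "__main__":
    sample_log = (
        "Running ../gem5-resources/src/gpu/square/bin/square\n"
        "./myapp: error while loading shared libraries: libamdhip64.so.5: "
        "cannot open shared object file: No such file or directory\n"
    )
    sample_image = (
        "../gem5-resources/src/gpu-fs/disk-image/rocm42/rocm42-image/rocm42"
    )
    for note in diagnose(sample_log, sample_image):
        print(note)
```

A note from this check only indicates that the application and the disk image were probably built for different ROCm releases; the fix is still to rebuild one of them so that both match.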