gem5-users@gem5.org

The gem5 Users mailing list

Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

429442672
Sat, May 13, 2023 3:34 PM

Thank you so much for the advice.

Actually, I have already generated the cachefiles, as shown in the figure.

Besides, I have successfully run several benchmarks such as pool, activations, and softmax, so I think the kernels are set up.
The commands differ because I ran my command directly inside the docker container.

This might be a common problem when running networks with conv layers. Other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also report this problem.

I have also tried the latest docker image with the original command, like this:
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
but I still got the same error.

Does that mean conv layers are currently not available for gem5-gcn?

May I ask whether anyone else has met this problem before?

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" <gem5-dev@gem5.org>;
Cc: "gem5-users" <gem5-users@gem5.org>; "429442672" <429442672@qq.com>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally, so I cannot say for certain what the error is or even if they are expected to run to completion in gem5.  But, normally the error you're seeing happens because you have not created the appropriate "cache" files for the GPU kernel(s) the program is trying to run.  MIOpen first checks to see if the desired kernel has been run on your machine before, and if not it tries to do online compilation of that kernel.  Unfortunately online compilation of kernels in gem5 is a) very slow and b), because it is very slow, not supported in gem5 (basically, it is so slow as to not be worth supporting in many cases, in my opinion).  So, instead, the expectation is that we build the kernels we want ahead of time, before running the program.  You may have seen in the examples we provide for DNNMark (https://resources.gem5.org/resources/dnn-mark) that we have this "generate cachefiles" script -- that is exactly what the purpose of that script is.  Moreover, on the same webpage, you may have noticed we include the path to that cache directory in our docker commands (emphasis mine):
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
From looking at your commands, I don't see you including this.  Thus, while I do not know if that script by default produces the kernels needed for, say, AlexNet, I strongly suspect you should start by running that script and updating your docker commands to include the cache stuff ... then see what happens from there.
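As a rough illustration of the point about mounting the cache directory, the Python sketch below assembles the docker command quoted above and refuses to build it if the host-side cache directory is missing or empty. The helper name `build_dnnmark_cmd` and its parameters are hypothetical. The paths, image tag, and flags are copied from the command in this message, and the sketch assumes it is run from the directory that contains `gem5` and `gem5-resources`.

```python
from pathlib import Path

# Paths and image tag are copied from the docker command quoted in this thread.
DNNMARK = "gem5-resources/src/gpu/DNNMark"
MIOPEN_CACHE_IN_CONTAINER = "/root/.cache/miopen/2.9.0"
IMAGE = "gcr.io/gem5-test/gcn-gpu:v22-1"


def build_dnnmark_cmd(benchmark, config, workdir, check_cache=True):
    """Hypothetical helper: return the docker argv for one DNNMark benchmark.

    benchmark: e.g. "test_fwd_softmax"; config: e.g. "softmax_config.dnnmark".
    """
    workdir = Path(workdir)
    cache = workdir / DNNMARK / "cachefiles"
    # Without pre-built kernels MIOpen would fall back to online compilation,
    # which the thread says is not supported in gem5.
    if check_cache and not (cache.is_dir() and any(cache.iterdir())):
        raise FileNotFoundError(f"no MIOpen cache files found in {cache}")
    options = (f"-config {DNNMARK}/config_example/{config} "
               f"-mmap {DNNMARK}/mmap.bin")
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}",
        "-v", f"{cache}:{MIOPEN_CACHE_IN_CONTAINER}",
        "-w", str(workdir),
        IMAGE,
        "gem5/build/GCN3_X86/gem5.opt", "gem5/configs/example/apu_se.py",
        "-n3",
        f"--benchmark-root={DNNMARK}/build/benchmarks/{benchmark}",
        f"-cdnnmark_{benchmark}",
        f"--options={options}",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Building the argument list in one place makes it harder to drop the cache mount by accident when switching between benchmarks.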

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.  Normally I just symlink these extra files -- e.g., symlink gfx801_4... from gfx801_32 ... (this is not the best-performing option, because the resources are different, but it provides a basic setup step to avoid problems like this).
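The symlink workaround can be sketched in Python as follows. The helper `link_missing_db` is hypothetical, and the file names are passed in as parameters because the thread elides the exact name of the gfx801_32 file. Check the directory listing before choosing them.

```python
from pathlib import Path


def link_missing_db(db_dir, missing_name, existing_name):
    """Hypothetical helper: make db_dir/missing_name a symlink to existing_name.

    Returns True if a link was created, False if missing_name already exists.
    """
    db_dir = Path(db_dir)
    source = db_dir / existing_name
    target = db_dir / missing_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist; nothing to link from")
    if target.exists() or target.is_symlink():
        return False  # leave existing files (or dangling links) alone
    target.symlink_to(source.name)  # relative link inside the same directory
    return True
```

In the setup from this thread, `db_dir` would be `/opt/rocm-4.0.1/miopen/share/miopen/db` and `missing_name` would be `gfx801_4.HIP.fdb.txt`, both taken from the warning. The name of the existing gfx801_32 file has to be read from that directory. The link would have to be created inside the container (or in the image) for MIOpen to see it.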

Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <gem5-dev@gem5.org> wrote:

Dear Matt,
When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error in each case (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks: 327510244000
Exiting because  exiting with last active thread context

My command line is:
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -c dnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

I did not change the default setup of conv_config.dnnmark or gfx801.

May I ask, did I do something wrong here?
Have you ever run these benchmarks without errors, and could you please show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

Matt Sinclair
Sun, May 14, 2023 8:34 PM

Thanks, this is helpful.  Looking through those old email chains, I don't
see any specific resolution to them, unfortunately.  I do not have a ton of
time to dig into this (end of the semester is keeping me busy) but if you
can keep digging I may be able to provide some ideas.

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?
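One way to answer this from a saved gem5 log is to drop the syscall-emulation warnings and keep only the MIOpen lines. The sketch below is a minimal filter whose function names are hypothetical. The strings it matches are taken from the log excerpt in this thread.

```python
def miopen_lines(log_text):
    """Return the MIOpen warnings and errors from a gem5 log, dropping other lines."""
    return [line.strip() for line in log_text.splitlines()
            if line.lstrip().startswith("MIOpen")]


def db_still_unreadable(log_text):
    """True if the 'File is unreadable' warning from ParseAndLoadDb is present."""
    return any("[ParseAndLoadDb] File is unreadable" in line
               for line in miopen_lines(log_text))
```

For example, `db_still_unreadable(open("conv.log").read())` (the log file name is an assumption) reports whether the warning survives after the cache directory is mounted.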

Second, what layer size are you assuming/trying to conv?  The failure
shortly after is here:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.
It seems to imply that X and W are not equal, but we'd need to dig to
figure out if this is because of the config file being passed in, or
something in MIOpen/gem5 that is breaking it.  Given that the function call
that is failing is trying to create a tensor:
https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106,
my guess is that it's something with the config, because something so basic
probably (hopefully?) doesn't fail in MIOpen...
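To make the suspected mismatch concrete, here is a minimal Python sketch of the kind of consistency check being described: the channel dimension of the input tensor (X) has to agree with the channel dimension of the filter (W). This is not MIOpen's code. The function name, the NCHW and KCRS layouts, and the handling of groups are assumptions made for illustration.

```python
def check_filter_channels(input_shape, filter_shape, groups=1):
    """Raise ValueError if the filter's channel dimension does not match the input.

    input_shape:  (N, C, H, W)    -- assumed NCHW layout
    filter_shape: (K, C_f, R, S)  -- assumed layout; C_f is channels per group
    """
    _, in_channels, _, _ = input_shape
    _, filter_channels, _, _ = filter_shape
    if in_channels % groups != 0:
        raise ValueError(
            f"input channels {in_channels} not divisible by groups {groups}")
    if in_channels // groups != filter_channels:
        raise ValueError(
            f"Invalid filter channel number: input has {in_channels} channels "
            f"({groups} group(s)), filter expects {filter_channels}")
```

With made-up shapes, `check_filter_channels((100, 3, 256, 256), (32, 3, 5, 5))` passes, while a filter shape of `(32, 1, 5, 5)` raises. Comparing the channel counts that DNNMark derives from the config file against the filter it builds is one way to tell a config problem from a MIOpen or gem5 problem.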

In terms of whether it should work or not, I don't see that we included it in
prior papers (e.g.,
https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I
don't know if that was because we didn't try or if there was a deeper, more
fundamental reason.

Matt


429442672
Tue, May 16, 2023 4:05 AM

Thank you. I would appreciate it very much if you could help me solve this problem.

The command I used is:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

  1. Yes, I still see this error.
     The generated cache is placed here:

  2. I used the default settings in DNNMark:

Besides, I also ran softmax with:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax
--options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and got

Exiting because  exiting with last active thread context.

The detailed logs from running conv and softmax are attached to this email.

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Monday, May 15, 2023, 4:34 AM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thanks, this is helpful.  Looking through those old email chains, I don't see any specific resolution to them, unfortunately.  I do not have a ton of time to dig into this (end of the semester is keeping me busy) but if you can keep digging I may be able to provide some ideas. 

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv?  The failure shortly after is here: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.  It seems to imply that X and W are not equal, but we'd need to dig to figure out if this is because of the config file being passed in, or something in MIOpen/gem5 that is breaking it.  Given that the function call that is failing is trying to create a tensor: https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106, my guess is that it's something with the config, because something so basic probably (hopefully?) doesn't fail in MIOpen...
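
A minimal sketch of the kind of shape rule that the error text suggests, written as a shell function so it can be tried against the values in a config file. This is not copied from MIOpen: the name `check_filter_channels` is hypothetical, and the rule itself (the input's channel count must equal the filter's channel count times the group count) is an assumption about what the check compares.

```shell
# Hypothetical restatement of the rule behind "Invalid filter channel number".
# Assumption: input channels must equal filter channels * group count
# (group count is 1 for an ordinary convolution).
check_filter_channels() {
    in_c=$1                 # C of the input tensor (NCHW layout)
    filt_c=$2               # channel dimension of the filter tensor
    group_count=${3:-1}     # optional, defaults to 1
    if [ "$in_c" -eq $((filt_c * group_count)) ]; then
        echo "ok"
    else
        echo "Invalid filter channel number: input C=$in_c, filter C=$filt_c, groups=$group_count"
        return 1
    fi
}
```

Calling `check_filter_channels 3 3` passes, while `check_filter_channels 256 3` fails. If the real check behaves this way, a value such as H or W ending up in the channel slot would trigger the error even with a sensible config.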

In terms of whether it should work or not, I don't see that we included it in prior papers (e.g., https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I don't know if that was because we didn't try or if there was a deeper, more fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 <429442672@qq.com> wrote:

Thank you so much for the advice.

Actually, I have generated the cachefiles, as shown in the figure.

Besides, I have successfully run several benchmarks such as pool, activations, and softmax, so I think the kernels are set up.
The commands differ because I ran the command directly inside the docker container.

This might be a common problem when running networks with conv layers; other emails such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also report this problem.

I have also tried the latest docker image and the original command, like this:
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
but I still got the same error.

Does that mean conv layers are currently not available for gem5-gcn? Has anyone else met this problem before? Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" <gem5-dev@gem5.org>;
Cc: "gem5-users" <gem5-users@gem5.org>; "429442672" <429442672@qq.com>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally, so I cannot say for certain what the error is or even if they are expected to run to completion in gem5.  But, normally the error you're seeing happens because you have not created the appropriate "cache" files for the GPU kernel(s) the program is trying to run.  MIOpen first checks to see if the desired kernel has been run on your machine before, and if not it tries to do online compilation of that kernel.  Unfortunately online compilation of kernels in gem5 is a) very slow and b), because it is very slow, not supported in gem5 (basically, it is so slow as to not be worth supporting in many cases, in my opinion).  So, instead, the expectation is that we build the kernels we want ahead of time, before running the program.  You may have seen in the examples we provide for DNNMark (https://resources.gem5.org/resources/dnn-mark) that we have this "generate cachefiles" script -- that is exactly what the purpose of that script is.  Moreover, on the same webpage, you may have noticed we include the path to that cache directory in our docker commands (emphasis mine):
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
From looking at your commands, I don't see you including this.  Thus, while I do not know if that script by default produces the kernels needed for, say, AlexNet, I strongly suspect you should start by running that script and updating your docker commands to include the cache stuff ... then see what happens from there.
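
As a rough pre-flight sketch for the cache step described above, the function below only confirms that the directory mounted into the container exists and is not empty. The name `miopen_cache_ready` is hypothetical, and a non-empty directory does not prove that the kernels a particular benchmark needs were actually generated.

```shell
# Succeeds only if the cache directory exists and holds at least one entry.
# It does NOT verify that the entries match the kernels a benchmark will request.
miopen_cache_ready() {
    cache_dir=$1
    [ -d "$cache_dir" ] && [ -n "$(ls -A "$cache_dir" 2>/dev/null)" ]
}
```

For example, `miopen_cache_ready "${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles" || echo "cache directory missing or empty"` could be run before the docker command, using the same host path that the `-v` option mounts.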

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.  Normally I just symlink these extra files -- e.g., symlink gfx801_4... from gfx801_32... (this is not the best-performing option because the resources are different, but it provides a basic setup step to avoid problems like this).
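
A minimal sketch of that symlink workaround as a shell function. The name `link_missing_db` is hypothetical; the directory comes from the warning message, while the exact name of the gfx801_32 file is an assumption and should be checked against what is actually present in the directory.

```shell
# Link the file MIOpen reports as unreadable to a database file that exists.
# Leaves things untouched if the missing name is already present.
link_missing_db() {
    db_dir=$1      # e.g. /opt/rocm-4.0.1/miopen/share/miopen/db (from the warning)
    existing=$2    # a file that is present, e.g. the gfx801_32 variant (name is an assumption)
    missing=$3     # the file named in the warning, e.g. gfx801_4.HIP.fdb.txt
    if [ ! -e "$db_dir/$existing" ]; then
        echo "source file $existing not found in $db_dir" >&2
        return 1
    fi
    if [ -e "$db_dir/$missing" ] || [ -L "$db_dir/$missing" ]; then
        return 0
    fi
    ln -s "$db_dir/$existing" "$db_dir/$missing"
}
```

Because the path in the warning is inside the container, this would have to run there (or in the image build) rather than on the host.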

Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <gem5-dev@gem5.org> wrote:

Dear Matt,
     When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error each time (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks: 327510244000
Exiting because  exiting with last active thread context

My command line is:
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 8 --mem-size=12GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -c dnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

I didn't change the default setup of conv_config.dnnmark or gfx801.

May I ask, did I do something wrong here?
Have you ever run those benchmarks without error, and could you please show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

MS
Matt Sinclair
Tue, May 16, 2023 4:46 AM

It is a bit odd that the default batch size is 126, which is not a power of
2.  If you make it 4, 8, or 16 what happens?

If this doesn't resolve we would need to determine which of NCHW is being
set to X in the MIOpen code to determine if it's a config problem, MIOpen
problem, or a gem5 problem (i.e., if X is H then both H and W are 256 in
the config file).
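
One way to try the batch sizes suggested above without editing the default file is to write modified copies of the config. This is only a sketch: `set_batch_size` is a hypothetical helper, and it assumes the batch size is set on a line of the form `n=<number>`, which should be checked against the actual conv_config.dnnmark.

```shell
# Write a copy of a DNNMark config with the batch size replaced.
# Assumption: the batch size appears on a line that starts with "n=<number>".
set_batch_size() {
    src=$1      # original config file (left unmodified)
    dst=$2      # path of the modified copy
    batch=$3    # new batch size, e.g. 4, 8, or 16
    sed "s/^n=[0-9][0-9]*/n=${batch}/" "$src" > "$dst"
}
```

A loop such as `for n in 4 8 16; do set_batch_size conv_config.dnnmark "conv_config_n$n.dnnmark" "$n"; done` would then produce one copy per batch size, each of which can be passed through the `-config` option in turn.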

Matt


4
429442672
Tue, May 16, 2023 6:58 AM

Unfortunately, I tried changing n to 2, 4, 8, and 16, and got the same error.

May I ask what X means?

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Tuesday, May 16, 2023, 12:46 PM
To: "429442672"<429442672@qq.com>;
Cc: "gem5-dev"<gem5-dev@gem5.org>;"gem5-users"<gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

It is a bit odd that the default batch size is 126, which is not a power of 2.  If you make it 4, 8, or 16 what happens?

If this doesn't resolve we would need to determine which of NCHW is being set to X in the MIOpen code to determine if it's a config problem, MIOpen problem, or a gem5 problem (i.e., if X is H then both H and W are 256 in the config file).
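
The batch-size experiment suggested above can be scripted. What follows is a minimal sketch, assuming the DNNMark config stores the batch size on a line of the form `n=<value>`. The scratch config written below is a hypothetical stand-in, not the shipped conv_config.dnnmark (only n=126 and h=w=256 come from this thread), and `make_batch_variant` is a helper name made up for illustration.

```shell
#!/bin/sh
# make_batch_variant SRC_CONFIG BATCH OUT_CONFIG
# Copies SRC_CONFIG to OUT_CONFIG with the batch size line rewritten.
# Assumption: the batch size is stored as a line starting with "n=".
make_batch_variant() {
    src=$1
    batch=$2
    out=$3
    sed "s/^n=.*/n=$batch/" "$src" > "$out"
}

# Hypothetical stand-in config in a scratch directory (c=3 is invented).
demo_dir=$(mktemp -d)
printf '[Convolution]\nn=126\nc=3\nh=256\nw=256\n' > "$demo_dir/conv_config.dnnmark"

# One variant per batch size to try; the original file is left untouched.
for n in 4 8 16; do
    make_batch_variant "$demo_dir/conv_config.dnnmark" "$n" \
        "$demo_dir/conv_config_n$n.dnnmark"
done

# Show the batch size line of each generated variant.
grep '^n=' "$demo_dir"/conv_config_n*.dnnmark
```

Each generated file could then be passed to the benchmark through the `-config` argument inside `--options`, so the runs differ only in batch size.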

Matt

On Mon, May 15, 2023 at 11:05 PM 429442672 <429442672@qq.com> wrote:

Thank you. I would appreciate it very much if you could help me solve this problem.

The command I use is:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

  1. Yes, I still see this error.
     The generated cache is placed here:

  2. I used the default settings in DNNMark:

Besides, I also ran softmax with:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax
--options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and got

Exiting because  exiting with last active thread context.

The detailed logs of running conv and softmax are attached to this email.

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Monday, May 15, 2023, 4:34 AM
To: "429442672"<429442672@qq.com>;
Cc: "gem5-dev"<gem5-dev@gem5.org>;"gem5-users"<gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thanks, this is helpful.  Looking through those old email chains, I don't see any specific resolution to them, unfortunately.  I do not have a ton of time to dig into this (end of the semester is keeping me busy) but if you can keep digging I may be able to provide some ideas. 

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv?  The failure shortly after is here: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.  It seems to imply that X and W are not equal, but we'd need to dig to figure out if this is because of the config file being passed in, or something in MIOpen/gem5 that is breaking it.  Given that the function call that is failing is trying to create a tensor: https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106, my guess is that it's something with the config, because something so basic probably (hopefully?) doesn't fail in MIOpen...
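
To make the "X and W are not equal" reading concrete, here is a minimal sketch of the kind of check being described. It assumes that X is the input tensor, W is the filter (weight) tensor, and the comparison is on their channel counts. That reading is an assumption based on the error text, not something confirmed in this thread, and `check_filter_channels` is a hypothetical helper, not MIOpen code. The linked source remains the authority.

```shell
#!/bin/sh
# check_filter_channels X_CHANNELS W_CHANNELS
# Assumed reading of the failing check: the channel count of the input
# tensor X must equal the channel count the filter tensor W expects.
check_filter_channels() {
    x_channels=$1   # channel dimension (C) of the input tensor X
    w_channels=$2   # channel dimension of the filter tensor W
    if [ "$x_channels" -ne "$w_channels" ]; then
        echo "Invalid filter channel number: input has $x_channels, filter expects $w_channels"
        return 1
    fi
    echo "channels match: $x_channels"
}

# Matching channel counts pass; mismatched ones reproduce the error text.
check_filter_channels 3 3
check_filter_channels 3 1 || true
```

If this reading holds, the next step is to print the channel values the config produces for both tensors and see which one is unexpected.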

In terms of whether it should work or not, I don't see that we included it in prior papers (e.g., https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I don't know if that was because we didn't try or if there was a deeper, more fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 <429442672@qq.com> wrote:

Thank you so much for the advice.

Actually, I have generated the cachefiles, as shown in the figure.

Besides, I have successfully run several benchmarks, such as pool, activations, and softmax, so I think the kernels are set up.
The commands differ because I run the command directly inside the Docker container.

There may be a common problem when running networks with conv; other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
report the same problem.

I have also tried the latest Docker image and the original command, like this:
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

but still got the same error.

Does that mean conv layers are currently not available for gem5-gcn?

May I ask whether anyone else has met this problem before?

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List"<gem5-dev@gem5.org>;
Cc: "gem5-users"<gem5-users@gem5.org>;"429442672"<429442672@qq.com>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally, so I cannot say for certain what the error is or even if they are expected to run to completion in gem5.  But, normally the error you're seeing happens because you have not created the appropriate "cache" files for the GPU kernel(s) the program is trying to run.  MIOpen first checks to see if the desired kernel has been run on your machine before, and if not it tries to do online compilation of that kernel.  Unfortunately online compilation of kernels in gem5 is a) very slow and b), because it is very slow, not supported in gem5 (basically, it is so slow as to not be worth supporting in many cases, in my opinion).  So, instead, the expectation is that we build the kernels we want ahead of time, before running the program.  You may have seen in the examples we provide for DNNMark (https://resources.gem5.org/resources/dnn-mark) that we have this "generate cachefiles" script -- that is exactly what the purpose of that script is.  Moreover, on the same webpage, you may have noticed we include the path to that cache directory in our docker commands (emphasis mine):
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
From looking at your commands, I don't see you including this.  Thus, while I do not know if that script by default produces the kernels needed for, say, AlexNet, I strongly suspect you should start by running that script and updating your docker commands to include the cache stuff ... then see what happens from there.
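
Before launching a run, it can help to confirm that the cache directory exists and is not empty, since an empty mount would still leave MIOpen without prebuilt kernels. The following is a minimal sketch using the host path from the docker command above. `cache_ready` is a hypothetical helper name, and treating "directory is non-empty" as "cache was generated" is an assumption, not a guarantee that the needed kernels are present.

```shell
#!/bin/sh
# cache_ready DIR
# Succeeds only if DIR exists and contains at least one entry.
cache_ready() {
    dir=$1
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Host-side cache path taken from the docker command in this thread.
cache_dir="${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles"

if cache_ready "$cache_dir"; then
    echo "cache found: mount it with -v $cache_dir:/root/.cache/miopen/2.9.0"
else
    echo "no kernel cache in $cache_dir: run the generate cachefiles script first"
fi
```

Run it from the directory that contains gem5-resources, the same directory the docker command is launched from.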

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.  Normally I just symlink these extra files -- e.g., symlink gfx801_4... from gfx801_32 ... (this is not the best-performing option, because the resources are different, but provides a basic setup step to avoid problems like this).
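
The symlink workaround can be sketched as a small helper. This is an illustration under stated assumptions: `link_missing_db` is a made-up name, and the demonstration runs in a scratch directory with placeholder file names, because the full gfx801_32 file name is not given in this thread.

```shell
#!/bin/sh
# link_missing_db DB_DIR EXISTING MISSING
# Creates DB_DIR/MISSING as a symlink to DB_DIR/EXISTING when MISSING is absent.
link_missing_db() {
    db_dir=$1
    existing=$2
    missing=$3
    if [ ! -e "$db_dir/$existing" ]; then
        echo "source file not found: $db_dir/$existing" >&2
        return 1
    fi
    # Leave any existing file or link (even a dangling one) alone.
    if [ -e "$db_dir/$missing" ] || [ -L "$db_dir/$missing" ]; then
        echo "already present: $db_dir/$missing"
        return 0
    fi
    # Relative link target, resolved inside DB_DIR.
    ln -s "$existing" "$db_dir/$missing"
    echo "linked $missing -> $existing"
}

# Demonstration with placeholder names in a scratch directory.
demo_dir=$(mktemp -d)
printf 'placeholder\n' > "$demo_dir/existing.fdb.txt"
link_missing_db "$demo_dir" existing.fdb.txt missing.fdb.txt
```

In the container, the directory would be the one named in the warning (/opt/rocm-4.0.1/miopen/share/miopen/db), the missing file would be gfx801_4.HIP.fdb.txt, and the existing file would be whichever gfx801_32 database is actually present there.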

Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <gem5-dev@gem5.org> wrote:

Dear Matt,
     When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error each time (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks: 327510244000
Exiting because  exiting with last active thread context

My command line is:
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -c dnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and I did not change the default setup of conv_config.dnnmark and gfx801.

May I ask, did I do something wrong here?
Have you ever run these benchmarks without errors, and could you please show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

Matt Sinclair
Tue, May 16, 2023 6:33 PM

X is referring to this:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149,
which is where your failure is occurring.

Do you happen to have an AMD GPU handy?  If so, can you try running conv on
the real GPU with ROCM 4.0.1 and let me know what happens?

Matt

On Tue, May 16, 2023 at 1:58 AM 429442672 <429442672@qq.com> wrote:

Unfortunately, I tried changing n to 2, 4, 8, and 16, and got the same error.

May I ask what X means?

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Tuesday, May 16, 2023, 12:46 PM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number
when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

It is a bit odd that the default batch size is 126, which is not a power
of 2.  If you make it 4, 8, or 16 what happens?

If this doesn't resolve it, we would need to determine which of N, C, H, or W
is being set to X in the MIOpen code, to tell whether it's a config problem,
an MIOpen problem, or a gem5 problem (e.g., if X is H, then both H and W are
256 in the config file).

Matt

On Mon, May 15, 2023 at 11:05 PM 429442672 <429442672@qq.com> wrote:

Thank you. I would appreciate it very much if you could help me solve
this problem.

The command I use is:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
-cdnnmark_test_fwd_conv
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

  1. Yes, I still see this error.
    The generated cache is placed here:

  2. I used the default settings in DNNMark:

Besides,

I also ran softmax with:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
-cdnnmark_test_fwd_softmax
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and got

Exiting because exiting with last active thread context.

The detailed logs of running conv and softmax are attached to this email.

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Monday, May 15, 2023, 4:34 AM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number
when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thanks, this is helpful.  Looking through those old email chains, I don't
see any specific resolution to them, unfortunately.  I do not have a ton of
time to dig into this (end of the semester is keeping me busy) but if you
can keep digging I may be able to provide some ideas.

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv?  The failure
shortly after is here:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.
It seems to imply that X and W are not equal, but we'd need to dig to
figure out if this is because of the config file being passed in, or
something in MIOpen/gem5 that is breaking it.  Given that the function call
that is failing is trying to create a tensor:
https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106,
my guess is that it's something with the config, because something so basic
probably (hopefully?) doesn't fail in MIOpen...
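
As a rough sketch of what a check like this might look like (an assumption drawn from the error text, not a copy of the MIOpen source): treat X as the input tensor dimensions in NCHW order and W as the filter dimensions in (K, C, R, S) order, and require the channel counts to agree. The function name, the group handling, and the example filter shapes are all hypothetical.

```python
def filter_channels_match(x_dims, w_dims, group_count=1):
    """Return True if the filter's channel count is consistent with the input's.

    x_dims: input tensor dims as (N, C, H, W)
    w_dims: filter dims as (K, C_per_group, R, S)
    Assumption: with groups, each filter sees C / group_count input channels.
    """
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    in_channels = x_dims[1]
    filter_channels = w_dims[1]
    if in_channels % group_count != 0:
        return False
    return in_channels // group_count == filter_channels


# Hypothetical shapes: a 3-channel input against a 3-channel and a 1-channel filter.
print(filter_channels_match((126, 3, 256, 256), (32, 3, 5, 5)))
print(filter_channels_match((126, 3, 256, 256), (32, 1, 5, 5)))
```

Printing the shapes DNNMark actually passes when it creates the descriptors, and feeding them through a check like this, would show whether the mismatch comes from the config file.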

As for whether it should work, I don't see that we included it in
prior papers (e.g.,
https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I
don't know if that was because we didn't try or because there was a deeper,
more fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 <429442672@qq.com> wrote:

Thank you so much for the advice.

Actually, I have generated the cachefiles, as shown in the figure.

Besides, I have successfully run several benchmarks such as pool,
activations, and softmax, so I think the kernels are set up.
The commands differ because I run the command directly inside the
docker container.

There might be a common problem when running networks with conv; some
other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also hit this problem.

I have also tried using the latest docker image and the original command,
like this:

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

but still got the same error.

Does that mean conv layers are currently not available for gem5-gcn?

May I ask whether anyone else has met this problem before?

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" <gem5-dev@gem5.org>;
Cc: "gem5-users" <gem5-users@gem5.org>; "429442672" <429442672@qq.com>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number
when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally,
so I cannot say for certain what the error is or even if they are expected
to run to completion in gem5.  But, normally the error you're seeing
happens because you have not created the appropriate "cache" files for the
GPU kernel(s) the program is trying to run.  MIOpen first checks to see if
the desired kernel has been run on your machine before, and if not it tries
to do online compilation of that kernel.  Unfortunately online compilation
of kernels in gem5 is a) very slow and b), because it is very slow, not
supported in gem5 (basically, it is so slow as to not be worth supporting
in many cases, in my opinion).  So, instead, the expectation is that we
build the kernels we want ahead of time, before running the program.  You
may have seen in the examples we provide for DNNMark (
https://resources.gem5.org/resources/dnn-mark) that we have this
"generate cachefiles" script -- that is exactly what the purpose of that
script is.  Moreover, on the same webpage, you may have noticed we include
the path to that cache directory in our docker commands (emphasis mine):

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

From looking at your commands, I don't see you including this.  Thus,
while I do not know if that script by default produces the kernels needed
for, say, AlexNet, I strongly suspect you should start by running that
script and updating your docker commands to include the cache stuff ...
then see what happens from there.
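
To make the cache mount harder to forget, the command can be assembled programmatically. This is a minimal sketch, assuming the directory layout from the docker command above; the helper name `dnnmark_docker_cmd`, its parameters, and the `/work` directory are hypothetical, and nothing is executed here.

```python
import shlex


def dnnmark_docker_cmd(workdir, benchmark, config,
                       image="gcr.io/gem5-test/gcn-gpu:v22-1"):
    """Build the docker argv for one DNNMark benchmark, always mounting the MIOpen cache."""
    dnnmark = f"{workdir}/gem5-resources/src/gpu/DNNMark"
    # Host cachefiles directory -> the path MIOpen reads inside the container.
    cache_mount = f"{dnnmark}/cachefiles:/root/.cache/miopen/2.9.0"
    options = (f"-config gem5-resources/src/gpu/DNNMark/config_example/{config} "
               f"-mmap gem5-resources/src/gpu/DNNMark/mmap.bin")
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}",
        "-v", cache_mount,
        "-w", workdir,
        image,
        "gem5/build/GCN3_X86/gem5.opt", "gem5/configs/example/apu_se.py", "-n3",
        f"--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/{benchmark}",
        f"-cdnnmark_{benchmark}",
        f"--options={options}",
    ]


# Hypothetical working directory; prints a shell-quoted command line for inspection.
cmd = dnnmark_docker_cmd("/work", "test_fwd_softmax", "softmax_config.dnnmark")
print(shlex.join(cmd))
```

Swapping `test_fwd_softmax` and `softmax_config.dnnmark` for `test_fwd_conv` and `conv_config.dnnmark` gives the conv variant with the same cache mount.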

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.
Normally I just symlink these extra files -- e.g., symlink gfx801_4... from
gfx801_32 ... (this is not the best-performing option, because the
resources are different, but it provides a basic setup step to avoid
problems like this).
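
The symlink workaround could look roughly like this. It is only a sketch: the helper `link_missing_db` is hypothetical, and the file names passed to it are placeholders to be replaced with the actual names found in the MIOpen db directory. The demo runs in a temporary directory rather than under /opt/rocm-4.0.1.

```python
import os
import tempfile


def link_missing_db(db_dir, existing_name, missing_name):
    """Symlink missing_name -> existing_name inside db_dir if missing_name is absent.

    Returns True if a link was created, False if nothing needed doing.
    """
    existing = os.path.join(db_dir, existing_name)
    missing = os.path.join(db_dir, missing_name)
    if not os.path.exists(existing):
        raise FileNotFoundError(existing)
    if os.path.lexists(missing):
        return False  # already a file or link; leave it alone
    os.symlink(existing_name, missing)  # relative link within db_dir
    return True


# Demo in a scratch directory with placeholder file names.
with tempfile.TemporaryDirectory() as db_dir:
    with open(os.path.join(db_dir, "present.fdb.txt"), "w") as f:
        f.write("placeholder\n")
    created = link_missing_db(db_dir, "present.fdb.txt", "absent.fdb.txt")
    readable = os.access(os.path.join(db_dir, "absent.fdb.txt"), os.R_OK)
    print(created, readable)
```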
Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <
gem5-dev@gem5.org> wrote:

Dear Matt,
When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and
dnnmark_test_fwd_conv, I get the same error each time (Invalid filter
channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks: 327510244000
Exiting because exiting with last active thread context

My command line is:

gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
-c dnnmark_test_fwd_conv
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and I didn't change the default setup of conv_config.dnnmark or gfx801.

May I ask, did I do something wrong here?
Have you ever tested those benchmarks without errors, and could you please
show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

Matt Sinclair
Tue, May 16, 2023 7:05 PM

Ok, I've verified this fails on my AMD GPU with ROCm 4 with the exact same
error:

[sinclair@eldin] (59)$ ./build/benchmarks/test_fwd_conv/dnnmark_test_fwd_conv -config config_example/conv_config.dnnmark --warmup 0 --debuginfo 1
13:55:12.799629 2314023 test_fwd_conv.cc:11] DNNMark suites: Start...
13:55:12.841578 2314023 dnnmark.cc:232] Search and parse general DNNMark configuration
13:55:12.842526 2314023 dnnmark.cc:292] Search and parse layer configuration
13:55:12.842532 2314023 dnnmark.cc:306] Add [Convolution] layer
13:55:12.842555 2314023 dnn_layer.h:84] Layer name: conv1
13:55:12.842566 2314023 dnn_layer.h:89] Previous layer: null
13:55:12.842574 2314023 dnnmark.cc:361] DNNMark: Initialize...
13:55:12.842576 2314023 dnnmark.cc:362] Running mode: 1
13:55:12.842579 2314023 dnnmark.cc:363] Number of Layers: 1
13:55:12.842581 2314023 dnnmark.cc:365] Layer type: 1
13:55:12.842583 2314023 dnnmark.cc:367] DNNMark: Setup parameters of Convolution layer
13:55:12.842586 2314023 dnn_layer.h:110] Bottom dimension: N: 126 C: 3 H: 256 W: 256
13:55:12.842594 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.860227 2314023 data_manager.h:101] Create data with ID: 0
13:55:12.860245 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.876646 2314023 data_manager.h:101] Create data with ID: 1
13:55:12.876678 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.057176 2314023 data_manager.h:101] Create data with ID: 2
13:55:13.057200 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.242066 2314023 data_manager.h:101] Create data with ID: 3
13:55:13.242089 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242357 2314023 data_manager.h:101] Create data with ID: 4
13:55:13.242363 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242486 2314023 data_manager.h:101] Create data with ID: 5
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405368 2314023 conv_layer.h:220] Setting Bwd Filter Algo to
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405539 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405624 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405642 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.428352 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.451712 2314023 data_manager.h:53] Free Data chunk of size 24772608
I0516 13:55:13.454205 2314023 data_manager.h:53] Free Data chunk of size 24772608
MIOpen Error: 3 at /nobackup/sinclair/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057

So it appears this is a problem with MIOpen/ROCm, not a gem5 problem.
Obviously this isn't ideal, but I'm also not sure how easy it's going to be
for us to fix -- we could try determining how MIOpen is setting up the
layers and see if there is an easy fix ... but if the issues are deeper
than something we can change in a small patch or in the config file, I'm
not sure what we can do.
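
As a sanity check on the log above, the chunk sizes are consistent with plain element counts. This is a sketch of the arithmetic only: the input shape (N=126, C=3, H=256, W=256) comes from the "Bottom dimension" log line, while the 32 output channels and the 5x5 kernel are inferred from the sizes 264241152 and 2400, not read from conv_config.dnnmark.

```python
from math import prod

bottom = (126, 3, 256, 256)        # N, C, H, W from the "Bottom dimension" log line
assumed_top = (126, 32, 256, 256)  # inferred: 32 output channels, same H and W
assumed_filter = (32, 3, 5, 5)     # inferred: K, C, R, S

print(prod(bottom))          # 24772608
print(prod(assumed_top))     # 264241152
print(prod(assumed_filter))  # 2400
```

Note that 2400 has other factorizations, so this does not pin down the filter shape; it only shows that one plausible layout matches every size in the log.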

Matt

On Tue, May 16, 2023 at 1:33 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

X is referring to this:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149,
which is where your failure is occurring.

Do you happen to have an AMD GPU handy?  If so, can you try running conv
on the real GPU with ROCM 4.0.1 and let me know what happens?

Matt

On Tue, May 16, 2023 at 1:58 AM 429442672 <429442672@qq.com> wrote:

Unfortunately, I tried changing n to 2, 4, 8, and 16, and got the same error.

May I ask what X means?

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Tuesday, May 16, 2023, 12:46 PM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number
when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

It is a bit odd that the default batch size is 126, which is not a power
of 2.  If you make it 4, 8, or 16 what happens?

If this doesn't resolve it, we would need to determine which of N, C, H, or W
is being set to X in the MIOpen code, to tell whether it's a config problem,
an MIOpen problem, or a gem5 problem (e.g., if X is H, then both H and W are
256 in the config file).

Matt

On Mon, May 15, 2023 at 11:05 PM 429442672 <429442672@qq.com> wrote:

Thank you. I would appreciate it very much if you could help me solve
this problem.

The command I use is:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
-cdnnmark_test_fwd_conv
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

  1. Yes, I still see this error.
    The generated cache is placed here:

  2. I used the default settings in DNNMark:

Besides,

I also ran softmax with:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
-cdnnmark_test_fwd_softmax
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and got

Exiting because exiting with last active thread context.

The detailed logs of running conv and softmax are attached to this email.

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Monday, May 15, 2023, 4:34 AM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number
when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thanks, this is helpful.  Looking through those old email chains, I
don't see any specific resolution to them, unfortunately.  I do not have a
ton of time to dig into this (end of the semester is keeping me busy) but
if you can keep digging I may be able to provide some ideas.

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv?  The failure
shortly after is here:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.
It seems to imply that X and W are not equal, but we'd need to dig to
figure out if this is because of the config file being passed in, or
something in MIOpen/gem5 that is breaking it.  Given that the function call
that is failing is trying to create a tensor:
https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106,
my guess is that it's something with the config, because something so basic
probably (hopefully?) doesn't fail in MIOpen...

As for whether it should work, I don't see that we included it in
prior papers (e.g.,
https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I
don't know if that was because we didn't try or because there was a deeper,
more fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 <429442672@qq.com> wrote:

Thank you so much for the advice.

Actually, I have generated the cachefiles, as shown in the figure.

Besides, I have successfully run several benchmarks such as pool,
activations, and softmax, so I think the kernels are set up.
The commands differ because I run the command directly inside the
docker container.

There might be a common problem when running networks with conv; some
other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also hit this problem.

I have also tried using the latest docker image and the original command,
like this:

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

but still got the same error.

Does that mean conv layers are currently not available for gem5-gcn?

May I ask whether anyone else has met this problem before?

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" mattdsinclair.wisc@gmail.com;
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" gem5-dev@gem5.org;
Cc: "gem5-users" gem5-users@gem5.org; "429442672" 429442672@qq.com;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel
number when running: dnnmark_test_VGG, dnnmark_test_alexnet,
dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally,
so I cannot say for certain what the error is or even if they are expected
to run to completion in gem5.  But, normally the error you're seeing
happens because you have not created the appropriate "cache" files for the
GPU kernel(s) the program is trying to run.  MIOpen first checks to see if
the desired kernel has been run on your machine before, and if not it tries
to do online compilation of that kernel.  Unfortunately online compilation
of kernels in gem5 is a) very slow and b), because it is very slow, not
supported in gem5 (basically, it is so slow as to not be worth supporting
in many cases, in my opinion).  So, instead, the expectation is that we
build the kernels we want ahead of time, before running the program.  You
may have seen in the examples we provide for DNNMark (
https://resources.gem5.org/resources/dnn-mark) that we have this
"generate cachefiles" script -- that is exactly what the purpose of that
script is.  Moreover, on the same webpage, you may have noticed we include
the path to that cache directory in our docker commands (emphasis mine):

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

From looking at your commands, I don't see you including this.  Thus,
while I do not know if that script by default produces the kernels needed
for, say, AlexNet, I strongly suspect you should start by running that
script and updating your docker commands to include the cache stuff ...
then see what happens from there.

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.
Normally I just symlink these extra files -- e.g., symlink gfx801_4... from
gfx801_32 ... (this is not the best-performing option, because the
resources are different, but it provides a basic setup step to avoid problems
like this).
Matt
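
A minimal sketch of the symlink workaround described above, assuming the database files all live in one directory. The helper name `link_missing_db` is hypothetical, and because the exact name of the gfx801_32 file is elided in the message, both file names are passed in as parameters rather than guessed.

```python
import os
import tempfile


def link_missing_db(db_dir, missing_name, existing_name):
    """Create db_dir/missing_name as a symlink to db_dir/existing_name.

    Returns True if a link was created, False if missing_name already exists.
    Both names are supplied by the caller; nothing here is MIOpen-specific.
    """
    missing = os.path.join(db_dir, missing_name)
    existing = os.path.join(db_dir, existing_name)
    if not os.path.exists(existing):
        raise FileNotFoundError(existing)
    if os.path.lexists(missing):
        return False  # a file or link is already there; leave it alone
    # Relative link, so it stays valid if the directory is mounted elsewhere.
    os.symlink(existing_name, missing)
    return True


if __name__ == "__main__":
    # Demonstration in a scratch directory with a placeholder source file.
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "placeholder_source.txt"), "w").close()
        print(link_missing_db(scratch, "gfx801_4.HIP.fdb.txt",
                              "placeholder_source.txt"))
```

On a real setup the directory would be the one named in the warning (/opt/rocm-4.0.1/miopen/share/miopen/db), and as noted above the linked file may not give the best performance.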

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <
gem5-dev@gem5.org> wrote:

Dear Matt,
When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and
dnnmark_test_fwd_conv, I get the same error for each of them
(Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057
Ticks: 327510244000
Exiting because exiting with last active thread context

My command line is:

gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
-c dnnmark_test_fwd_conv
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and I didn't change the default setup of
conv_config.dnnmark and gfx801.

May I ask, did I do something wrong here?
Have you ever run those benchmarks without error, and could you please
show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

Ok, I've verified this fails on my AMD GPU with ROCm 4 with the exact same
error:

[sinclair@eldin] (59)$ ./build/benchmarks/test_fwd_conv/dnnmark_test_fwd_conv -config config_example/conv_config.dnnmark --warmup 0 --debuginfo 1
13:55:12.799629 2314023 test_fwd_conv.cc:11] DNNMark suites: Start...
13:55:12.841578 2314023 dnnmark.cc:232] Search and parse general DNNMark configuration
13:55:12.842526 2314023 dnnmark.cc:292] Search and parse layer configuration
13:55:12.842532 2314023 dnnmark.cc:306] Add [Convolution] layer
13:55:12.842555 2314023 dnn_layer.h:84] Layer name: conv1
13:55:12.842566 2314023 dnn_layer.h:89] Previous layer: null
13:55:12.842574 2314023 dnnmark.cc:361] DNNMark: Initialize...
13:55:12.842576 2314023 dnnmark.cc:362] Running mode: 1
13:55:12.842579 2314023 dnnmark.cc:363] Number of Layers: 1
13:55:12.842581 2314023 dnnmark.cc:365] Layer type: 1
13:55:12.842583 2314023 dnnmark.cc:367] DNNMark: Setup parameters of Convolution layer
13:55:12.842586 2314023 dnn_layer.h:110] Bottom dimension: N: 126 C: 3 H: 256 W: 256
13:55:12.842594 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.860227 2314023 data_manager.h:101] Create data with ID: 0
13:55:12.860245 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.876646 2314023 data_manager.h:101] Create data with ID: 1
13:55:12.876678 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.057176 2314023 data_manager.h:101] Create data with ID: 2
13:55:13.057200 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.242066 2314023 data_manager.h:101] Create data with ID: 3
13:55:13.242089 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242357 2314023 data_manager.h:101] Create data with ID: 4
13:55:13.242363 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242486 2314023 data_manager.h:101] Create data with ID: 5
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405368 2314023 conv_layer.h:220] Setting Bwd Filter Algo to
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405539 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405624 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405642 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.428352 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.451712 2314023 data_manager.h:53] Free Data chunk of size 24772608
I0516 13:55:13.454205 2314023 data_manager.h:53] Free Data chunk of size 24772608
MIOpen Error: 3 at /nobackup/sinclair/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057

So it appears this is a problem with MIOpen/ROCm, not a gem5 problem.
Obviously this isn't ideal, but I'm also not sure how easy it's going to be
for us to fix -- we could try determining how MIOpen is setting up the
layers and see if there is an easy fix ... but if the issues are deeper
than something we can change in a small patch or in the config file, I'm
not sure what we can do.

Matt
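
The "Create Data chunk" sizes in the log above can be checked against the reported bottom dimensions. The sketch below is only a sanity check under stated assumptions: the sizes are treated as element counts, 32 output channels is the value mentioned elsewhere in this thread, and the 5x5 kernel and the unchanged 256x256 output size are guesses chosen because they reproduce the logged numbers, not values stated by anyone in the thread.

```python
from math import prod

# Bottom dimension reported by dnn_layer.h in the log.
n, c, h, w = 126, 3, 256, 256

# ASSUMPTIONS (not stated in the log itself):
k = 32        # output channels, the value MIOpen logging is later said to report
kernel = 5    # square kernel size, inferred only because it matches 2400
out_h, out_w = h, w  # assumes padding/stride keep the spatial size unchanged

input_elems = prod((n, c, h, w))             # input tensor, NCHW
output_elems = prod((n, k, out_h, out_w))    # output tensor, NKHW
filter_elems = prod((k, c, kernel, kernel))  # filter tensor, K x C x R x S

print(input_elems)   # 24772608
print(output_elems)  # 264241152
print(filter_elems)  # 2400
```

If these assumptions hold, the filter already carries 3 input channels, matching C = 3 of the input, which would suggest the mismatch arises in how the convolution descriptor is interpreted rather than in the raw shapes.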
MS
Matt Sinclair
Wed, May 17, 2023 12:04 AM

Ok, I've found a fix for this (i.e., one that works on a real AMD GPU),
although I don't have a perfect understanding of why it is required.  The
bottom line is that the num_outputs and c values in the configuration appear
to need to be the same for convolution layers -- not exactly sure why.

But the implication of that is that one of the two values has to be changed
to match the other (for my test I set num_outputs to 3, the same as c).

Either change results in the X and W values being the same size for the
convolution.  I do not understand the MIOpen code well enough to appreciate
the implications of this, but hopefully this is helpful and you can check
if it runs in gem5 now.

Sidenote: if you want to see the details, you can set this:

export MIOPEN_ENABLE_LOGGING=1

This will produce a lot of output when you run the benchmark, which is how
I saw that X (really Y, as it seems convolution is assuming X and Y are
flipped since it's in transposed mode) and W were being set to 32 and 3.
It is also possible that the issue here is that the MIOpen code is assuming
that the matrix is transposed and the original num_outputs and c values
would work if not transposed, but again I don't quite understand the MIOpen
code well enough to reason through this.

Finally, I strongly recommend you reduce N anyway to get gem5 to run this
in a reasonable amount of time.  It took 7.4 seconds to run the provided
config (with my change above to set num_outputs to 3) on my real GPU, which
is far longer than gem5 can simulate in a reasonable time.

Matt
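
A minimal sketch of a pre-flight check for the workaround above, flagging convolution layers whose input-channel and output-channel values differ. The config layout ([Section] headers with key=value lines), the key names `c` and `num_output`, and the sample text are assumptions for illustration; adjust them to whatever your conv_config.dnnmark actually contains. This encodes only the empirical observation from this thread, not a general rule about convolutions.

```python
def parse_layers(text):
    """Parse '[Section]' blocks of 'key=value' lines into a list of dicts."""
    layers, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {"_section": line[1:-1]}
            layers.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return layers


def conv_mismatches(text, in_key="c", out_key="num_output"):
    """Return (layer name, c, num_output) for conv layers where they differ."""
    bad = []
    for layer in parse_layers(text):
        if layer["_section"].lower() != "convolution":
            continue
        if in_key in layer and out_key in layer:
            cin, cout = int(layer[in_key]), int(layer[out_key])
            if cin != cout:
                bad.append((layer.get("name", "?"), cin, cout))
    return bad


# Hypothetical sample; only n, c, h, w and the layer name come from the log.
SAMPLE = """\
[DNNMark]
run_mode=standalone

[Convolution]
name=conv1
n=126
c=3
h=256
w=256
num_output=32
"""

print(conv_mismatches(SAMPLE))  # [('conv1', 3, 32)]
```

An empty list means every convolution layer in the file already has matching values.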

On Tue, May 16, 2023 at 2:05 PM Matt Sinclair mattdsinclair.wisc@gmail.com
wrote:

Ok, I've verified this fails on my AMD GPU with ROCm 4 with the exact same
error:

[sinclair@eldin] (59)$
./build/benchmarks/test_fwd_conv/dnnmark_test_fwd_conv -config
config_example/conv_config.dnnmark --warmup 0 --debuginfo 1
13:55:12.799629 2314023 test_fwd_conv.cc:11] DNNMark suites: Start...
13:55:12.841578 2314023 dnnmark.cc:232] Search and parse general DNNMark
configuration
13:55:12.842526 2314023 dnnmark.cc:292] Search and parse layer
configuration
13:55:12.842532 2314023 dnnmark.cc:306] Add [Convolution] layer
13:55:12.842555 2314023 dnn_layer.h:84] Layer name: conv1
13:55:12.842566 2314023 dnn_layer.h:89] Previous layer: null
13:55:12.842574 2314023 dnnmark.cc:361] DNNMark: Initialize...
13:55:12.842576 2314023 dnnmark.cc:362] Running mode: 1
13:55:12.842579 2314023 dnnmark.cc:363] Number of Layers: 1
13:55:12.842581 2314023 dnnmark.cc:365] Layer type: 1
13:55:12.842583 2314023 dnnmark.cc:367] DNNMark: Setup parameters of
Convolution layer
13:55:12.842586 2314023 dnn_layer.h:110] Bottom dimension: N: 126 C: 3 H:
256 W: 256
13:55:12.842594 2314023 data_manager.h:44] Create Data chunk of size
24772608
13:55:12.860227 2314023 data_manager.h:101] Create data with ID: 0
13:55:12.860245 2314023 data_manager.h:44] Create Data chunk of size
24772608
13:55:12.876646 2314023 data_manager.h:101] Create data with ID: 1
13:55:12.876678 2314023 data_manager.h:44] Create Data chunk of size
264241152
13:55:13.057176 2314023 data_manager.h:101] Create data with ID: 2
13:55:13.057200 2314023 data_manager.h:44] Create Data chunk of size
264241152
13:55:13.242066 2314023 data_manager.h:101] Create data with ID: 3
13:55:13.242089 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242357 2314023 data_manager.h:101] Create data with ID: 4
13:55:13.242363 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242486 2314023 data_manager.h:101] Create data with ID: 5
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid
filter channel number
13:55:13.405368 2314023 conv_layer.h:220] Setting Bwd Filter Algo to
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid
filter channel number
13:55:13.405539 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405624 2314023 data_manager.h:53] Free Data chunk of size
2400
I0516 13:55:13.405642 2314023 data_manager.h:53] Free Data chunk of size
264241152
I0516 13:55:13.428352 2314023 data_manager.h:53] Free Data chunk of size
264241152
I0516 13:55:13.451712 2314023 data_manager.h:53] Free Data chunk of size
24772608
I0516 13:55:13.454205 2314023 data_manager.h:53] Free Data chunk of size
24772608
MIOpen Error: 3 at
/nobackup/sinclair/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057

So it appears this is a problem with MIOpen/ROCm, not a gem5 problem.
Obviously this isn't ideal, but I'm also not sure how easy it's going to be
for us to fix -- we could try determining how MIOpen is setting up the
layers and see if there is an easy fix ... but if the issues are deeper
than something we can change in a small patch or in the config file, I'm
not sure what we can do.

Matt

On Tue, May 16, 2023 at 1:33 PM Matt Sinclair <
mattdsinclair.wisc@gmail.com> wrote:

X is referring to this:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149,
which is where your failure is occurring.

Do you happen to have an AMD GPU handy?  If so, can you try running conv
on the real GPU with ROCM 4.0.1 and let me know what happens?

Matt

On Tue, May 16, 2023 at 1:58 AM 429442672 429442672@qq.com wrote:

Unfortunately, I tried changing n to 2, 4, 8, and 16, and got the same error.

May I ask what X means?

------------------ Original Message ------------------
From: "Matt Sinclair" mattdsinclair.wisc@gmail.com;
Sent: Tuesday, May 16, 2023, 12:46 PM
To: "429442672" 429442672@qq.com;
Cc: "gem5-dev" gem5-dev@gem5.org; "gem5-users" gem5-users@gem5.org;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number
when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

It is a bit odd that the default batch size is 126, which is not a power
of 2.  If you make it 4, 8, or 16 what happens?

If this doesn't resolve we would need to determine which of NCHW is
being set to X in the MIOpen code to determine if it's a config problem,
MIOpen problem, or a gem5 problem (i.e., if X is H then both H and W are
256 in the config file).

Matt

On Mon, May 15, 2023 at 11:05 PM 429442672 429442672@qq.com wrote:

Thank you, I would appreciate it very much if you could help me solve
this problem.

The command I used is:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
-cdnnmark_test_fwd_conv
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

  1. Yes, I still see this error.
    The generated cache is placed here:

  2. I used the default settings in DNNMark:

Besides,

I also ran softmax with:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
-cdnnmark_test_fwd_softmax
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and got

Exiting because exiting with last active thread context.

The detailed logs of running conv and softmax are attached to this email.

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" mattdsinclair.wisc@gmail.com;
Sent: Monday, May 15, 2023, 4:34 AM
To: "429442672" 429442672@qq.com;
Cc: "gem5-dev" gem5-dev@gem5.org; "gem5-users" gem5-users@gem5.org;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel
number when running: dnnmark_test_VGG, dnnmark_test_alexnet,
dnnmark_test_fwd_conv

Thanks, this is helpful.  Looking through those old email chains, I
don't see any specific resolution to them, unfortunately.  I do not have a
ton of time to dig into this (end of the semester is keeping me busy) but
if you can keep digging I may be able to provide some ideas.

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv?  The failure
shortly after is here:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.
It seems to imply that X and W are not equal, but we'd need to dig to
figure out if this is because of the config file being passed in, or
something in MIOpen/gem5 that is breaking it.  Given that the function call
that is failing is trying to create a tensor:
https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106,
my guess is that it's something with the config, because something so basic
probably (hopefully?) doesn't fail in MIOpen...

As for whether it should work or not, I don't see that we included it
in prior papers (e.g.,
https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I
don't know if that was because we didn't try or if there was a deeper, more
fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 429442672@qq.com wrote:

Thank you so much for the advice.

Actually, I have made the cachefiles as shown in the figure.

Besides, I have successfully run several benchmarks such as pool,
activations, and softmax, so I think the kernels are set up.
The reason for the differences in the commands is that I run the
command directly in the docker container.

There might be a common problem when running networks with conv;
some other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also hit this problem.

I have also tried to use the latest docker and the original command
like this:

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

but still got the same error.

Does that mean conv layers are currently not available for gem5-gcn?

May I ask whether anyone else has met this problem before?

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" mattdsinclair.wisc@gmail.com;
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" gem5-dev@gem5.org;
Cc: "gem5-users" gem5-users@gem5.org; "429442672" 429442672@qq.com;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel
number when running: dnnmark_test_VGG, dnnmark_test_alexnet,
dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally,
so I cannot say for certain what the error is or even if they are expected
to run to completion in gem5.  But, normally the error you're seeing
happens because you have not created the appropriate "cache" files for the
GPU kernel(s) the program is trying to run.  MIOpen first checks to see if
the desired kernel has been run on your machine before, and if not it tries
to do online compilation of that kernel.  Unfortunately online compilation
of kernels in gem5 is a) very slow and b), because it is very slow, not
supported in gem5 (basically, it is so slow as to not be worth supporting
in many cases, in my opinion).  So, instead, the expectation is that we
build the kernels we want ahead of time, before running the program.  You
may have seen in the examples we provide for DNNMark (
https://resources.gem5.org/resources/dnn-mark) that we have this
"generate cachefiles" script -- that is exactly what the purpose of that
script is.  Moreover, on the same webpage, you may have noticed we include
the path to that cache directory in our docker commands (emphasis mine):

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

From looking at your commands, I don't see you including this.  Thus,
while I do not know if that script by default produces the kernels needed
for, say, AlexNet, I strongly suspect you should start by running that
script and updating your docker commands to include the cache stuff ...
then see what happens from there.
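
A quick pre-flight check along these lines can catch a missing or empty cache directory before a long simulation starts. This is only a sketch: the helper name is hypothetical, and the path is the host-side directory from the docker command above. It checks that the directory holds at least one file, not that the right kernels are present.

```python
from pathlib import Path


def cache_has_files(cache_dir):
    """Return True if cache_dir exists and contains at least one regular file."""
    root = Path(cache_dir)
    if not root.is_dir():
        return False
    return any(p.is_file() for p in root.rglob("*"))


if __name__ == "__main__":
    # Host-side directory that the docker command mounts into the container.
    path = "gem5-resources/src/gpu/DNNMark/cachefiles"
    status = "ok" if cache_has_files(path) else "missing or empty"
    print(f"{path}: {status}")
```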

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.
Normally I just symlink these extra files -- e.g., symlink gfx801_4... from
gfx801_32 ... (this is not the best-performing option, because the
resources are different, but it provides a basic setup step to avoid problems
like this).
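
The symlink step could be scripted roughly as follows. This is a sketch with a hypothetical helper; the file names are passed in by the caller because the thread does not spell out the full name of the gfx801_32 file, and the demonstration uses placeholder names in a throwaway directory.

```python
import os
import tempfile


def link_missing_db(db_dir, missing_name, existing_name):
    """Create db_dir/missing_name as a symlink to db_dir/existing_name.

    Does nothing if the missing file already exists. Returns the link path.
    """
    target = os.path.join(db_dir, existing_name)
    link = os.path.join(db_dir, missing_name)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if not os.path.lexists(link):
        # Relative link, resolved against the directory that holds the link.
        os.symlink(existing_name, link)
    return link


# Demonstration in a throwaway directory with placeholder file names.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "existing.fdb.txt"), "w") as f:
        f.write("placeholder\n")
    link = link_missing_db(d, "missing.fdb.txt", "existing.fdb.txt")
    print(os.path.islink(link), os.readlink(link))
```

For the warning in this thread, the missing name would be gfx801_4.HIP.fdb.txt inside the MIOpen db directory shown in the message.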
Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <
gem5-dev@gem5.org> wrote:

Dear Matt,
When I run the benchmarks dnnmark_test_VGG,
dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error
each time (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150:
Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported
command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150:
Invalid filter channel number
MIOpen Error: 3 at
/home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks:
327510244000
Exiting because exiting with last active thread context

My command line is:

gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
-c dnnmark_test_fwd_conv
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

I did not change the default setup of
conv_config.dnnmark or gfx801.

May I ask, did I do something wrong here?
Have you ever run those benchmarks without error, and could you
please show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

Ok I've found a fix to this (i.e., that works on a real AMD GPU) although I don't have a perfect understanding of why it is required. The bottom line is that the num_outputs and c values in the configuration appear to need to be the same for convolution layers -- not exactly sure why.

But the implication of that is either:

- change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#7 to 32
- or change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#12 to 3 and leave line 7 as is

Either change results in the X and W values being the same size for the convolution. I do not understand the MIOpen code well enough to appreciate the implications of this, but hopefully this is helpful and you can check if it runs in gem5 now.

Sidenote: if you want to see the details, you can set this:

export MIOPEN_ENABLE_LOGGING=1

This will produce a lot of output when you run the benchmark, which is how I saw that X (really Y, as it seems convolution is assuming X and Y are flipped since it's in transposed mode) and W were being set to 32 and 3. It is also possible that the issue here is that the MIOpen code is assuming that the matrix is transposed and the original num_outputs and c values would work if not transposed, but again I don't quite understand the MIOpen code well enough to reason through this.

Finally, I strongly recommend you reduce N anyway to get gem5 to run this in a reasonable amount of time. It took 7.4 seconds to run the provided config (with my change above to set num_outputs to 3) on my real GPU, which is way longer than gem5 can simulate in a reasonable time.

Matt

4
429442672
Wed, May 17, 2023 12:48 AM

Thank you so much! I will try it later!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Wednesday, May 17, 2023, 8:04 AM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Ok I've found a fix to this (i.e., that works on a real AMD GPU) although I don't have a perfect understanding of why it is required.  The bottom line is that the num_outputs and c values in the configuration appear to need to be the same for convolution layers -- not exactly sure why.

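But the implication of that is either:

- change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#7 to 32
- or change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#12 to 3 and leave line 7 as is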

Either change results in the X and W values being the same size for the convolution.  I do not understand the MIOpen code well enough to appreciate the implications of this, but hopefully this is helpful and you can check if it runs in gem5 now.
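
A config could be screened for this condition before launching a run, roughly as sketched below. Everything here is an assumption for illustration: the helper names are hypothetical, the file is treated as simple key=value lines under a section header, and the key spelling `num_output` should be checked against the actual conv_config.dnnmark (this thread calls the value num_outputs). The sketch handles a single layer only.

```python
def read_layer_params(text):
    """Collect key=value pairs from DNNMark-style config text into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line and not line.startswith("["):
            key, value = line.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def channels_match(text, in_key="c", out_key="num_output"):
    """Return True if the input-channel and output-count values are equal."""
    params = read_layer_params(text)
    return int(params[in_key]) == int(params[out_key])


# Hypothetical config fragments; only c and num_output matter here.
failing = "[Convolution]\nn=126\nc=3\nh=256\nw=256\nnum_output=32\n"
fixed = "[Convolution]\nn=126\nc=3\nh=256\nw=256\nnum_output=3\n"
print(channels_match(failing), channels_match(fixed))
```

Note that this only encodes the workaround reported here; it does not explain why MIOpen needs the two values to match.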

Sidenote: if you want to see the details, you can set this:

export MIOPEN_ENABLE_LOGGING=1

This will produce a lot of output when you run the benchmark, which is how I saw that X (really Y, as it seems convolution is assuming X and Y are flipped since it's in transposed mode) and W were being set to 32 and 3.  It is also possible that the issue here is that the MIOpen code is assuming that the matrix is transposed and the original num_outputs and c values would work if not transposed, but again I don't quite understand the MIOpen code well enough to reason through this.
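
When the benchmark is launched from a script rather than an interactive shell, the same variable can be set for just that child process. The sketch below assumes a native run; the helper name is hypothetical and the child command is a stand-in that merely echoes the variable.

```python
import os
import subprocess
import sys


def run_with_miopen_logging(cmd):
    """Run cmd with MIOPEN_ENABLE_LOGGING=1 added to a copy of the environment."""
    env = dict(os.environ)
    env["MIOPEN_ENABLE_LOGGING"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in command: a child Python process that echoes the variable.
# In practice cmd would be the benchmark invocation.
child = [sys.executable, "-c",
         "import os; print(os.environ['MIOPEN_ENABLE_LOGGING'])"]
result = run_with_miopen_logging(child)
print(result.stdout.strip())
```

Copying the environment keeps the setting out of the parent shell, so later runs are not flooded with log output by accident.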

Finally, I strongly recommend you reduce N anyway to get gem5 to run this in a reasonable amount of time.  It took 7.4 seconds to run the provided config (with my change above to set num_outputs to 3) on my real GPU, which is way longer than gem5 can simulate in a reasonable time.
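
To get a feel for how much N drives the problem size, the element counts of the input and output tensors can be worked out directly. The sketch assumes dense NCHW tensors and an output with the same H and W as the input; under those assumptions the default values reproduce the "Create Data chunk of size" figures in the log of the failing run quoted below. The batch size of 4 is just an example.

```python
def nchw_elements(n, c, h, w):
    """Number of elements in a dense NCHW tensor."""
    return n * c * h * w


# Default config from the log: N=126, C=3, H=W=256, 32 outputs.
print(nchw_elements(126, 3, 256, 256))    # input:  24772608
print(nchw_elements(126, 32, 256, 256))   # output: 264241152

# Same layer with a smaller batch, e.g. N=4.
print(nchw_elements(4, 3, 256, 256))      # 786432
print(nchw_elements(4, 32, 256, 256))     # 8388608
```

Both counts scale linearly with N, so dropping from 126 to 4 cuts the data volume by a factor of about 31.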

Matt

On Tue, May 16, 2023 at 2:05 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

Ok, I've verified this fails on my AMD GPU with ROCm 4 with the exact same error:

[sinclair@eldin] (59)$ ./build/benchmarks/test_fwd_conv/dnnmark_test_fwd_conv -config config_example/conv_config.dnnmark --warmup 0 --debuginfo 1
13:55:12.799629 2314023 test_fwd_conv.cc:11] DNNMark suites: Start...
13:55:12.841578 2314023 dnnmark.cc:232] Search and parse general DNNMark configuration
13:55:12.842526 2314023 dnnmark.cc:292] Search and parse layer configuration
13:55:12.842532 2314023 dnnmark.cc:306] Add [Convolution] layer
13:55:12.842555 2314023 dnn_layer.h:84] Layer name: conv1
13:55:12.842566 2314023 dnn_layer.h:89] Previous layer: null
13:55:12.842574 2314023 dnnmark.cc:361] DNNMark: Initialize...
13:55:12.842576 2314023 dnnmark.cc:362] Running mode: 1
13:55:12.842579 2314023 dnnmark.cc:363] Number of Layers: 1
13:55:12.842581 2314023 dnnmark.cc:365] Layer type: 1
13:55:12.842583 2314023 dnnmark.cc:367] DNNMark: Setup parameters of Convolution layer
13:55:12.842586 2314023 dnn_layer.h:110] Bottom dimension: N: 126 C: 3 H: 256 W: 256
13:55:12.842594 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.860227 2314023 data_manager.h:101] Create data with ID: 0
13:55:12.860245 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.876646 2314023 data_manager.h:101] Create data with ID: 1
13:55:12.876678 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.057176 2314023 data_manager.h:101] Create data with ID: 2
13:55:13.057200 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.242066 2314023 data_manager.h:101] Create data with ID: 3
13:55:13.242089 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242357 2314023 data_manager.h:101] Create data with ID: 4
13:55:13.242363 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242486 2314023 data_manager.h:101] Create data with ID: 5
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405368 2314023 conv_layer.h:220] Setting Bwd Filter Algo to
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405539 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405624 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405642 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.428352 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.451712 2314023 data_manager.h:53] Free Data chunk of size 24772608
I0516 13:55:13.454205 2314023 data_manager.h:53] Free Data chunk of size 24772608
MIOpen Error: 3 at /nobackup/sinclair/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057

So it appears this is a problem with MIOpen/ROCm, not a gem5 problem.  Obviously this isn't ideal, but I'm also not sure how easy it's going to be for us to fix -- we could try determining how MIOpen is setting up the layers and see if there is an easy fix ... but if the issues are deeper than something we can change in a small patch or in the config file, I'm not sure what we can do.

Matt

On Tue, May 16, 2023 at 1:33 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

X is referring to this: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149, which is where your failure is occurring.

Do you happen to have an AMD GPU handy?  If so, can you try running conv on the real GPU with ROCM 4.0.1 and let me know what happens?

Matt

On Tue, May 16, 2023 at 1:58 AM 429442672 <429442672@qq.com> wrote:

Unfortunately, I tried changing n to 2, 4, 8, and 16, and got the same error.

May I ask what X means?

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Tuesday, May 16, 2023, 12:46 PM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

It is a bit odd that the default batch size is 126, which is not a power of 2.  If you make it 4, 8, or 16 what happens?

If this doesn't resolve we would need to determine which of NCHW is being set to X in the MIOpen code to determine if it's a config problem, MIOpen problem, or a gem5 problem (i.e., if X is H then both H and W are 256 in the config file).

Matt

On Mon, May 15, 2023 at 11:05 PM 429442672 <429442672@qq.com> wrote:

Thank you, I would appreciate it very much if you could help me solve this problem.

The command I used is:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

  1. yes, i still see this error.
         the generated cache is placed here:

  2. i used the default setting in DNNMARK:

besides, 

I also run softmax by:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax
--options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and get

Exiting because  exiting with last active thread context.

the detailed logs of running conv and softmax are attached in this emai.

Thank you!

------------------ 原始邮件 ------------------
发件人:                                                                                                                        "Matt Sinclair"                                                                                    <mattdsinclair.wisc@gmail.com>;
发送时间: 2023年5月15日(星期一) 凌晨4:34
收件人: "429442672"<429442672@qq.com>;
抄送: "gem5-dev"<gem5-dev@gem5.org>;"gem5-users"<gem5-users@gem5.org>;
主题: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thanks, this is helpful.  Looking through those old email chains, I don't see any specific resolution to them, unfortunately.  I do not have a ton of time to dig into this (end of the semester is keeping me busy) but if you can keep digging I may be able to provide some ideas. 

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv?  The failure shortly after is here: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.  It seems to imply that X and W are not equal, but we'd need to dig to figure out if this is because of the config file being passed in, or something in MIOpen/gem5 that is breaking it.  Given that the function call that is failing is trying to create a tensor: https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106, my guess is that it's something with the config, because something so basic probably (hopefully?) doesn't fail in MIOpen...

In terms of whether it should work or not, I don't see that we included it in prior papers (e.g., https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I don't know if that was because we didn't try or if there was a deeper, more fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 <429442672@qq.com> wrote:

Thank you so much for the advice.

Actually, I have generated the cachefiles, as shown in the figure.

Besides, I have successfully run several benchmarks such as pool, activations, and softmax, so I think the kernels are set up.
The commands differ because I ran them directly inside the docker container.

There might be a common problem when running networks with conv; some other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also hit this problem.

I have also tried the latest docker image and the original command, like this:
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
but still got the same error. Does that mean conv layers are currently not available for gem5-gcn? May I ask whether anyone else has met this problem before? Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" <gem5-dev@gem5.org>;
Cc: "gem5-users" <gem5-users@gem5.org>; "429442672" <429442672@qq.com>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally, so I cannot say for certain what the error is or even if they are expected to run to completion in gem5.  But, normally the error you're seeing happens because you have not created the appropriate "cache" files for the GPU kernel(s) the program is trying to run.  MIOpen first checks to see if the desired kernel has been run on your machine before, and if not it tries to do online compilation of that kernel.  Unfortunately online compilation of kernels in gem5 is a) very slow and b), because it is very slow, not supported in gem5 (basically, it is so slow as to not be worth supporting in many cases, in my opinion).  So, instead, the expectation is that we build the kernels we want ahead of time, before running the program.  You may have seen in the examples we provide for DNNMark (https://resources.gem5.org/resources/dnn-mark) that we have this "generate cachefiles" script -- that is exactly what the purpose of that script is.  Moreover, on the same webpage, you may have noticed we include the path to that cache directory in our docker commands (emphasis mine):
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
From looking at your commands, I don't see you including this.  Thus, while I do not know if that script by default produces the kernels needed for, say, AlexNet, I strongly suspect you should start by running that script and updating your docker commands to include the cache stuff ... then see what happens from there.
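
To make the role of the cache mount explicit, the command above can be assembled in code. This is only a minimal Python sketch, not part of gem5 or gem5-resources: `build_docker_cmd` is a hypothetical helper name, and every path and flag in it is copied from the docker command quoted above.

```python
import os
import shlex

# Path of DNNMark inside the working directory, as used in the command above.
DNNMARK = "gem5-resources/src/gpu/DNNMark"


def build_docker_cmd(benchmark, config,
                     image="gcr.io/gem5-test/gcn-gpu:v22-1", pwd=None):
    """Hypothetical helper: return the docker argv for one DNNMark benchmark."""
    pwd = pwd or os.getcwd()
    options = (f"-config {DNNMARK}/config_example/{config} "
               f"-mmap {DNNMARK}/mmap.bin")
    return [
        "docker", "run", "--rm",
        "-v", f"{pwd}:{pwd}",
        # Mount the pre-generated kernel cache where MIOpen looks for it,
        # so that no online kernel compilation is attempted inside gem5.
        "-v", f"{pwd}/{DNNMARK}/cachefiles:/root/.cache/miopen/2.9.0",
        "-w", pwd,
        image,
        "gem5/build/GCN3_X86/gem5.opt",
        "gem5/configs/example/apu_se.py",
        "-n3",
        f"--benchmark-root={DNNMARK}/build/benchmarks/{benchmark}",
        f"-cdnnmark_{benchmark}",
        f"--options={options}",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for the softmax example.
    print(shlex.join(build_docker_cmd("test_fwd_softmax",
                                      "softmax_config.dnnmark")))
```

Switching to another benchmark only changes the two arguments, e.g. `build_docker_cmd("test_fwd_conv", "conv_config.dnnmark")`; the cache mount stays the same.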

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.  Normally I just symlink these extra files -- e.g., symlink gfx801_4... from gfx801_32 ... (this is not the best-performing option, because the resources are different, but it provides a basic setup step to avoid problems like this).
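
The symlink step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not a gem5 or MIOpen tool: `link_missing_db` is a hypothetical helper, and the file names are supplied by the caller because the thread does not spell out the full name of the gfx801_32 file.

```python
import os


def link_missing_db(db_dir, missing_name, existing_name):
    """Hypothetical helper: make db_dir/missing_name a symlink to existing_name.

    Returns True if a link was created and False if missing_name already
    exists (as a file or a link), in which case nothing is changed.
    """
    missing = os.path.join(db_dir, missing_name)
    existing = os.path.join(db_dir, existing_name)
    if os.path.lexists(missing):
        return False
    if not os.path.exists(existing):
        raise FileNotFoundError(existing)
    # Relative link target, so the link stays valid if db_dir is moved.
    os.symlink(existing_name, missing)
    return True
```

For the warning above, `db_dir` would be `/opt/rocm-4.0.1/miopen/share/miopen/db` and `missing_name` would be `gfx801_4.HIP.fdb.txt`; `existing_name` should be whichever gfx801_32 file is actually present in that directory.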

Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <gem5-dev@gem5.org> wrote:

Dear Matt,
     When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error each time (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks: 327510244000
Exiting because  exiting with last active thread context

My command line is:
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 8 --mem-size=12GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -c dnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

I didn't change the default setup of conv_config.dnnmark or gfx801.

May I ask, did I do something wrong here?
Have you ever run these benchmarks without error, and could you please show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

Thank you so much! I will try it later!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>;
Sent: Wednesday, May 17, 2023, 8:04 AM
To: "429442672" <429442672@qq.com>;
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>;
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Ok I've found a fix to this (i.e., that works on a real AMD GPU) although I don't have a perfect understanding of why it is required.  The bottom line is that the num_outputs and c values in the configuration appear to need to be the same for convolution layers -- not exactly sure why.  But the implication of that is either:

- change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#7 to 32
- or change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#12 to 3 and leave line 7 as is

Either change results in the X and W values being the same size for the convolution.  I do not understand the MIOpen code well enough to appreciate the implications of this, but hopefully this is helpful and you can check if it runs in gem5 now.

Sidenote: if you want to see the details, you can set this:

export MIOPEN_ENABLE_LOGGING=1

This will produce a lot of output when you run the benchmark, which is how I saw that X (really Y, as it seems convolution is assuming X and Y are flipped since it's in transposed mode) and W were being set to 32 and 3.  It is also possible that the issue here is that the MIOpen code is assuming that the matrix is transposed and the original num_outputs and c values would work if not transposed, but again I don't quite understand the MIOpen code well enough to reason through this.

Finally, I strongly recommend you reduce N anyways to get gem5 to run this in a reasonable amount of time.  It took 7.4 seconds to run the provided config (with my change above to set num_outputs to 3) on my real GPU, which is way longer than gem5 can simulate in a reasonable time.

Matt

On Tue, May 16, 2023 at 2:05 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

Ok, I've verified this fails on my AMD GPU with ROCm 4 with the exact same error:

[sinclair@eldin] (59)$ ./build/benchmarks/test_fwd_conv/dnnmark_test_fwd_conv -config config_example/conv_config.dnnmark --warmup 0 --debuginfo 1
13:55:12.799629 2314023 test_fwd_conv.cc:11] DNNMark suites: Start...
13:55:12.841578 2314023 dnnmark.cc:232] Search and parse general DNNMark configuration
13:55:12.842526 2314023 dnnmark.cc:292] Search and parse layer configuration
13:55:12.842532 2314023 dnnmark.cc:306] Add [Convolution] layer
13:55:12.842555 2314023 dnn_layer.h:84] Layer name: conv1
13:55:12.842566 2314023 dnn_layer.h:89] Previous layer: null
13:55:12.842574 2314023 dnnmark.cc:361] DNNMark: Initialize...
13:55:12.842576 2314023 dnnmark.cc:362] Running mode: 1
13:55:12.842579 2314023 dnnmark.cc:363] Number of Layers: 1
13:55:12.842581 2314023 dnnmark.cc:365] Layer type: 1
13:55:12.842583 2314023 dnnmark.cc:367] DNNMark: Setup parameters of Convolution layer
13:55:12.842586 2314023 dnn_layer.h:110] Bottom dimension: N: 126 C: 3 H: 256 W: 256
13:55:12.842594 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.860227 2314023 data_manager.h:101] Create data with ID: 0
13:55:12.860245 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.876646 2314023 data_manager.h:101] Create data with ID: 1
13:55:12.876678 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.057176 2314023 data_manager.h:101] Create data with ID: 2
13:55:13.057200 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.242066 2314023 data_manager.h:101] Create data with ID: 3
13:55:13.242089 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242357 2314023 data_manager.h:101] Create data with ID: 4
13:55:13.242363 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242486 2314023 data_manager.h:101] Create data with ID: 5
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405368 2314023 conv_layer.h:220] Setting Bwd Filter Algo to MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405539 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405624 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405642 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.428352 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.451712 2314023 data_manager.h:53] Free Data chunk of size 24772608
I0516 13:55:13.454205 2314023 data_manager.h:53] Free Data chunk of size 24772608
MIOpen Error: 3 at /nobackup/sinclair/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057

So it appears this is a problem with MIOpen/ROCm, not a gem5 problem.  Obviously this isn't ideal, but I'm also not sure how easy it's going to be for us to fix -- we could try determining how MIOpen is setting up the layers and see if there is an easy fix ... but if the issues are deeper than something we can change in a small patch or in the config file, I'm not sure what we can do.

Matt
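
The "Create Data chunk" sizes in the real-GPU log can be checked against the layer dimensions, which also shows why reducing N shrinks the workload. The sketch below is a hedged worked example in Python: N, C, H, and W come from the "Bottom dimension" log line and 32 is the output channel count discussed in the thread, but the 5x5 filter and the unchanged output height and width are assumptions that merely happen to reproduce the logged numbers.

```python
def nchw_elements(n, c, h, w):
    """Number of elements in a dense NCHW tensor."""
    return n * c * h * w


# From the log line "Bottom dimension: N: 126 C: 3 H: 256 W: 256".
N, C, H, W = 126, 3, 256, 256
K = 32        # output channels (num_outputs in the thread)
KERNEL = 5    # ASSUMPTION: 5x5 filter; not stated anywhere in the thread

# ASSUMPTION: padding keeps the output height and width equal to the input.
input_elems = nchw_elements(N, C, H, W)
output_elems = nchw_elements(N, K, H, W)
weight_elems = nchw_elements(K, C, KERNEL, KERNEL)

# These match the three distinct chunk sizes printed in the log.
assert input_elems == 24772608
assert output_elems == 264241152
assert weight_elems == 2400

# Input and output sizes scale linearly with the batch size N,
# so N = 4 is 31.5x smaller than the default N = 126.
small_output = nchw_elements(4, K, H, W)
assert output_elems / small_output == 31.5
```

Under these assumptions, the filter tensor has C = 3 input channels while the layer produces 32 output channels, which are the two values (3 and 32) that the fix above makes equal.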
4
429442672
Wed, May 17, 2023 6:55 AM

Thank you very much for all your help.

I tried your suggestion (setting c = num_outputs = 3), but I got another error. May I ask whether there are any further ideas for solving it?

Thank you.


Log output:

build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall rt_sigaction(...)
      (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall rt_sigprocmask(...)
      (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall get_mempolicy(...)
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 'frndint' unimplemented
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:710: warn: unimplemented ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1890: warn: mmap: writing to shared mmap region is currently unsupported. The write succeeds on the target, but it will not be propagated to the host or shared mappings
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:460: warn: Signal events are only supported currently
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:604: warn: unimplemented ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:614: warn: unimplemented ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/syscall_emul.hh:2109: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/syscall_emul.hh:2109: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) sh: 1: Cannot fork MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not support AMDGPU. Expect performance degradation. build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page. build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page. build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page. build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: 
fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) 
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) 
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 
6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable! build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall madvise(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall sched_setaffinity(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...) build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall sched_yield(...)      (further warnings will be suppressed) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) 
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 1030
build/GCN3_X86/arch/x86/faults.cc:166: panic: Tried to read unmapped address 0x20.
PC: (0x7ffff30ee5fc=>0x7ffff30ee5ff).(0=>1), Instr:  MOV_R_M : ld  rdi, DS:[rdi]
Memory Usage: 20185460 KBytes
Program aborted at tick 173459458000
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x19efd50)[0x5630de7fed50]
gem5/build/GCN3_X86/gem5.opt(+0x1a1425e)[0x5630de82325e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f6087fdc420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f608718200b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f6087161859]
gem5/build/GCN3_X86/gem5.opt(+0x4c20a5)[0x5630dd2d10a5]
gem5/build/GCN3_X86/gem5.opt(+0xdf6efa)[0x5630ddc05efa]
gem5/build/GCN3_X86/gem5.opt(+0x16b9804)[0x5630de4c8804]
gem5/build/GCN3_X86/gem5.opt(+0x16b3dc8)[0x5630de4c2dc8]
gem5/build/GCN3_X86/gem5.opt(+0x16b6bf3)[0x5630de4c5bf3]
gem5/build/GCN3_X86/gem5.opt(+0x16b84e5)[0x5630de4c74e5]
gem5/build/GCN3_X86/gem5.opt(+0xe3d322)[0x5630ddc4c322]
gem5/build/GCN3_X86/gem5.opt(+0x16b1a54)[0x5630de4c0a54]
gem5/build/GCN3_X86/gem5.opt(+0x16bb713)[0x5630de4ca713]
gem5/build/GCN3_X86/gem5.opt(+0x122dfca)[0x5630de03cfca]
gem5/build/GCN3_X86/gem5.opt(+0x12674ee)[0x5630de0764ee]
gem5/build/GCN3_X86/gem5.opt(+0x16b4a62)[0x5630de4c3a62]
gem5/build/GCN3_X86/gem5.opt(+0x1a03665)[0x5630de812665]
gem5/build/GCN3_X86/gem5.opt(+0x1a2bab4)[0x5630de83aab4]
gem5/build/GCN3_X86/gem5.opt(+0x1a2c093)[0x5630de83b093]
gem5/build/GCN3_X86/gem5.opt(+0xadded2)[0x5630dd8eced2]
gem5/build/GCN3_X86/gem5.opt(+0x4b6757)[0x5630dd2c5757]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f6088293748]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f6088068f48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f60881b5e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f6088293124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f608805fd6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f6088067ef6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f60881b5e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f60881b61d2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f60881b65bf]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfc01)[0x7f60881bac01]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!

------------------ Original Message ------------------
From: "429442672" <429442672@qq.com>
Sent: Wednesday, May 17, 2023, 8:48 AM
To: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thank you so much! I will try it later!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Wednesday, May 17, 2023, 8:04 AM
To: "429442672" <429442672@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Ok I've found a fix to this (i.e., that works on a real AMD GPU) although I don't have a perfect understanding of why it is required.  The bottom line is that the num_outputs and c values in the configuration appear to need to be the same for convolution layers -- not exactly sure why.

But the implication of that is either:

Either change results in the X and W values being the same size for the convolution.  I do not understand the MIOpen code well enough to appreciate the implications of this, but hopefully this is helpful and you can check if it runs in gem5 now.
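
A quick way to spot the mismatch before launching a long simulation is to scan the config for convolution layers whose c and num_outputs differ. Going by the values quoted elsewhere in this thread (c = 3, 32 outputs), the two options are presumably to lower num_outputs to 3 or to raise c to 32. The sketch below is only an illustration: the `[Section]` / `key=value` layout, the `run_mode` line, and the key spelling (`num_output` versus `num_outputs`) are assumptions about the DNNMark config format, and both helper functions are hypothetical.

```python
def parse_dnnmark_config(text):
    """Parse '[Section]' headers and 'key=value' lines into (section, params) pairs.

    Assumption: DNNMark configs use this INI-like layout.
    """
    layers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], {})
            layers.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[1][key.strip()] = value.strip()
    return layers


def channel_mismatches(layers):
    """Return (layer name, c, num_outputs) for conv layers where the two differ."""
    problems = []
    for section, params in layers:
        if section != "Convolution":
            continue
        # Assumption: the key may be spelled num_output or num_outputs.
        outputs = params.get("num_output", params.get("num_outputs"))
        channels = params.get("c")
        if outputs is None or channels is None:
            continue
        if int(outputs) != int(channels):
            problems.append((params.get("name", "?"), int(channels), int(outputs)))
    return problems


# Values taken from the thread (N=126, C=3, H=W=256, 32 outputs);
# the run_mode line is a hypothetical placeholder.
SAMPLE = """
[DNNMark]
run_mode=standalone

[Convolution]
name=conv1
n=126
c=3
h=256
w=256
num_output=32
"""

print(channel_mismatches(parse_dnnmark_config(SAMPLE)))
```

An empty list means every convolution layer that states both values already has them equal. This only checks the workaround described above; it says nothing about why MIOpen requires it.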

Sidenote: if you want to see the details, you can set this:

export MIOPEN_ENABLE_LOGGING=1

This will produce a lot of output when you run the benchmark, which is how I saw that X (really Y, as it seems convolution is assuming X and Y are flipped since it's in transposed mode) and W were being set to 32 and 3.  It is also possible that the issue here is that the MIOpen code is assuming that the matrix is transposed and the original num_outputs and c values would work if not transposed, but again I don't quite understand the MIOpen code well enough to reason through this.
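
If you would rather not export the variable in your shell, you can set it for a single run from a small wrapper. This is a minimal sketch for a native run on a real GPU; `run_with_miopen_logging` is a hypothetical helper, and the child process here is a stand-in that only echoes the variable. Whether the variable reaches a simulated process inside gem5 depends on how the environment is passed to the workload, which this sketch does not cover.

```python
import os
import subprocess
import sys


def run_with_miopen_logging(cmd):
    """Run cmd with MIOPEN_ENABLE_LOGGING=1 added to a copy of the environment."""
    env = dict(os.environ, MIOPEN_ENABLE_LOGGING="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


# Stand-in child process: prints the variable so the effect is visible.
# For a real run, replace it with the benchmark command line.
probe = [sys.executable, "-c", "import os; print(os.environ['MIOPEN_ENABLE_LOGGING'])"]
print(run_with_miopen_logging(probe).stdout.strip())
```

Because the environment is copied, the setting does not leak into later commands in the same session.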

Finally, I strongly recommend you reduce N anyway to get gem5 to run this in a reasonable amount of time.  It took 7.4 seconds to run the provided config (with my change above to set num_outputs to 3) on my real GPU, which is far longer than gem5 can simulate in a reasonable time.

Matt

On Tue, May 16, 2023 at 2:05 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

Ok, I've verified this fails on my AMD GPU with ROCm 4 with the exact same error:

[sinclair@eldin] (59)$ ./build/benchmarks/test_fwd_conv/dnnmark_test_fwd_conv -config config_example/conv_config.dnnmark --warmup 0 --debuginfo 1
13:55:12.799629 2314023 test_fwd_conv.cc:11] DNNMark suites: Start...
13:55:12.841578 2314023 dnnmark.cc:232] Search and parse general DNNMark configuration
13:55:12.842526 2314023 dnnmark.cc:292] Search and parse layer configuration
13:55:12.842532 2314023 dnnmark.cc:306] Add [Convolution] layer
13:55:12.842555 2314023 dnn_layer.h:84] Layer name: conv1
13:55:12.842566 2314023 dnn_layer.h:89] Previous layer: null
13:55:12.842574 2314023 dnnmark.cc:361] DNNMark: Initialize...
13:55:12.842576 2314023 dnnmark.cc:362] Running mode: 1
13:55:12.842579 2314023 dnnmark.cc:363] Number of Layers: 1
13:55:12.842581 2314023 dnnmark.cc:365] Layer type: 1
13:55:12.842583 2314023 dnnmark.cc:367] DNNMark: Setup parameters of Convolution layer
13:55:12.842586 2314023 dnn_layer.h:110] Bottom dimension: N: 126 C: 3 H: 256 W: 256
13:55:12.842594 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.860227 2314023 data_manager.h:101] Create data with ID: 0
13:55:12.860245 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.876646 2314023 data_manager.h:101] Create data with ID: 1
13:55:12.876678 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.057176 2314023 data_manager.h:101] Create data with ID: 2
13:55:13.057200 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.242066 2314023 data_manager.h:101] Create data with ID: 3
13:55:13.242089 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242357 2314023 data_manager.h:101] Create data with ID: 4
13:55:13.242363 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242486 2314023 data_manager.h:101] Create data with ID: 5
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405368 2314023 conv_layer.h:220] Setting Bwd Filter Algo to
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405539 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405624 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405642 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.428352 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.451712 2314023 data_manager.h:53] Free Data chunk of size 24772608
I0516 13:55:13.454205 2314023 data_manager.h:53] Free Data chunk of size 24772608
MIOpen Error: 3 at /nobackup/sinclair/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057

So it appears this is a problem with MIOpen/ROCm, not a gem5 problem.  Obviously this isn't ideal, but I'm also not sure how easy it's going to be for us to fix -- we could try determining how MIOpen is setting up the layers and see if there is an easy fix ... but if the issues are deeper than something we can change in a small patch or in the config file, I'm not sure what we can do.
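
The "Create Data chunk" sizes in the log above can be cross-checked against the layer dimensions. This is a rough sketch under several assumptions: that the sizes are element counts rather than bytes, that the output keeps the 256 x 256 spatial size, and that the kernel is 5 x 5 (inferred from 2400 = 32 * 3 * 5 * 5, not stated in the thread). The helper name is hypothetical.

```python
def nchw_elements(n, c, h, w):
    """Number of elements in a 4-D tensor with the given dimensions."""
    return n * c * h * w


# From the log line "Bottom dimension: N: 126 C: 3 H: 256 W: 256".
N, C, H, W = 126, 3, 256, 256
NUM_OUTPUTS = 32   # output channel count mentioned in the thread
KERNEL = 5         # assumption: inferred from the 2400-element chunks

input_elems = nchw_elements(N, C, H, W)
output_elems = nchw_elements(N, NUM_OUTPUTS, H, W)            # assumes H and W are preserved
weight_elems = nchw_elements(NUM_OUTPUTS, C, KERNEL, KERNEL)  # filters x channels x k x k

print(input_elems, output_elems, weight_elems)
# -> 24772608 264241152 2400
```

Each size appears twice in the log, which would fit one buffer for the data and one for its gradient, although the thread does not confirm that. With N = 126 the output alone is over 264 million elements, which puts the advice to shrink N for simulation in perspective.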

Matt

On Tue, May 16, 2023 at 1:33 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

X is referring to this: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149, which is where your failure is occurring.

Do you happen to have an AMD GPU handy?  If so, can you try running conv on the real GPU with ROCM 4.0.1 and let me know what happens?

Matt

On Tue, May 16, 2023 at 1:58 AM 429442672 <429442672@qq.com> wrote:

Unfortunately, I tried changing n to 2, 4, 8, and 16, and got the same error.

May I ask what X means?

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Tuesday, May 16, 2023, 12:46 PM
To: "429442672" <429442672@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

It is a bit odd that the default batch size is 126, which is not a power of 2.  If you make it 4, 8, or 16 what happens?

If this doesn't resolve we would need to determine which of NCHW is being set to X in the MIOpen code to determine if it's a config problem, MIOpen problem, or a gem5 problem (i.e., if X is H then both H and W are 256 in the config file).

Matt

On Mon, May 15, 2023 at 11:05 PM 429442672 <429442672@qq.com> wrote:

Thank you. I would appreciate it very much if you could help me solve this problem.

The command I used is:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

  1. Yes, I still see this error.
         The generated cache is placed here:

  2. I used the default settings in DNNMark:

Besides,

I also ran softmax with:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax
--options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and got

Exiting because  exiting with last active thread context.

The detailed logs of running conv and softmax are attached to this email.

Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Monday, May 15, 2023, 4:34 AM
To: "429442672" <429442672@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thanks, this is helpful.  Looking through those old email chains, I don't see any specific resolution to them, unfortunately.  I do not have a ton of time to dig into this (end of the semester is keeping me busy) but if you can keep digging I may be able to provide some ideas. 

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv?  The failure shortly after is here: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149.  It seems to imply that X and W are not equal, but we'd need to dig to figure out if this is because of the config file being passed in, or something in MIOpen/gem5 that is breaking it.  Given that the function call that is failing is trying to create a tensor: https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106, my guess is that it's something with the config, because something so basic probably (hopefully?) doesn't fail in MIOpen...

In terms of whether it should work or not, I don't see that we included it in prior papers (e.g., https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf), but I don't know if that was because we didn't try or because there was a deeper, more fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 <429442672@qq.com> wrote:

Thank you so much for the advice.

Actually, I have generated the cachefiles, as shown in the figure.

Besides, I have successfully run several benchmarks such as pool, activations, and softmax, so I think the kernels are set up correctly.
The commands differ because I run them directly inside the docker container.

There might be a common problem when running networks with conv layers; some other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also hit this problem.

I have also tried the latest docker image and the original command, like this:
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
but I still got the same error. Does that mean conv layers are currently not available in gem5-gcn? Has anyone else met this problem before? Thank you!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" <gem5-dev@gem5.org>
Cc: "gem5-users" <gem5-users@gem5.org>; "429442672" <429442672@qq.com>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally, so I cannot say for certain what the error is or even if they are expected to run to completion in gem5.  But, normally the error you're seeing happens because you have not created the appropriate "cache" files for the GPU kernel(s) the program is trying to run.  MIOpen first checks to see if the desired kernel has been run on your machine before, and if not it tries to do online compilation of that kernel.  Unfortunately online compilation of kernels in gem5 is a) very slow and b), because it is very slow, not supported in gem5 (basically, it is so slow as to not be worth supporting in many cases, in my opinion).  So, instead, the expectation is that we build the kernels we want ahead of time, before running the program.  You may have seen in the examples we provide for DNNMark (https://resources.gem5.org/resources/dnn-mark) that we have this "generate cachefiles" script -- that is exactly what the purpose of that script is.  Moreover, on the same webpage, you may have noticed we include the path to that cache directory in our docker commands (emphasis mine):
docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
From looking at your commands, I don't see you including this.  Thus, while I do not know if that script by default produces the kernels needed for, say, AlexNet, I strongly suspect you should start by running that script and updating your docker commands to include the cache stuff ... then see what happens from there.
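
To make the cache mount harder to forget, the docker command can be assembled programmatically and checked before it is run. This is a minimal sketch: the paths, image tag, and gem5 arguments are copied from the command above, while both helper functions and the `/work` directory are hypothetical.

```python
def dnnmark_docker_argv(workdir, benchmark, config_name,
                        image="gcr.io/gem5-test/gcn-gpu:v22-1"):
    """Build the docker argument list for one DNNMark benchmark run in gem5."""
    dnnmark = "gem5-resources/src/gpu/DNNMark"
    # The second -v mount is the one that exposes the pre-built MIOpen kernels.
    cache_mount = f"{workdir}/{dnnmark}/cachefiles:/root/.cache/miopen/2.9.0"
    options = (f"-config {dnnmark}/config_example/{config_name} "
               f"-mmap {dnnmark}/mmap.bin")
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}",
        "-v", cache_mount,
        "-w", workdir,
        image,
        "gem5/build/GCN3_X86/gem5.opt",
        "gem5/configs/example/apu_se.py",
        "-n3",
        f"--benchmark-root={dnnmark}/build/benchmarks/{benchmark}",
        f"-cdnnmark_{benchmark}",
        f"--options={options}",
    ]


def has_miopen_cache_mount(argv):
    """True if some -v argument maps a host path onto the MIOpen cache directory."""
    return any(flag == "-v" and ":/root/.cache/miopen/" in value
               for flag, value in zip(argv, argv[1:]))


argv = dnnmark_docker_argv("/work", "test_fwd_softmax", "softmax_config.dnnmark")
print(has_miopen_cache_mount(argv))
```

The list form can be passed straight to `subprocess.run`, which avoids shell quoting problems with the nested `--options` string.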

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.  Normally I just symlink these extra files -- e.g., symlink gfx801_4... from gfx801_32 ... (this is not the best-performing option, because the resources are different, but it provides a basic setup step to avoid problems like this).
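
The symlink step can be scripted so that it is safe to repeat. This is a minimal sketch, not the project's own tooling: `link_missing_db` is a hypothetical helper, the demo runs in a temporary directory rather than the real MIOpen db directory, and the file name `gfx801_32.HIP.fdb.txt` is an assumption modelled on the `gfx801_4.HIP.fdb.txt` name in the warning.

```python
import os
import tempfile


def link_missing_db(db_dir, missing_name, existing_name):
    """Create db_dir/missing_name as a symlink to existing_name if it is absent.

    Returns True if a link was created, False if missing_name already exists.
    """
    link_path = os.path.join(db_dir, missing_name)
    target_path = os.path.join(db_dir, existing_name)
    if os.path.lexists(link_path):
        return False  # something is already there; leave it alone
    if not os.path.exists(target_path):
        raise FileNotFoundError(target_path)
    # Relative link target, so the link stays valid if db_dir is moved or mounted elsewhere.
    os.symlink(existing_name, link_path)
    return True


# Demo in a scratch directory; the file names are assumptions (see above).
with tempfile.TemporaryDirectory() as db_dir:
    open(os.path.join(db_dir, "gfx801_32.HIP.fdb.txt"), "w").close()
    print(link_missing_db(db_dir, "gfx801_4.HIP.fdb.txt", "gfx801_32.HIP.fdb.txt"))
```

On a real installation the directory would be the one named in the warning (`/opt/rocm-4.0.1/miopen/share/miopen/db/`), and writing there typically needs root inside the container.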

Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <gem5-dev@gem5.org> wrote:

Dear Matt,
     When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error each time (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057
Ticks: 327510244000
Exiting because  exiting with last active thread context

My command line is:
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -c dnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

I didn't change the default setup of conv_config.dnnmark or gfx801.

May I ask, did I do something wrong here?
Have you ever run those benchmarks without error, and could you please show me some of your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org

Thank you very much for all your help. I tried your suggestion (set c = num_outputs = 3), but I got a different error. Do you have any further ideas for solving it? Thank you.

Log output:
build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall rt_sigaction(...)
      (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall rt_sigprocmask(...)
      (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall get_mempolicy(...)
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 'frndint' unimplemented
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:710: warn: unimplemented ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1890: warn: mmap: writing to shared mmap region is currently unsupported. The write succeeds on the target, but it will not be propagated to the host or shared mappings
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:460: warn: Signal events are only supported currently
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:604: warn: unimplemented ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:614: warn: unimplemented ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
build/GCN3_X86/sim/syscall_emul.hh:2109: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/syscall_emul.hh:2109: warn: prlimit: unimplemented resource 7
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) sh: 1: Cannot fork MIOpen(HIP): Error [ValidateGcnAssemblerImpl] Specified assembler does not support AMDGPU. Expect performance degradation. build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page. build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page. build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page. build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: 
fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) 
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) 
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...) build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 
6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6 build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable! build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall madvise(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall sched_setaffinity(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall set_robust_list(...) build/GCN3_X86/sim/syscall_emul.cc:85: warn: ignoring syscall sched_yield(...) (further warnings will be suppressed) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...) 
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 1030
build/GCN3_X86/arch/x86/faults.cc:166: panic: Tried to read unmapped address 0x20.
PC: (0x7ffff30ee5fc=>0x7ffff30ee5ff).(0=>1), Instr: MOV_R_M : ld rdi, DS:[rdi]
Memory Usage: 20185460 KBytes
Program aborted at tick 173459458000
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x19efd50)[0x5630de7fed50]
gem5/build/GCN3_X86/gem5.opt(+0x1a1425e)[0x5630de82325e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f6087fdc420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f608718200b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f6087161859]
gem5/build/GCN3_X86/gem5.opt(+0x4c20a5)[0x5630dd2d10a5]
gem5/build/GCN3_X86/gem5.opt(+0xdf6efa)[0x5630ddc05efa]
gem5/build/GCN3_X86/gem5.opt(+0x16b9804)[0x5630de4c8804]
gem5/build/GCN3_X86/gem5.opt(+0x16b3dc8)[0x5630de4c2dc8]
gem5/build/GCN3_X86/gem5.opt(+0x16b6bf3)[0x5630de4c5bf3]
gem5/build/GCN3_X86/gem5.opt(+0x16b84e5)[0x5630de4c74e5]
gem5/build/GCN3_X86/gem5.opt(+0xe3d322)[0x5630ddc4c322]
gem5/build/GCN3_X86/gem5.opt(+0x16b1a54)[0x5630de4c0a54]
gem5/build/GCN3_X86/gem5.opt(+0x16bb713)[0x5630de4ca713]
gem5/build/GCN3_X86/gem5.opt(+0x122dfca)[0x5630de03cfca]
gem5/build/GCN3_X86/gem5.opt(+0x12674ee)[0x5630de0764ee]
gem5/build/GCN3_X86/gem5.opt(+0x16b4a62)[0x5630de4c3a62]
gem5/build/GCN3_X86/gem5.opt(+0x1a03665)[0x5630de812665]
gem5/build/GCN3_X86/gem5.opt(+0x1a2bab4)[0x5630de83aab4]
gem5/build/GCN3_X86/gem5.opt(+0x1a2c093)[0x5630de83b093]
gem5/build/GCN3_X86/gem5.opt(+0xadded2)[0x5630dd8eced2]
gem5/build/GCN3_X86/gem5.opt(+0x4b6757)[0x5630dd2c5757]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f6088293748]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f6088068f48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f60881b5e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f6088293124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f608805fd6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f6088067ef6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f60881b5e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f60881b61d2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f60881b65bf]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfc01)[0x7f60881bac01]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!

------------------ Original Message ------------------
From: "429442672" <429442672@qq.com>
Sent: Wednesday, May 17, 2023, 8:48 AM
To: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thank you so much! I will try it later!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Wednesday, May 17, 2023, 8:04 AM
To: "429442672" <429442672@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Ok I've found a fix to this (i.e., that works on a real AMD GPU) although I don't have a perfect understanding of why it is required. The bottom line is that the num_outputs and c values in the configuration appear to need to be the same for convolution layers -- not exactly sure why.
But the implication of that is either:
- change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#7 to 32
- or change https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/conv_config.dnnmark#12 to 3 and leave line 7 as is

Either change results in the X and W values being the same size for the convolution. I do not understand the MIOpen code well enough to appreciate the implications of this, but hopefully this is helpful and you can check if it runs in gem5 now.

Sidenote: if you want to see the details, you can set this:

export MIOPEN_ENABLE_LOGGING=1

This will produce a lot of output when you run the benchmark, which is how I saw that X (really Y, as it seems convolution is assuming X and Y are flipped since it's in transposed mode) and W were being set to 32 and 3. It is also possible that the issue here is that the MIOpen code is assuming that the matrix is transposed and the original num_outputs and c values would work if not transposed, but again I don't quite understand the MIOpen code well enough to reason through this.

Finally, I strongly recommend you reduce N anyways to get gem5 to run this in a reasonable amount of time. It took 7.4 seconds to run the provided config (with my change above to set num_outputs to 3) on my real GPU, which is way longer than gem5 can simulate in a reasonable time.

Matt

On Tue, May 16, 2023 at 2:05 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

Ok, I've verified this fails on my AMD GPU with ROCm 4 with the exact same error:

[sinclair@eldin] (59)$ ./build/benchmarks/test_fwd_conv/dnnmark_test_fwd_conv -config config_example/conv_config.dnnmark --warmup 0 --debuginfo 1
13:55:12.799629 2314023 test_fwd_conv.cc:11] DNNMark suites: Start...
13:55:12.841578 2314023 dnnmark.cc:232] Search and parse general DNNMark configuration
13:55:12.842526 2314023 dnnmark.cc:292] Search and parse layer configuration
13:55:12.842532 2314023 dnnmark.cc:306] Add [Convolution] layer
13:55:12.842555 2314023 dnn_layer.h:84] Layer name: conv1
13:55:12.842566 2314023 dnn_layer.h:89] Previous layer: null
13:55:12.842574 2314023 dnnmark.cc:361] DNNMark: Initialize...
13:55:12.842576 2314023 dnnmark.cc:362] Running mode: 1
13:55:12.842579 2314023 dnnmark.cc:363] Number of Layers: 1
13:55:12.842581 2314023 dnnmark.cc:365] Layer type: 1
13:55:12.842583 2314023 dnnmark.cc:367] DNNMark: Setup parameters of Convolution layer
13:55:12.842586 2314023 dnn_layer.h:110] Bottom dimension: N: 126 C: 3 H: 256 W: 256
13:55:12.842594 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.860227 2314023 data_manager.h:101] Create data with ID: 0
13:55:12.860245 2314023 data_manager.h:44] Create Data chunk of size 24772608
13:55:12.876646 2314023 data_manager.h:101] Create data with ID: 1
13:55:12.876678 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.057176 2314023 data_manager.h:101] Create data with ID: 2
13:55:13.057200 2314023 data_manager.h:44] Create Data chunk of size 264241152
13:55:13.242066 2314023 data_manager.h:101] Create data with ID: 3
13:55:13.242089 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242357 2314023 data_manager.h:101] Create data with ID: 4
13:55:13.242363 2314023 data_manager.h:44] Create Data chunk of size 2400
13:55:13.242486 2314023 data_manager.h:101] Create data with ID: 5
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405368 2314023 conv_layer.h:220] Setting Bwd Filter Algo to MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
13:55:13.405539 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405624 2314023 data_manager.h:53] Free Data chunk of size 2400
I0516 13:55:13.405642 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.428352 2314023 data_manager.h:53] Free Data chunk of size 264241152
I0516 13:55:13.451712 2314023 data_manager.h:53] Free Data chunk of size 24772608
I0516 13:55:13.454205 2314023 data_manager.h:53] Free Data chunk of size 24772608
MIOpen Error: 3 at /nobackup/sinclair/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057

So it appears this is a problem with MIOpen/ROCm, not a gem5 problem. Obviously this isn't ideal, but I'm also not sure how easy it's going to be for us to fix -- we could try determining how MIOpen is setting up the layers and see if there is an easy fix ... but if the issues are deeper than something we can change in a small patch or in the config file, I'm not sure what we can do.

Matt

On Tue, May 16, 2023 at 1:33 PM Matt Sinclair <mattdsinclair.wisc@gmail.com> wrote:

X is referring to this: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149, which is where your failure is occurring.

Do you happen to have an AMD GPU handy? If so, can you try running conv on the real GPU with ROCm 4.0.1 and let me know what happens?

Matt

On Tue, May 16, 2023 at 1:58 AM 429442672 <429442672@qq.com> wrote:

Unfortunately, I tried changing n to 2, 4, 8, and 16, and got the same error. May I ask what X means?

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Tuesday, May 16, 2023, 12:46 PM
To: "429442672" <429442672@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

It is a bit odd that the default batch size is 126, which is not a power of 2. If you make it 4, 8, or 16 what happens?
If this doesn't resolve we would need to determine which of NCHW is being set to X in the MIOpen code to determine if it's a config problem, MIOpen problem, or a gem5 problem (i.e., if X is H then both H and W are 256 in the config file).

Matt

On Mon, May 15, 2023 at 11:05 PM 429442672 <429442672@qq.com> wrote:

Thank you, I would appreciate it very much if you could help me solve this problem.

The command I used is:

sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

1. Yes, I still see this error. The generated cache is placed here:
2. I used the default setting in DNNMark:

Besides, I also ran softmax with:

sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu23 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and got "Exiting because exiting with last active thread context". The detailed logs of running conv and softmax are attached to this email.

Thank you!
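Since the run above uses the default conv_config.dnnmark, the workaround reported elsewhere in this thread (making the convolution layer's input channel count and output count agree) can be checked before starting a long simulation. The following is a minimal Python sketch, not DNNMark's own parser. It assumes the config is INI-like text with `[Section]` headers and `key=value` lines, and that the relevant keys are `c` and `num_output` (the thread refers to the latter as num_outputs, so check the spelling in your file). The function names and the example text are hypothetical; the example values are taken from the log and messages in this thread.

```python
def parse_sections(text):
    """Split INI-like text into a list of (section_name, {key: value}) pairs.

    Assumption: '[Section]' headers followed by 'key=value' lines.
    """
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], {})
            sections.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[1][key.strip()] = value.strip()
    return sections


def mismatched_conv_layers(text, in_key="c", out_key="num_output"):
    """Return the names of Convolution layers whose in_key and out_key differ.

    The key names are assumptions; pass different ones if your config differs.
    """
    bad = []
    for section, params in parse_sections(text):
        if section.lower() != "convolution":
            continue
        if in_key not in params or out_key not in params:
            continue
        if int(params[in_key]) != int(params[out_key]):
            bad.append(params.get("name", "<unnamed>"))
    return bad


# Hypothetical excerpt using values reported in this thread
# (N: 126 C: 3 H: 256 W: 256, layer name conv1, 32 outputs).
EXAMPLE_CONFIG = """
[Convolution]
name=conv1
n=126
c=3
h=256
w=256
num_output=32
"""

if __name__ == "__main__":
    print(mismatched_conv_layers(EXAMPLE_CONFIG))
```

A non-empty result only says that the layer matches the failing pattern described in this thread; it does not explain why MIOpen rejects it.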
------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Monday, May 15, 2023, 4:34 AM
To: "429442672" <429442672@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

Thanks, this is helpful. Looking through those old email chains, I don't see any specific resolution to them, unfortunately. I do not have a ton of time to dig into this (end of the semester is keeping me busy) but if you can keep digging I may be able to provide some ideas.

First, are you still seeing this error:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

after making the above change to include the MIOpen cache?

Second, what layer size are you assuming/trying to conv? The failure shortly after is here: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-4.0.1-release/src/ocl/convolutionocl.cpp#L149. It seems to imply that X and W are not equal, but we'd need to dig to figure out if this is because of the config file being passed in, or something in MIOpen/gem5 that is breaking it. Given that the function call that is failing is trying to create a tensor: https://github.com/shidong-ai/DNNMark/blob/develop/core/include/dnn_utility.h#L106, my guess is that it's something with the config, because something so basic probably (hopefully?) doesn't fail in MIOpen...

In terms of if it should work or not, I don't see that we included it in prior papers (e.g., https://www.gem5.org/assets/files/papers/enabling2021ispass.pdf) but I don't know if that was because we didn't try or if there was a deeper, more fundamental reason.

Matt

On Sat, May 13, 2023 at 10:34 AM 429442672 <429442672@qq.com> wrote:

Thank you so much for the advice.
Actually, I have made the cachefiles as shown in the figure.

Besides, I have successfully run several benchmarks such as pool, activations, and softmax, so I think the kernels are set up. The commands differ because I ran the command directly inside the docker container.

There might be a common problem when running networks with conv; some other emails, such as
https://www.mail-archive.com/gem5-users@gem5.org/msg20468.html
https://www.mail-archive.com/gem5-users@gem5.org/msg20456.html
also hit this problem.

I have also tried the latest docker image and the original command, like this:

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -cdnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

but still got the same error. Does that mean conv layers are currently not available for gem5-gcn? May I ask whether anyone else has met this problem before? Thank you!
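The gem5 output quoted in this thread buries the few decisive lines (the MIOpen warning and errors, the gem5 panic) under hundreds of repeated syscall warnings. The sketch below is one way to pull them out of a saved log. It assumes the log has one entry per line, as gem5 prints it (not the flattened form shown in this archive), and that warnings follow the `<file>:<line>: warn: <message>` shape seen above. The function name and the keyword list are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Warning lines in this thread look like "<file>:<line>: warn: <message>".
WARN_RE = re.compile(r"^\S+:\d+: warn: (?P<msg>.*)$")

# Substrings that mark the lines worth reading first (an assumption based on
# the logs in this thread: MIOpen messages and gem5 panics).
KEYWORDS = ("MIOpen", "panic:")


def summarize_log(lines):
    """Return (important_lines, warning_counts) for an iterable of log lines."""
    important = []
    warn_counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if any(keyword in line for keyword in KEYWORDS):
            important.append(line)
            continue
        match = WARN_RE.match(line)
        if match:
            # Count identical warning messages instead of repeating them.
            warn_counts[match.group("msg")] += 1
    return important, warn_counts
```

For a log saved to a file, `summarize_log(open(path))` would return the MIOpen and panic lines in order, plus a count per distinct warning message, which makes it easier to compare a failing conv run against a passing softmax run.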
------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Friday, May 12, 2023, 11:42 AM
To: "The gem5 Developer List" <gem5-dev@gem5.org>
Cc: "gem5-users" <gem5-users@gem5.org>; "429442672" <429442672@qq.com>
Subject: Re: [gem5-dev] GEM5-GCN-DNNMark get Invalid filter channel number when running: dnnmark_test_VGG, dnnmark_test_alexnet, dnnmark_test_fwd_conv

I have not tried running these specific benchmarks in gem5 personally, so I cannot say for certain what the error is or even if they are expected to run to completion in gem5. But, normally the error you're seeing happens because you have not created the appropriate "cache" files for the GPU kernel(s) the program is trying to run. MIOpen first checks to see if the desired kernel has been run on your machine before, and if not it tries to do online compilation of that kernel. Unfortunately online compilation of kernels in gem5 is a) very slow and b), because it is very slow, not supported in gem5 (basically, it is so slow as to not be worth supporting in many cases, in my opinion). So, instead, the expectation is that we build the kernels we want ahead of time, before running the program.

You may have seen in the examples we provide for DNNMark (https://resources.gem5.org/resources/dnn-mark) that we have this "generate cachefiles" script -- that is exactly what the purpose of that script is. Moreover, on the same webpage, you may have noticed we include the path to that cache directory in our docker commands (emphasis mine):

docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax --options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

From looking at your commands, I don't see you including this. Thus, while I do not know if that script by default produces the kernels needed for, say, AlexNet, I strongly suspect you should start by running that script and updating your docker commands to include the cache stuff ... then see what happens from there.

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly. Normally I just symlink these extra files -- e.g., symlink gfx801_4... from gfx801_32 ... (this is not the best-performing option, because the resources are different, but provides a basic setup step to avoid problems like this).

Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev <gem5-dev@gem5.org> wrote:

Dear Matt,

When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error each time (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057
Ticks: 327510244000
Exiting because exiting with last active thread context

My command line is:

gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 8 --mem-size=12GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -c dnnmark_test_fwd_conv --options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and I didn't change the default setup of conv_config.dnnmark and gfx801.

May I ask, did I do something wrong here? Have you ever tested those benchmarks without error, and could you please show me some of your configurations?

Thank you so much!

_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org
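The original command in this message runs gem5 without the MIOpen cache mount that the reply recommends. One way to avoid dropping it when switching between benchmarks is to assemble the docker invocation programmatically. The sketch below builds the argument list from pieces copied out of the commands quoted in this thread; the helper name `build_dnnmark_command` and its parameters are hypothetical, and the image tag and cache path are simply the ones that appear above, which may not match your setup.

```python
import os
import shlex


def build_dnnmark_command(workdir, benchmark, config_name,
                          image="gcr.io/gem5-test/gcn-gpu:v22-1"):
    """Return the docker argument list for one DNNMark benchmark run.

    workdir     -- absolute directory containing gem5/ and gem5-resources/
    benchmark   -- benchmark directory name, e.g. "test_fwd_conv"
    config_name -- file under config_example/, e.g. "conv_config.dnnmark"
    """
    dnnmark = os.path.join(workdir, "gem5-resources/src/gpu/DNNMark")
    options = (
        f"-config gem5-resources/src/gpu/DNNMark/config_example/{config_name} "
        "-mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
    )
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}",
        # MIOpen kernel cache mount, as in the commands quoted in this thread.
        "-v", f"{dnnmark}/cachefiles:/root/.cache/miopen/2.9.0",
        "-w", workdir,
        image,
        "gem5/build/GCN3_X86/gem5.opt",
        "gem5/configs/example/apu_se.py",
        "-n3",
        f"--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/{benchmark}",
        f"-cdnnmark_{benchmark}",
        f"--options={options}",
    ]


if __name__ == "__main__":
    cmd = build_dnnmark_command(os.getcwd(), "test_fwd_conv", "conv_config.dnnmark")
    # shlex.join quotes the --options argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run directly; printing it first makes it easy to compare against the reference command before committing to a long simulation.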