gem5-users@gem5.org

The gem5 Users mailing list


Problem on simulating GCN3 GPU: Running DNNMark too slow.

429442672
Tue, May 9, 2023 9:09 AM

hi everyone,

I have successfully built and run DNNMark using the command:

sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax -cdnnmark_test_fwd_softmax
--options="-config gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"
with the output
Exiting because  exiting with last active thread context

which suggests that I have set up the running environment correctly.

However, I tried several benchmarks but met the following problems:

  1. Problem running test_fwd_fc

When I run test_fwd_fc using:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc -c dnnmark_test_fwd_fc
--options="-config gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"

The problem is that it keeps running for a few hours, even though I have changed the input data (mmap.bin -> DNNMark_data.dat) to a smaller size of 300 MB (2 GB by default).
I have also tried several other benchmarks; the only ones I have completed are test_fwd_pool and test_bwd_pool. When I ran benchmarks such as conv, pool, and fc, the program was blocked, without any output.
Is there anything I did wrong here, or are these benchmarks too compute-intensive, leading to slow simulation?
May I ask for any suggestions for running these benchmarks?

  2. Problem running test_VGG and test_alexnet

I ran them with the commands:
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet -c dnnmark_test_alexnet
--options="-config gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and
sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_VGG -c dnnmark_test_VGG
--options="-config gem5-resources/src/gpu/DNNMark/config_example/VGG.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

but they are also blocked.
May I ask for any suggestions for running these benchmarks?

  3. Question on modifying the DNN network

May I ask how to modify the DNN network architecture? For example, is it possible to build a transformer block and run it on gem5? It seems that I can change the configurations in /DNNMark/config_example following the example of alexnet.dnnmark, without modifying the code in DNNMark/benchmarks/test_alexnet. Is that correct?

  4. How can I get a trace when running DNNMark?

Running DNNMark seems like a black box. Is it possible to get a trace of a DNNMark run? For example, the process of data loading, computing, etc.

  5. Question on apu_se.py

It seems that all the benchmarks require apu_se.py. Is there any more detailed documentation that explains what apu_se.py does and how to modify it? For example, how can I add more Ruby memory to the GPU?

Documentation and introductions for the gem5 GCN3 GPU are pretty scarce; if possible, could anyone provide some help?

Thank you all very much!

Matt Sinclair
Tue, May 9, 2023 9:34 PM

Hi,

Trying to answer your various questions:

  1. Similar to #2 below, I am unclear what "blocked" means.  It sounds like
    the program is just running, but is slower than you were hoping it would
    be?  If so, unfortunately, this is a well known problem with detailed
    simulators like gem5 -- they can take a long time to simulate a workload.
    However, there is another option, where you aren't using enough thread
    contexts; see #2 below.  If you are willing to, you can decrease the batch
    size, and usually the program simulates faster.  For FWD_FC in particular,
    you would do this by decreasing n (e.g., from 100 to 4, 8, or 16) -- see
    the sketch after this list and the actual line here:
    https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/fc_config.dnnmark#6

  2. Define "blocked" -- what does this mean?  The bigger benchmarks here are
    very large ML workloads; it would not surprise me if they took days (or
    maybe weeks) to run end-to-end in gem5.  Are you seeing kernels
    progressing (e.g., use the GPUKernelInfo debug flag to print when kernels
    launch and exit; an example command is sketched after this list)?  If you
    are seeing kernels progress, it's just a really large workload and you'd
    have to be more patient.  My group is working on ways to cut down runtime
    for workloads like this, but nothing we have specifically tested for these
    workloads and no ETA on when that would be available/fully working.

It is also possible that you aren't running with enough CPU thread contexts
and the program is infinitely looping there (ROCm launches additional CPU
processes when setting up a GPU program, these require gem5 to have
additional CPU thread contexts).  But without knowing where the program
seems to be blocked, it's hard to say if this is a problem or not.  But you
could try increasing -n on the command line (e.g., from 3 to 5, or from 5
to 10) to see if this resolves the current problem.  This will not resolve
the above issue though.

  3. I have never personally tried modeling a Transformer in DNNMark, so
    this might be a better question for the DNNMark authors.  But ultimately
    what you are suggesting is the right way to model things in DNNMark -- in
    the config files you can specify a series of layers, one connected after
    another.  So, if you knew what the layers in a Transformer are, in theory
    you could express it in a config file.  This assumes that DNNMark supports
    all of the layers in a Transformer though, which I do not know if that is
    true or not (you would need to ask the DNNMark authors).

  4. This seems like a question for DNNMark's authors.  In gem5, we are just
    running DNNMark in gem5.  But ultimately what I can recommend is you start
    with the base files (e.g.,
    https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/benchmarks/test_alexnet/test_alexnet.cc)
    and the config files (e.g.,
    https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/alexnet.dnnmark)
    and go from there.  When I started with DNNMark, I would observe the LOG
    prints it prints to the screen, then grep for those prints and examine the
    code (a grep example is sketched after this list).

  3. What is "ruby memory" -- is this L1, L2, or main memory size?
    Something else?  There are documents like this:
    https://www.gem5.org/2020/06/01/towards-full.html,
    https://www.gem5.org/2020/05/30/enabling-multi-gpu.html,
    https://www.gem5.org/2020/05/27/modern-gpu-applications.html, and
    https://www.gem5.org/documentation/general_docs/gpu_models/GCN3.  The GPU
    Ruby system uses the same building blocks as the CPU Ruby models:
    https://www.gem5.org/documentation/learning_gem5/part3/MSIintro/.  Not sure
    what exactly you are looking for though.
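
To illustrate #1 and #2 (these are only sketches, not tested recipes): for #1,
the batch size is the n= line that the fc_config.dnnmark link above points at,
so the change is just editing, e.g.,

n=100

to

n=16

in that file, leaving the rest of the file as-is.  For #2, gem5 debug flags are
passed to the gem5 binary itself, before the config script, so the earlier
FWD_FC command could be rerun (inside the same docker invocation) as something
like:

gem5/build/GCN3_X86/gem5.opt --debug-flags=GPUKernelInfo --debug-file=kernels.log gem5/configs/example/apu_se.py -n 3 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc -c dnnmark_test_fwd_fc --options="-config gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"

Here --debug-file is optional; it just redirects the GPUKernelInfo output to
m5out/kernels.log instead of the console.  Likewise, for #4, one low-tech way
to find where a given LOG print comes from is to grep the DNNMark sources for
its text, e.g.:

grep -rn "LOG" gem5-resources/src/gpu/DNNMark/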

Thanks,
Matt


429442672
Wed, May 10, 2023 6:33 AM

Thank you so much for taking the time to answer my questions.

For question 1:

Yes, "blocked" means what you said: "the program is just running".

I followed your suggestion and made some modifications:

a. For src/gpu/DNNMark/config_example/fc_config.dnnmark:

b. I generated a 30 MB data file as input, instead of using mmap.bin.

Then I ran:

sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n4 --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc -c dnnmark_test_fwd_fc
--options="-config gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"

After a few hours, I got this output:

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!
build/GCN3_X86/sim/mem_state.cc:99: panic: Someone allocated physical memory at VA 0x10000000 without creating a VMA!
Memory Usage: 22622544 KBytes
Program aborted at tick 10636412834000
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x19efd50)[0x560c0deabd50]
gem5/build/GCN3_X86/gem5.opt(+0x1a1425e)[0x560c0ded025e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f33bb73f420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f33ba8e400b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f33ba8c3859]
gem5/build/GCN3_X86/gem5.opt(+0x4c20a5)[0x560c0c97e0a5]
gem5/build/GCN3_X86/gem5.opt(+0x1a80f2b)[0x560c0df3cf2b]
gem5/build/GCN3_X86/gem5.opt(+0x1a81623)[0x560c0df3d623]
gem5/build/GCN3_X86/gem5.opt(+0x1a928ab)[0x560c0df4e8ab]
gem5/build/GCN3_X86/gem5.opt(+0x12a3c92)[0x560c0d75fc92]
gem5/build/GCN3_X86/gem5.opt(+0x12dc7f5)[0x560c0d7987f5]
gem5/build/GCN3_X86/gem5.opt(+0x1304b15)[0x560c0d7c0b15]
gem5/build/GCN3_X86/gem5.opt(+0x1304cc0)[0x560c0d7c0cc0]
gem5/build/GCN3_X86/gem5.opt(+0x1a9427f)[0x560c0df5027f]
gem5/build/GCN3_X86/gem5.opt(+0x129bef0)[0x560c0d757ef0]
gem5/build/GCN3_X86/gem5.opt(+0x1a7333f)[0x560c0df2f33f]
gem5/build/GCN3_X86/gem5.opt(+0x16b9804)[0x560c0db75804]
gem5/build/GCN3_X86/gem5.opt(+0x16b3dc8)[0x560c0db6fdc8]
gem5/build/GCN3_X86/gem5.opt(+0x16b4b80)[0x560c0db70b80]
gem5/build/GCN3_X86/gem5.opt(+0x1a03665)[0x560c0debf665]
gem5/build/GCN3_X86/gem5.opt(+0x1a2bab4)[0x560c0dee7ab4]
gem5/build/GCN3_X86/gem5.opt(+0x1a2c093)[0x560c0dee8093]
gem5/build/GCN3_X86/gem5.opt(+0xadded2)[0x560c0cf99ed2]
gem5/build/GCN3_X86/gem5.opt(+0x4b6757)[0x560c0c972757]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f33bb9f6748]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f33bb7cbf48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f33bb918e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f33bb9f6124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f33bb7c2d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f33bb7caef6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f33bb918e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f33bb9191d2]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!

I don't know what I did wrong.
Have you ever tried running this benchmark, or benchmarks like alexnet or VGG?
May I ask for some advice on successfully running test_fwd_fc?

Thank you !!

------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.wisc@gmail.com>
Sent: Wednesday, May 10, 2023, 5:34 AM
To: "429442672" <429442672@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-users@gem5.org>; "Poremba, Matthew" <Matthew.Poremba@amd.com>
Subject: Re: Problem on simulating GCN3 GPU: Running DNNMark too slow.


Matt Sinclair
Wed, May 10, 2023 6:15 PM

I have not personally tried to run FWD_FC lately -- we do test some of the
DNNMark layers in the weekly test, but testing all of them is not
practical.  Looking at the error though, it appears that the simulated gem5
memory system is not big enough.  I would try increasing --mem-size on the
command line.  For example, try --mem-size=8GB or --mem-size=16GB (not sure
exactly how much memory FWD_FC needs).
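
As a sketch, that would just be the previous command with the flag added
(whether 16GB is actually enough for FWD_FC is a guess):

sudo docker run --rm -v ${PWD}:${PWD} -v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n4 --mem-size=16GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc -c dnnmark_test_fwd_fc --options="-config gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"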

Matt

On Wed, May 10, 2023 at 1:33 AM 429442672 429442672@qq.com wrote:

Thank you so much for taking the time to answer my questions

For the question 1:

yes, blocked means as what you said: "the program is just running"

I followed your suggestion and made some modifications:

a. for src/gpu/DNNMark/config_example/fc_config.dnnmark:

b. i generate a 30MB data as input, instead of using the mmap.bin

then i ran:

sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n4
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc
-c dnnmark_test_fwd_fc
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"

after a few hours, i got the output with:

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring
non-temporal hint, modeling as cacheable!
build/GCN3_X86/sim/mem_state.cc:99: panic: Someone allocated physical
memory at VA 0x10000000 without creating a VMA!
Memory Usage: 22622544 KBytes
Program aborted at tick 10636412834000
--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x19efd50)[0x560c0deabd50]
gem5/build/GCN3_X86/gem5.opt(+0x1a1425e)[0x560c0ded025e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f33bb73f420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f33ba8e400b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f33ba8c3859]
gem5/build/GCN3_X86/gem5.opt(+0x4c20a5)[0x560c0c97e0a5]
gem5/build/GCN3_X86/gem5.opt(+0x1a80f2b)[0x560c0df3cf2b]
gem5/build/GCN3_X86/gem5.opt(+0x1a81623)[0x560c0df3d623]
gem5/build/GCN3_X86/gem5.opt(+0x1a928ab)[0x560c0df4e8ab]
gem5/build/GCN3_X86/gem5.opt(+0x12a3c92)[0x560c0d75fc92]
gem5/build/GCN3_X86/gem5.opt(+0x12dc7f5)[0x560c0d7987f5]
gem5/build/GCN3_X86/gem5.opt(+0x1304b15)[0x560c0d7c0b15]
gem5/build/GCN3_X86/gem5.opt(+0x1304cc0)[0x560c0d7c0cc0]
gem5/build/GCN3_X86/gem5.opt(+0x1a9427f)[0x560c0df5027f]
gem5/build/GCN3_X86/gem5.opt(+0x129bef0)[0x560c0d757ef0]
gem5/build/GCN3_X86/gem5.opt(+0x1a7333f)[0x560c0df2f33f]
gem5/build/GCN3_X86/gem5.opt(+0x16b9804)[0x560c0db75804]
gem5/build/GCN3_X86/gem5.opt(+0x16b3dc8)[0x560c0db6fdc8]
gem5/build/GCN3_X86/gem5.opt(+0x16b4b80)[0x560c0db70b80]
gem5/build/GCN3_X86/gem5.opt(+0x1a03665)[0x560c0debf665]
gem5/build/GCN3_X86/gem5.opt(+0x1a2bab4)[0x560c0dee7ab4]
gem5/build/GCN3_X86/gem5.opt(+0x1a2c093)[0x560c0dee8093]
gem5/build/GCN3_X86/gem5.opt(+0xadded2)[0x560c0cf99ed2]
gem5/build/GCN3_X86/gem5.opt(+0x4b6757)[0x560c0c972757]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f33bb9f6748]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f33bb7cbf48]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f33bb918e4b]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f33bb9f6124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f33bb7c2d6d]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f33bb7caef6]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f33bb918e4b]

/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f33bb9191d2]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!

I don't know what I did wrong.
Have you ever tried running this benchmark, or benchmarks like alexnet
or VGG?
May I ask for some advice on successfully running test_fwd_fc?

Thank you !!

------------------ Original Message ------------------
From: "Matt Sinclair" mattdsinclair.wisc@gmail.com;
Sent: Wednesday, May 10, 2023, 5:34 AM
To: "429442672"429442672@qq.com;
Cc: "gem5-dev"gem5-dev@gem5.org;"gem5-users"gem5-users@gem5.org;"Poremba,
Matthew"Matthew.Poremba@amd.com;
Subject: Re: Problem on simulating GCN3 GPU: Running DNNMark too slow.

Hi,

Trying to answer your various questions:

  1. Similar to #2 below, I am unclear what "blocked" means.  It sounds
    like the program is just running, but is slower than you were hoping it
    would be?  If so, unfortunately, this is a well-known problem with detailed
    simulators like gem5 -- they can take a long time to simulate a workload.
    However, there is another possibility: you aren't using enough thread
    contexts, see #2 below.  If you are willing to, you can decrease the batch
    size, and usually the program simulates faster.  For FWD_FC in particular,
    you would do this by decreasing n (e.g., from 100 to 4, 8, or 16); see
    https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/fc_config.dnnmark#6
    and the sketch after this list.

  2. Define blocked -- what does this mean?  The bigger benchmarks here are
    very large ML workloads; it would not surprise me if they took days (or
    maybe weeks) to run end-to-end in gem5.  Are you seeing kernels
    progressing through it (e.g., use the GPUKernelInfo debug flag to print
    when kernels launch and exit; see the example command after this list)?
    If you are seeing kernels progress, it's just a really large workload and
    you'd have to be more patient.  My group is working on ways to cut down
    runtime for workloads like this, but we have not specifically tested it
    for these workloads, and there is no ETA on when that would be
    available/fully working.

It is also possible that you aren't running with enough CPU thread
contexts and the program is infinitely looping there (ROCm launches
additional CPU processes when setting up a GPU program, these require gem5
to have additional CPU thread contexts).  But without knowing where the
program seems to be blocked, it's hard to say if this is a problem or not.
But you could try increasing -n on the command line (e.g., from 3 to 5, or
from 5 to 10) to see if this resolves the current problem.  This will not
resolve the above issue though.

  3. I have never personally tried modeling a Transformer in DNNMark, so
    this might be a better question for the DNNMark authors.  But ultimately
    what you are suggesting is the right way to model things in DNNMark -- in
    the config files you can specify a series of layers, one connected after
    another.  So, if you knew what the layers in a Transformer are, in theory
    you could express it in a config file.  This assumes that DNNMark supports
    all of the layers in a Transformer, though, which I do not know one way or
    the other (you would need to ask the DNNMark authors).

  4. This seems like a question for DNNMark's authors.  On the gem5 side, we
    are just running DNNMark as-is.  But ultimately what I can recommend is you
    start with the base files (e.g.,
    https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/benchmarks/test_alexnet/test_alexnet.cc)
    and the config files (e.g.,
    https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/alexnet.dnnmark)
    and go from there.  When I started with DNNMark, I would observe the LOG
    messages it prints to the screen, then grep for those messages and examine
    the code.

  3. What is "ruby memory" -- is this L1, L2, or main memory size?
    Something else?  There are documents like this:
    https://www.gem5.org/2020/06/01/towards-full.html,
    https://www.gem5.org/2020/05/30/enabling-multi-gpu.html,
    https://www.gem5.org/2020/05/27/modern-gpu-applications.html, and
    https://www.gem5.org/documentation/general_docs/gpu_models/GCN3.  The GPU
    Ruby system uses the same building blocks as the CPU Ruby models:
    https://www.gem5.org/documentation/learning_gem5/part3/MSIintro/.  Not
    sure what exactly you are looking for though.
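
For item 1 above, a minimal sketch of the batch-size change, assuming the
batch size really is a plain "n=100" line in fc_config.dnnmark (as line 6 of
the linked file suggests) -- check your copy of the file before running this:

sed -i 's/^n=100$/n=8/' \
    gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark

Any small value (4, 8, or 16) reduces the work per kernel and usually makes
the simulation finish much sooner.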
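
For item 2 above, an example command for watching kernel progress with the
GPUKernelInfo debug flag (paths and options are copied from the commands
elsewhere in this thread, and the command still goes inside the same docker
invocation; note that gem5's --debug-flags and --debug-file options must come
before the config script):

gem5/build/GCN3_X86/gem5.opt --debug-flags=GPUKernelInfo --debug-file=kernels.log \
    gem5/configs/example/apu_se.py -n 3 \
    --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc \
    -c dnnmark_test_fwd_fc \
    --options="-config gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

If launch/exit lines keep appearing in the log, the workload is making
progress and is simply slow; if they stop, the program really is stuck.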

Thanks,
Matt

On Tue, May 9, 2023 at 4:34 AM 429442672 429442672@qq.com wrote:

hi everyone,

I have successfully built and ran DNNMark using the command:

sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
-cdnnmark_test_fwd_softmax
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

with the output

Exiting because exiting with last active thread context

which may means i have correctly made the running environment.

However, i tried several benchmarks in

but meet following problems:

  1. problem on running test_fwd_fc

When i run test_fwd_fc using:

sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc
-c dnnmark_test_fwd_fc
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"

the problem is running for a few hours, even though i have modify the
input data (mmap.bin -> DNNMark_data.dat) to a smaller size 300MB (2GB
in default).
I have also tried several benchmarks, the only benchmark i done is the
test_fwd_pool and test_bwd_pool, when i ran benchmarks such as
conv、pool、fc, the program will be blocked, with out any output.
Is there anything i did wrong here? or these benchmards are too
compute-intensive to run, leading to slow running?
May i ask for any suggestion for running these benchmarks?

  2. problem on running test_VGG and test_alexnet.

I run them with the commands:

sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet
-c dnnmark_test_alexnet
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and

sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_VGG
-c dnnmark_test_VGG
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/VGG.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

but they are also blocked.
May i ask for any suggestion for running these benchmarks?

  3. question on modifying DNN network.

May i ask how to modify the DNN network architecture? For example, is it
possible to make a transformer block and run it on gem5? It seems that i
can change the configures in /DNNMark/config_example following the
example of alexnet.dnnmark, without modifing the code in
DNNMark/benchmarks/test_alexnet. May i ask is that correct?

  4. How can i get trace on running DNNMark.

Running DNNMark seem like a block box. It is possible to get the trace of
running DNNMark? For example, the process of data loading, computing, etc.

  5. question on apu_se.py

It seem that all the benchmarks require apu_se.py. May i ask is there
any more detailed documents to introduce what this apu_se.py did and how
to modify it?For example,how can i add more ruby memory to the gpu.

The documents and introduction for gem5 gcn gpu is pretty few, if it is
possible, could any one provide some help for me?

Thank you all very much!

4
429442672
Fri, May 12, 2023 3:01 AM

Dear Matt,
     When I run the benchmarks dnnmark_test_VGG, dnnmark_test_alexnet, and dnnmark_test_fwd_conv, I get the same error each time (Invalid filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable: /opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid filter channel number
MIOpen Error: 3 at /home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks: 327510244000
Exiting because  exiting with last active thread context

My command line is:
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv -c dnnmark_test_fwd_conv
--options="-config gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

and I didn't change the default setup of conv_config.dnnmark or gfx801.

May I ask, did I do something wrong here?
Have you ever run those benchmarks without error, and could you please show me some of your configurations?

Thank you so much!

MS
Matt Sinclair
Fri, May 12, 2023 3:42 AM

I have not tried running these specific benchmarks in gem5 personally, so I
cannot say for certain what the error is or even if they are expected to
run to completion in gem5.  But, normally the error you're seeing happens
because you have not created the appropriate "cache" files for the GPU
kernel(s) the program is trying to run.  MIOpen first checks to see if the
desired kernel has been run on your machine before, and if not it tries to
do online compilation of that kernel.  Unfortunately online compilation of
kernels in gem5 is a) very slow and b), because it is very slow, not
supported in gem5 (basically, it is so slow as to not be worth supporting
in many cases, in my opinion).  So, instead, the expectation is that we
build the kernels we want ahead of time, before running the program.  You
may have seen in the examples we provide for DNNMark (
https://resources.gem5.org/resources/dnn-mark) that we have this "generate
cachefiles" script -- that is exactly what the purpose of that script is.
Moreover, on the same webpage, you may have noticed we include the path to
that cache directory in our docker commands (emphasis mine):

docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
-w ${PWD} gcr.io/gem5-test/gcn-gpu:v22-1 gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
-cdnnmark_test_fwd_softmax --options="-config
gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark
-mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

From looking at your commands, I don't see you including this.  Thus, while
I do not know if that script by default produces the kernels needed for,
say, AlexNet, I strongly suspect you should start by running that script
and updating your docker commands to include the cache stuff ... then see
what happens from there.
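
As a rough sketch of that first step (the script lives in the DNNMark
directory of gem5-resources; treat the exact script name and flags below as
assumptions to verify against the DNNMark README, in particular the gfx
version and number of CUs for your configuration):

docker run --rm -v ${PWD}:${PWD} -w ${PWD}/gem5-resources/src/gpu/DNNMark gcn-gpu \
    python3 generate_cachefiles.py cachefiles.csv --gfx-version=gfx801 --num-cus=4

After that, keep the cachefiles mount
(-v ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0)
in every gem5 command, as in the example above.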

Sidenote: normally when I see this:

MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt

It means that the files MIOpen is expecting are not set up properly.
Normally I just symlink these extra files -- e.g., symlink gfx801_4... from
gfx801_32 ... (this is not the best-performing option, because the
resources are different, but it provides a basic setup step to avoid
problems like this).
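
A minimal sketch of that symlink inside the container (the gfx801_32 file
name is an assumption based on the note above -- list the directory first and
link from whichever gfx801_*.HIP.fdb.txt file actually exists):

cd /opt/rocm-4.0.1/miopen/share/miopen/db/
ls gfx801_*.HIP.fdb.txt
ln -s gfx801_32.HIP.fdb.txt gfx801_4.HIP.fdb.txt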
Matt

On Thu, May 11, 2023 at 10:02 PM 429442672 via gem5-dev gem5-dev@gem5.org
wrote:

Dear Matt,
When i run the benchmarks: dnnmark_test_VGG, dnnmark_test_alexnet,
dnnmark_test_fwd_conv, and i got the same error like this (get Invalid
filter channel number):

build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall
fdatasync(...)
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen(HIP): Warning [ParseAndLoadDb] File is unreadable:
/opt/rocm-4.0.1/miopen/share/miopen/db/gfx801_4.HIP.fdb.txt
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid
filter channel number
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
build/GCN3_X86/sim/syscall_emul.cc:683: warn: fcntl: unsupported command 6
MIOpen Error: /root/driver/MLOpen/src/ocl/convolutionocl.cpp:150: Invalid
filter channel number
MIOpen Error: 3 at
/home/tang/gem5-resources/src/gpu/DNNMark/core/include/dnn_utility.h1057Ticks:
327510244000
Exiting because exiting with last active thread context

My command line is:

gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py
-n 8 --mem-size=12GB
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_conv
-c dnnmark_test_fwd_conv
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/conv_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and i didn't change the setup of the default setup of conv_config.dnnmark
and gfx801

May i ask, did i do something wrong here?
Have you ever test those benchmark without error, and could you please
show me several your configurations?

Thank you so much!


gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-leave@gem5.org
