gem5-users@gem5.org

The gem5 Users mailing list


Fetch stage too long for some instructions

Nitesh Narayana GS
Tue, Aug 16, 2022 10:50 AM

Hi

I was trying to understand the 'mla' and 'sdot' instructions in the ARM SVE ISA. I
am using the gem5 pipeline view and the O3CPUAll debug flags to generate the
trace that Konata needs to build the pipeline view.

I see that mla sometimes spends an unusually large number of cycles in fetch,
whereas sdot almost always spends the same number of fetch cycles.

Here are the screenshots for reference. They come from a run in which I am
doing a matrix multiplication.

As the mla pipeline view shows, mla spends about 155 cycles in the fetch
stage, whereas sdot spends only 19 cycles there.
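
To compare the two instructions with numbers rather than screenshots, the fetch-to-decode interval can be computed for every instance of each mnemonic directly from the trace. This is a minimal sketch, not part of the original thread, and it rests on assumptions worth checking against your gem5 version: that the O3PipeView records look like `O3PipeView:fetch:<tick>:<pc>:<upc>:<seq_num>:<disassembly>` followed by `O3PipeView:decode:<tick>`, that a decode tick of 0 means the instruction never reached decode, and that this interval matches the F stage Konata draws. You also have to supply the clock period in ticks (for example 500 for a 2 GHz core with gem5's default 1 ps tick). The function name and script usage are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Assumed O3PipeView line shapes (verify against your gem5 version):
#   O3PipeView:fetch:<tick>:<pc>:<upc>:<seq_num>:<disassembly>
#   O3PipeView:decode:<tick>   (assumed to be 0 if the instruction never reached decode)
FETCH_RE = re.compile(r"O3PipeView:fetch:(\d+):(0x[0-9a-fA-F]+):(\d+):(\d+):(.*)")
DECODE_RE = re.compile(r"O3PipeView:decode:(\d+)")


def fetch_cycles_by_mnemonic(lines, ticks_per_cycle):
    """Map each mnemonic to its list of fetch-to-decode latencies in cycles."""
    result = defaultdict(list)
    pending = None  # (fetch_tick, mnemonic) of the last fetch record seen
    for line in lines:
        m = FETCH_RE.search(line)
        if m:
            disasm = m.group(5).strip()
            mnemonic = disasm.split()[0] if disasm else "?"
            pending = (int(m.group(1)), mnemonic)
            continue
        m = DECODE_RE.search(line)
        if m and pending is not None:
            fetch_tick, mnemonic = pending
            decode_tick = int(m.group(1))
            # Skip instructions that never reached decode (e.g. squashed ones).
            if decode_tick != 0 and decode_tick >= fetch_tick:
                result[mnemonic].append((decode_tick - fetch_tick) // ticks_per_cycle)
            pending = None
    return dict(result)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fetch_cycles.py trace.out 500
    with open(sys.argv[1]) as trace:
        latencies = fetch_cycles_by_mnemonic(trace, int(sys.argv[2]))
    for mnemonic, cycles in sorted(latencies.items()):
        print(f"{mnemonic}: count={len(cycles)} min={min(cycles)} max={max(cycles)}")
```

Reporting the minimum and maximum per mnemonic makes it easy to see whether mla is slow every time or only in some iterations, which is what "sometimes" in the observation above suggests.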

The two versions of the code are functionally similar.

I am also ruling out an I-cache miss here, since there does not seem to be
one and the program's assembly is very small.

Any idea what is happening, and why?

Thanks in advance!

GS Nitesh Narayana https://nitesh8998.gitlab.io/
Department of Computer Architecture
Polytechnic University of Catalonia Barcelona 2021-2025
Webpage: nitesh8998.gitlab.io

Francisco Carlos
Fri, Aug 19, 2022 1:16 PM

Hello Nitesh.

To me, this looks like it is probably an I-cache miss. Did you actually check for a cache miss, or did you assume there was none because the assembly is small?
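
Rather than assuming, the I-cache miss counters can be read from the simulation's stats.txt. A minimal sketch, assuming the usual `name value # description` layout of stats.txt and an instruction cache whose stat names contain "icache" (as with `system.cpu.icache` in the example configs). The exact counter names differ between gem5 versions (for example `overall_misses` in older releases and `overallMisses` in newer ones), so the filter is deliberately loose. The function name and script usage are hypothetical.

```python
import sys


def icache_miss_stats(lines):
    """Collect stats.txt entries whose name mentions both 'icache' and 'miss'."""
    found = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank lines and other short lines
        name, value = parts[0], parts[1]
        lowered = name.lower()
        if "icache" in lowered and "miss" in lowered:
            try:
                # If stats were dumped more than once, later dumps overwrite earlier ones.
                found[name] = float(value)
            except ValueError:
                continue  # skip entries whose value is not a plain number
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python icache_misses.py m5out/stats.txt
    with open(sys.argv[1]) as stats:
        for name, value in sorted(icache_miss_stats(stats).items()):
            print(f"{name} = {value:g}")
```

These counters are totals for the whole run, so they show whether I-cache misses happen at all, not whether a particular mla instance was delayed by one.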

Could you send me the binary so I can run it locally and check for myself what it could be?

Best regards


Francisco Carlos Silva Junior
PhD Student at University of Brasília


From: Nitesh Narayana GS nitesh@ac.upc.edu
Sent: Tuesday, August 16, 2022 07:50
To: gem5 users mailing list gem5-users@gem5.org
Subject: [gem5-users] Fetch stage too long for some instructions

