gem5-dev@gem5.org

The gem5 Developer List


EL2 MSR MRS instruction call (Arm-v8a aarch64)

Atul Rahman
Thu, Aug 31, 2023 4:15 PM

Hello,
I am running a benchmark binary compiled with clang using the armv8a+fp+simd+crypto options.
All workloads of this benchmark perform similarly in gem5 and on an actual mobile device, except one. That workload is quite simple: it runs a convolutional neural network written from scratch in C++, without any external library.
I generated a tarmac trace for the first few thousand instructions starting from the ROI.
The trace contains SVC instructions, and also MSR and MRS instructions executed at EL2. I do not understand why the trace shows no HVC instructions and yet so many MSR and MRS instructions at EL2. I suspect this is what makes this particular workload perform poorly. I do not see any such EL2 instructions for the other workloads of the same benchmark.

I am using gem5's fs_bigLITTLE.py script (with a tuned O3 ARM CPU) for FS simulation.
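
To make the observation above easier to reproduce, here is a minimal sketch of how such a trace could be summarized per exception level. It rests on assumptions that are not confirmed by the trace itself: that each instruction line carries an exception-level token such as "EL2h" before a " : " separator, and that the disassembly follows that separator. The function name count_by_el and the watched mnemonic set are hypothetical; adjust the pattern to whatever the trace actually contains.

```python
import re
import sys
from collections import Counter

# Assumption: the exception level appears as a token like "EL2h" or "EL0t"
# in the part of the line that precedes the " : " separator.
EL_TOKEN = re.compile(r"\bEL([0-3])[ht]?")

# Mnemonics of interest for this question (an illustrative choice).
WATCHED = {"MSR", "MRS", "SVC", "HVC", "SMC", "ERET"}


def count_by_el(lines):
    """Count watched mnemonics per exception level in trace lines."""
    counts = Counter()
    for line in lines:
        head, sep, disasm = line.partition(" : ")
        if not sep or not disasm.strip():
            continue  # not an instruction line under the assumed layout
        match = EL_TOKEN.search(head)
        if match is None:
            continue  # no exception-level token found
        mnemonic = disasm.split()[0].upper()
        if mnemonic in WATCHED:
            counts[(int(match.group(1)), mnemonic)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_el.py <trace file>
    with open(sys.argv[1]) as trace:
        for (el, mnemonic), n in sorted(count_by_el(trace).items()):
            print(f"EL{el} {mnemonic}: {n}")
```

Comparing this summary between the slow workload and one of the others would show whether the EL2 MSR/MRS instructions are really unique to it.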

Any insight on this topic would be very helpful. Thanks.


Atul Rahman
Thu, Aug 31, 2023 4:34 PM

Please delete this post.


