Hi everyone,
I am running a full-system (FS) simulation of an ARM system using fs_bigLITTLE.py.
My benchmark, which runs on gem5 in FS mode, uses the numpy library. The problem
is that the benchmark takes an extremely long time just to import numpy: after
3-4 days of simulation I still don't see the import complete.
I am using the following command:
./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py \
    --kernel=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/binaries/vmlinux.arm64 \
    --disk=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/disks/arm64-ubuntu-server.img \
    --caches --cpu-type=atomic --kernel-init=/bin/bash
Could this be due to an unstable Linux environment, since I boot with /bin/bash
as init? I can't really claim it is unstable: I don't get any errors or see any
anomalous behaviour. It just keeps running the benchmark, whose first statement
is import numpy.
I have been unable to trace this problem to its root cause.
Any help would be much appreciated.
Thank you in advance.
Regards
Saras
On 1/5/2024 1:35 PM, saras nanda via gem5-users wrote the message above.
You keep posting about this, and I am sorry we don't seem to have an answer.
I have a few comments and questions, though.
What do you mean by "unstable linux environment booted using /bin/bash"?
The word "unstable" would generally mean something like "prone to unpredictable
failure". Here, I think you mean something a little different, along the lines
of "performs in a way I do not understand."
Rather than running your full benchmark, I wonder if you are able to start
python3, import numpy, then quit, and if so, how long that takes.
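As a sketch of that measurement (assuming python3 is available inside the simulated system; the function name `time_import` is hypothetical, not part of gem5 or numpy), a small script can time just the import, with the module name as an optional command-line argument so a cheaper module can be tried first:

```python
import importlib
import sys
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds spent importing module_name."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Default to numpy; pass another module name to compare against it.
    name = sys.argv[1] if len(sys.argv) > 1 else "numpy"
    try:
        print(f"import {name}: {time_import(name):.3f} s")
    except ImportError as exc:
        print(f"could not import {name}: {exc}")
```

Run it in a fresh interpreter each time: `importlib.import_module` returns a module that is already in `sys.modules` immediately, so a second import in the same process measures almost nothing.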
On my modern, fairly fast laptop (not in gem5), it takes about 1.8 seconds.
Allowing a 10,000x slowdown for gem5 simulation of a program (I would hope the
slowdown is not that bad if you are actually running AtomicSimple or a similarly
fast CPU model), that would mean about 5 hours to simulate.
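The scaling in this paragraph is simply native run time multiplied by an assumed slowdown factor. A minimal sketch (the 10,000x factor is the assumption stated in the text, not a measured gem5 figure; the function name is illustrative):

```python
def simulated_wall_time_hours(native_seconds: float, slowdown: float) -> float:
    """Estimate host wall-clock hours to simulate a run lasting native_seconds natively."""
    return native_seconds * slowdown / 3600.0


# 1.8 s native import time on the laptop, assumed 10,000x gem5 slowdown.
print(f"{simulated_wall_time_hours(1.8, 10_000):.1f} hours")  # 5.0 hours
```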
On a server system I ran: perf stat -d python3 -c "import numpy"
and the results were:
    Performance counter stats for 'python3 -c import numpy':

             979.65 msec task-clock:u            # 3.559 CPUs utilized
                  0      context-switches:u      # 0.000 K/sec
                  0      cpu-migrations:u        # 0.000 K/sec
              6,201      page-faults:u           # 0.006 M/sec
        474,329,129      cycles:u                # 0.484 GHz
        566,694,729      instructions:u          # 1.19 insn per cycle
        129,243,270      branches:u              # 131.928 M/sec
          4,391,610      branch-misses:u         # 3.40% of all branches
        166,903,996      L1-dcache-loads:u       # 170.371 M/sec
          9,824,995      L1-dcache-load-misses:u # 5.89% of all L1-dcache hits
          3,815,564      LLC-loads:u             # 3.895 M/sec
            103,009      LLC-load-misses:u       # 2.70% of all LL-cache hits

        0.275233187 seconds time elapsed

        0.626476000 seconds user
        0.355270000 seconds sys
The most relevant measure may be the 500-600 million instructions needed. To
get a sense of how long this will take under gem5, we need a sense of how many
instructions it can simulate per second. Suppose you have a 3 GHz host
processor and the previously mentioned 10,000x slowdown in gem5. That would
mean the simulated CPU effectively runs at 300 kHz. Assuming two cycles per
instruction and no pipelining, you need about 1 to 1.2 billion simulated
cycles. Dividing 1.2 billion by 300,000 gives 4,000 seconds of simulation
time, a little over an hour.
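The same back-of-envelope estimate can be written as a small function. It assumes, as the text does, a fixed cycles-per-instruction with no overlap and a simulated clock equal to the host clock divided by the slowdown factor; the parameter values are the ones assumed in the text, plus the exact instruction count from the perf output:

```python
def simulation_seconds(instructions: float, cpi: float,
                       host_hz: float, slowdown: float) -> float:
    """Rough host seconds needed to simulate `instructions` under gem5.

    The simulated CPU is treated as running at host_hz / slowdown cycles per
    second, with a fixed cycles-per-instruction (no pipelining overlap).
    """
    effective_hz = host_hz / slowdown  # 3 GHz / 10,000 = 300 kHz
    cycles = instructions * cpi        # total simulated cycles
    return cycles / effective_hz


# Upper end of the 500-600 million range, CPI of 2, 3 GHz host, 10,000x slowdown.
secs = simulation_seconds(600e6, 2.0, 3e9, 10_000)
print(f"{secs:.0f} s = {secs / 3600:.2f} h")  # 4000 s = 1.11 h

# Exact instructions:u count reported by perf above.
measured = simulation_seconds(566_694_729, 2.0, 3e9, 10_000)
print(f"{measured:.0f} s = {measured / 3600:.2f} h")  # 3778 s = 1.05 h
```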
Given the roughness of these calculations and the differences between my laptop
(which was using WSL under Windows) and a native Linux installation on the
server, the agreement between the two estimates (about 5 hours versus a little
over an hour) seems reasonable to me.
Note that this is the amount of time needed after you have booted the OS.
Your benchmark could also be doing a lot of other work that is somehow being
conflated with the import. I am not sure how you are concluding that it is in
the process of importing numpy, but I don't mean to question what you are
doing. There could also be differences in the details and versions of Python,
numpy, etc. Lastly, these stats are for x86; ARM could clearly be somewhat
different, though it is unlikely to differ by, say, a factor of 10.
Do you have an actual ARM machine where you can measure the time needed outside
gem5 for the same application code and OS? That would give a baseline against
which to compare.
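Once a native baseline is available, the effective gem5 slowdown is just the ratio of the two times, which can then be compared with the 10,000x assumed in the estimates. A minimal sketch (the function name is illustrative); as a rough illustration only, plugging in the 3 days reported in the original post and the 1.8 s laptop figure (x86, so only a stand-in for an ARM baseline) gives a lower bound, since the import had not finished:

```python
def observed_slowdown(gem5_host_seconds: float, native_seconds: float) -> float:
    """Ratio of host wall-clock time under gem5 to native run time."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return gem5_host_seconds / native_seconds


# 3 days of gem5 host time (import still unfinished) vs. 1.8 s natively on x86.
three_days = 3 * 24 * 3600  # 259,200 s
print(f"at least {observed_slowdown(three_days, 1.8):,.0f}x")  # at least 144,000x
```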
I hope something here helps.
EM