Empathy List Archives

gem5-users@gem5.org

The gem5 Users mailing list

Importing numpy library on gem5 FS

saras nanda

Fri, Jan 5, 2024 6:35 PM

Hi everyone,

I am running a full-system simulation (ARM architecture) using fs_bigLITTLE.py.
My benchmark uses the numpy library and runs on gem5 FS. The problem I am
facing is that the benchmark takes a very long time just to import numpy:
after 3-4 days, I still do not see the import complete.

I am using the following command:

./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py \
  --kernel=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/binaries/vmlinux.arm64 \
  --disk=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/disks/arm64-ubuntu-server.img \
  --caches --cpu-type=atomic --kernel-init=/bin/bash

Is this due to an unstable Linux environment booted using /bin/bash?

I cannot claim it is unstable, as I don't get any errors or see any
anomalous behaviour; it just keeps running the benchmark, which has
"import numpy" as its first statement.

I am unable to debug this problem down to its root cause.

Any help would be much appreciated.

Thank you in advance.

Regards
Saras


Eliot Moss

Fri, Jan 5, 2024 7:25 PM

On 1/5/2024 1:35 PM, saras nanda via gem5-users wrote:

> Hi everyone ,
>
> I am doing a full system simulation -ARM arch - using fs_bigLITTLE.py
> I am using numpy library in my benchmark , which is running on gem5 FS, the problem I am facing is it takes a lot of
> time for the benchmark to just import numpy 3-4 days , yet I don't see it importing it or completing the import
>
> I using the following command ,
>
> ./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py
> --kernel=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/binaries/vmlinux.arm64
> --disk=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/disks/arm64-ubuntu-server.img --caches
> --cpu-type=atomic --kernel-init=/bin/bash
>
> is it due to the unstable linux environment booted using /bin/bash
>
> I am unable to claim it unstable as I don't get any errors or see any anomalous behaviours s,it just keeps running the
> benchmarks which has import numpy as first statement
>
> I am unable to debug this problem to the root
>
> any help provided would be much appreciated
>
> Thank you in advance

You keep posting about this, and I am sorry we don't seem to have an answer.
I have a few comments / questions, though ...

What do you mean by "unstable linux environment booted using /bin/bash"?

The word "unstable" would generally mean something like "prone to unpredictable
failure". Here, I think you mean something a little different, along the lines
of "performs in a way I do not understand."

Rather than running your full benchmark, I wonder if you are able to start
python3, import numpy, then quit, and if so, how long that takes.
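
One way to make that check repeatable is sketched below. It is only an illustration: the helper names time_import and time_startup are made up for this example, and it assumes the guest image has python3 with the module installed. Each measurement starts a fresh interpreter, so the import cost is not hidden by an already-loaded module, and subtracting the bare start-up time gives a rough figure for the import alone.

```python
import subprocess
import sys
import time


def time_import(module_name, python=sys.executable):
    """Wall-clock seconds for a fresh interpreter to start, import the module, and exit."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the import fails.
    subprocess.run([python, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


def time_startup(python=sys.executable):
    """Wall-clock seconds for a fresh interpreter to start and exit without importing."""
    start = time.perf_counter()
    subprocess.run([python, "-c", "pass"], check=True)
    return time.perf_counter() - start


# Example use (hypothetical; run natively first, then inside the simulated system):
#   total = time_import("numpy")
#   base = time_startup()
#   print(f"start-up {base:.2f} s, start-up plus import {total:.2f} s")
```

Run inside the simulated system, the reported seconds are guest time, so the host wall-clock time has to be noted separately.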

On my modern, fairly fast laptop, not in gem5, it takes something like
1.8 seconds. Allowing a 10,000x slowdown for gem5 simulation of a program (I
would hope the slowdown would not be that bad if you're actually running
AtomicSimple or some similarly fast CPU model), that would mean about 5
hours to simulate.
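
The arithmetic behind that figure, as a small sketch. The 1.8 seconds and the 10,000x slowdown are the rough numbers quoted above, not measurements of any particular gem5 configuration, and the function name is made up for the example.

```python
def simulated_wall_seconds(native_seconds, slowdown):
    """Host wall-clock seconds needed if simulation is `slowdown` times slower than native."""
    return native_seconds * slowdown


native_seconds = 1.8   # native "import numpy" time quoted above
slowdown = 10_000      # assumed gem5 slowdown factor (a pessimistic guess)

seconds = simulated_wall_seconds(native_seconds, slowdown)
hours = seconds / 3600
print(f"{seconds:.0f} s is about {hours:.1f} hours")
```

Changing the slowdown factor shows how sensitive the estimate is: at 1,000x the same import would take about half an hour.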

On a server system I was able to run: perf stat -d python3 -c "import numpy"
and the result was:

Performance counter stats for 'python3 -c import numpy':

         979.65 msec task-clock:u              #    3.559 CPUs utilized
              0      context-switches:u        #    0.000 K/sec
              0      cpu-migrations:u          #    0.000 K/sec
          6,201      page-faults:u             #    0.006 M/sec
    474,329,129      cycles:u                  #    0.484 GHz
    566,694,729      instructions:u            #    1.19  insn per cycle
    129,243,270      branches:u                #  131.928 M/sec
      4,391,610      branch-misses:u           #    3.40% of all branches
    166,903,996      L1-dcache-loads:u         #  170.371 M/sec
      9,824,995      L1-dcache-load-misses:u   #    5.89% of all L1-dcache hits
      3,815,564      LLC-loads:u               #    3.895 M/sec
        103,009      LLC-load-misses:u         #    2.70% of all LL-cache hits

    0.275233187 seconds time elapsed

    0.626476000 seconds user
    0.355270000 seconds sys

The most relevant measure may be the 500-600 million instructions needed. To
get a sense of how long this will take under gem5, we need a sense of how many
instructions it can simulate per second. Let's suppose you have a 3 GHz host
processor with the previously mentioned 10,000x slowdown in gem5. That would
mean it is as if the simulated CPU is running at 300 kHz. Assuming two cycles
per instruction and no pipelining, you need about 1 to 1.2 billion cycles
simulated. Dividing 1.2 billion by 300,000 gives 4,000 seconds of wall-clock
time to simulate, a little over an hour.
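
The same estimate written out as a sketch. The inputs are the assumptions stated above (roughly 600 million instructions, two cycles per instruction, a 3 GHz host, a 10,000x slowdown); none of them is a measured gem5 figure, and the function name is made up for the example.

```python
def estimate_sim_seconds(instructions, cycles_per_insn, host_hz, slowdown):
    """Rough host wall-clock seconds needed to simulate a given instruction count."""
    effective_hz = host_hz / slowdown        # 3 GHz / 10,000 = 300 kHz
    cycles = instructions * cycles_per_insn  # about 1.2 billion cycles
    return cycles / effective_hz


seconds = estimate_sim_seconds(
    instructions=600e6,   # upper end of the 500-600 million from perf
    cycles_per_insn=2,    # assumed, no pipelining
    host_hz=3e9,          # assumed 3 GHz host
    slowdown=10_000,      # assumed gem5 slowdown
)
print(f"{seconds:.0f} s is about {seconds / 60:.0f} minutes")
```

With the lower figure of 500 million instructions the result drops to roughly 3,300 seconds, so the estimate stays around an hour either way.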

Given the roughness of these calculations and differences between my laptop
(which was using WSL under Windows) versus a native Linux installation on the
server, the agreement between the two estimates (about 5 hours versus a little
over an hour) seems reasonable to me.

Note that this is the amount of time needed after you have booted the OS.
Your benchmark could also be doing a lot of other stuff that is somehow being
conflated there, too - I am not sure how you are drawing the conclusion that
it is in the process of importing numpy, but I don't mean to question what you
are doing. There could also be something going on here about differences in
details and versions of Python, numpy, etc. Lastly, I am giving stats for
x86; ARM could clearly be somewhat different, though unlikely by a factor of
10 (say).
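
One way to see whether the time really goes into the import is to print a marker just before and just after it. The sketch below is an illustration only; the helper name import_with_markers is made up, and it assumes the benchmark's output is visible on the simulated terminal. The flush=True matters, because buffered output might otherwise not appear until much later.

```python
import importlib
import time


def import_with_markers(module_name):
    """Import a module, printing unbuffered markers before and after the import."""
    print(f"[marker] about to import {module_name}", flush=True)
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    # Inside gem5 this is simulated (guest) time, not host wall-clock time.
    print(f"[marker] imported {module_name} in {elapsed:.3f} s", flush=True)
    return module


# Example use at the top of the benchmark (hypothetical):
#   np = import_with_markers("numpy")
```

If the first marker appears and the second never does, the import is the slow part; if neither appears, the time is being spent earlier, for example in boot or interpreter start-up.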

Do you have an actual ARM machine where you can measure the time needed when
not in gem5, for the same application code and OS? That would give a baseline
against which to compare.

Hope maybe there is something here that helps.

EM