gem5-users@gem5.org

The gem5 Users mailing list

HPCG on RISCV

Νικόλαος Ταμπουρατζής
Wed, Oct 5, 2022 9:24 PM

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.
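One possible reading of the nan-versus-zero difference, offered as an assumption rather than a diagnosis: on RV64 with the D extension, a float occupies the low 32 bits of a 64-bit FP register and must be NaN-boxed (upper 32 bits all ones); a register that is not properly boxed is read as the canonical NaN. An all-zero register would therefore print as 0.000000 when read as a double and as nan when read as a float. The Python sketch below models only that rule. The helper names are hypothetical and nothing here touches gem5.

```python
import math
import struct

CANONICAL_NAN_F32 = 0x7FC00000  # canonical quiet NaN, single precision


def read_as_double(reg_bits):
    """Interpret a 64-bit FP register image as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", reg_bits))[0]


def read_as_float(reg_bits):
    """Interpret a 64-bit FP register image as a NaN-boxed float.

    If the upper 32 bits are not all ones, the value is not a valid
    NaN-boxed single and is treated as the canonical NaN.
    """
    upper = reg_bits >> 32
    lower = reg_bits & 0xFFFFFFFF
    if upper != 0xFFFFFFFF:
        lower = CANONICAL_NAN_F32
    return struct.unpack("<f", struct.pack("<I", lower))[0]


def nan_box(value):
    """Build the 64-bit register image that holds a single-precision value."""
    lower = struct.unpack("<I", struct.pack("<f", value))[0]
    return (0xFFFFFFFF << 32) | lower


zeroed_register = 0
print(read_as_double(zeroed_register))            # double view of all-zero bits
print(math.isnan(read_as_float(zeroed_register)))  # float view is not boxed
print(read_as_float(nan_box(1.5)))                 # a properly boxed float
```

If this reading is right, both symptoms would point at FP register contents being cleared rather than at the arithmetic itself, but that remains to be confirmed from a trace.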

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason, all,

I am trying to track down the accuracy problem with RISCV-FS. At least
in my dummy example, the problem arises because the (double) variables
are set to zero at random points in simulated time, which is why I get
different results across executions of the same code. Specifically,
for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 10;

    float result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                float sq_i = sqrt(i);
                float sq_j = sqrt(j);
                result += sq_i * sq_j;
                printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                       iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
            }
        }
        printf("Final Result: %lf\n", result);
    }
}

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: 2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: 2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: 2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: 2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: 2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: 3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As we can see, in the following iterations sqrt(1) as well as the
result are set to zero for some reason:

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
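The gap between the two final results can be checked arithmetically. The nested sum factors as (sum of sqrt(k) for k < dim) squared, and the ITER 0 total is short by what the i = 1 row would have contributed had the accumulator not been reset. A small Python sketch follows; it uses double precision, so it is not bit-identical to the float program, and `nested_sum` and `skip_rows` are names introduced here for illustration only.

```python
import math


def nested_sum(dim, skip_rows=()):
    """Sum sqrt(i) * sqrt(j) over 0 <= i, j < dim, omitting rows in skip_rows."""
    total = 0.0
    for i in range(dim):
        if i in skip_rows:
            continue  # models a row whose contribution was lost
        for j in range(dim):
            total += math.sqrt(i) * math.sqrt(j)
    return total


dim = 10
full = nested_sum(dim)
# The double sum factors into the square of a single sum.
closed_form = sum(math.sqrt(k) for k in range(dim)) ** 2
# ITER 0 of the FS run behaves as if the whole i = 1 row were dropped.
without_row_1 = nested_sum(dim, skip_rows={1})

print(f"full sum:       {full:.6f}")
print(f"closed form:    {closed_form:.6f}")
print(f"row i=1 missed: {without_row_1:.6f}")
```

With dim = 10 this gives roughly 372.721656 for the full sum and roughly 353.415656 with row 1 dropped, which is consistent with the two Final Result lines in the FS output.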

Please help me resolve the accuracy issue! I think a fix would be
very useful for the gem5 community.

Note that I found the simulated tick at which the application starts
in FS (using m5 dumpstats) and passed it to --debug-start, but the
generated trace file is 10x larger than in SE mode for the same
application. How can I compare them?
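One way to make the two traces comparable, sketched here under stated assumptions: cut each line down to the part that starts at the PC (which drops the tick and the object name), keep only instructions whose PC lies in the statically linked binary's text range (which filters out kernel activity in FS), and report the first record where the two streams differ. The assumed line layout (`<tick>: <object>: ... 0x<pc> ...`), the PC bounds, and all function names are illustrative and are not taken from the gem5 documentation.

```python
import re
from itertools import zip_longest

# Assumption: the first "0x..." token on an instruction line is its PC.
HEX_ADDR = re.compile(r"0x([0-9a-fA-F]+)")


def user_records(lines, pc_lo, pc_hi):
    """Yield trace records, from the PC onward, whose PC is in [pc_lo, pc_hi)."""
    for line in lines:
        line = line.rstrip("\n")
        match = HEX_ADDR.search(line)
        if match is None:
            continue  # no PC found, so not an instruction record
        pc = int(match.group(1), 16)
        if pc_lo <= pc < pc_hi:
            # Dropping everything before the PC removes the tick and the
            # object name, both of which differ between SE and FS runs.
            yield line[match.start():]


def first_divergence(trace_a, trace_b, pc_lo, pc_hi):
    """Return (index, record_a, record_b) at the first mismatch, else None."""
    pairs = zip_longest(user_records(trace_a, pc_lo, pc_hi),
                        user_records(trace_b, pc_lo, pc_hi))
    for index, (rec_a, rec_b) in enumerate(pairs):
        if rec_a != rec_b:
            return index, rec_a, rec_b
    return None
```

Both functions accept any iterable of lines, so two open file objects can be passed directly and large traces are streamed rather than loaded into memory. The PC bounds would come from the binary itself (for example from its ELF section headers); the values used for a given run are up to the reader.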

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start but in FS mode it is very
difficult to find the tick on which the application is started!

However, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 4096;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); //Result: 30530733453.127449
    }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

On X86 (without cross-compilation of course), QEMU-RISCV, and GEM5-SE
the result is the same (30530733453.127449), but in GEM5-FS the
result is different! In addition, the result also differs between
the 2 iterations.

Please reproduce the error if you want in order to verify my result.
How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
instead of the gem5.fast which I had), but the debug traces exceed 20GB
(and the run has not finished yet) for less than 1 simulated second. How
can I reduce the size of the debug output (or set something more specific)?
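A sketch of one way to bound the trace: limit it to a tick window with --debug-start and --debug-end, and send it to a file with --debug-file. To my knowledge gem5 compresses the file when its name ends in .gz, but treat that as an assumption to verify against your gem5 version. The helper below only assembles the command line; its name, the tick values, and the output file name are placeholders.

```python
def gem5_trace_command(gem5_binary, config, start_tick, end_tick,
                       outdir="m5out", trace_file="exec_trace.out.gz"):
    """Build an argv list that traces Exec only inside a tick window."""
    if end_tick <= start_tick:
        raise ValueError("end_tick must be greater than start_tick")
    return [
        gem5_binary,
        "-d", outdir,                    # output directory, as used in this thread
        "--debug-flags=Exec",
        f"--debug-start={start_tick}",   # first tick to trace
        f"--debug-end={end_tick}",       # last tick to trace
        f"--debug-file={trace_file}",    # written inside outdir
        config,
    ]


cmd = gem5_trace_command(
    "./build/RISCV/gem5.opt",
    "./configs/example/gem5_library/riscv-fs.py",
    start_tick=1000,   # placeholder: tick at which the application starts
    end_tick=2000,     # placeholder: a little past the first wrong value
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Since the divergence is expected to show up early, a narrow window after the application's start tick is usually the part worth keeping.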

Instead, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then
perform a diff to see how they differ.

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (in both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at this moment, however, if you want you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image
),
I installed the gcc compiler, compile it (through qemu) and I get
wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in
GCC or the HPCG workload, not in gem5. If not, I would test in QEMU to
make sure the binary works there. Another way you could test to see if
the problem is your binary or gem5 would be to run it on real hardware.
We have access to some RISC-V hardware here at UC Davis, if you don't
have access to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest gem5
     release. I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
     ./configs/example/gem5_library/riscv-fs.py) in order to download the
     riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
     After this I mount the riscv-disk-img (sudo mount -o loop
     riscv-disk-img /mnt), copy in the xhpcg executable, and make the
     following changes in riscv-fs.py to boot the riscv-disk-img with the
     executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

     Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
     I change the last line to "return ["console=ttyS0",
     "root={root_value}", "rw"]" in order to allow write permissions in
     the image.

  2. After some iterations, the HPCG benchmark checks whether the results
     are valid. In the case of FS it reports invalid results. As I see
     from the results, one (at least) problem is that it produces
     different results in each HPCG execution (with the same
     configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me in order to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
     ./configs/example/gem5_library/riscv-fs.py? Can you attach the
     script here?

  2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image
(

),

I installed the gcc compiler, compile it (through qemu) and I get
wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



| i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > 3.000000): 298.497654 > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > 4.242641): 302.740295 > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > 5.196152): 307.936447 > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > 6.000000): 313.936447 > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > 6.708204): 320.644651 > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > 7.348469): 327.993120 > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > 7.937254): 335.930374 > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > 8.485281): 344.415656 > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > 9.000000): 353.415656 > Final Result: 353.415656 > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > 0.000000): 0.000000 > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > 1.000000): 1.000000 > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > 1.414214): 2.414214 > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > 
1.732051): 4.146264 > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: > 2.000000): 6.146264 > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: > 2.236068): 8.382332 > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: > 2.449490): 10.831822 > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: > 2.645751): 13.477573 > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: > 2.828427): 16.306001 > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: > 3.000000): 19.306001 > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > 0.000000): 19.306001 > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > 1.414214): 20.720214 > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > 2.000000): 22.720214 > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > 2.449490): 25.169704 > ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > 2.828427): 27.998131 > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > 3.162278): 31.160409 > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > 3.464102): 34.624510 > ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > 3.741657): 38.366168 > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > 4.000000): 42.366168 > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > 4.242641): 46.608808 > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: > 0.000000): 46.608808 > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > 1.732051): 48.340859 > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > 2.449490): 50.790349 > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > 3.000000): 53.790349 > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > 3.464102): 57.254450 > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > 3.872983): 61.127434 > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > 
4.242641): 65.370075 > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > 4.582576): 69.952650 > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > 4.898979): 74.851630 > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > 5.196152): 80.047782 > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > 0.000000): 80.047782 > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > 2.000000): 82.047782 > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > 2.828427): 84.876209 > ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > 3.464102): 88.340311 > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > 4.000000): 92.340311 > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: > 4.472136): 96.812447 > ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > 4.898979): 101.711426 > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > 5.291503): 107.002929 > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > 5.656854): 112.659783 > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > 6.000000): 118.659783 > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > 0.000000): 118.659783 > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > 2.236068): 120.895851 > ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > 3.162278): 124.058129 > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > 3.872983): 127.931112 > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: > 4.472136): 132.403248 > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > 5.000000): 137.403248 > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > 5.477226): 142.880474 > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > 5.916080): 148.796553 > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > 6.324555): 155.121109 > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 
3.000000 | i*j: > 6.708204): 161.829313 > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > 0.000000): 161.829313 > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > 2.449490): 164.278802 > ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > 3.464102): 167.742904 > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > 4.242641): 171.985545 > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > 4.898979): 176.884524 > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: > 5.477226): 182.361750 > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > 6.000000): 188.361750 > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > 6.480741): 194.842491 > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > 6.928203): 201.770694 > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > 7.348469): 209.119163 > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > 0.000000): 209.119163 > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > 2.645751): 211.764914 > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: > 3.741657): 215.506572 > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > 4.582576): 220.089147 > ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > 5.291503): 225.380650 > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > 5.916080): 231.296730 > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > 6.480741): 237.777470 > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > 7.000000): 244.777470 > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > 7.483315): 252.260785 > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > 7.937254): 260.198039 > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > 0.000000): 260.198039 > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > 2.828427): 263.026466 > ITER: 1 | i: 8 | j: 2 
Result(i: 2.828427 | j: 1.414214 | i*j: > 4.000000): 267.026466 > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > 4.898979): 271.925446 > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > 5.656854): 277.582300 > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > 6.324555): 283.906855 > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > 6.928203): 290.835059 > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > 7.483315): 298.318373 > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > 8.000000): 306.318373 > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > 8.485281): 314.803655 > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > 0.000000): 314.803655 > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > 3.000000): 317.803655 > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > 4.242641): 322.046295 > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > 5.196152): 327.242448 > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > 6.000000): 333.242448 > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > 6.708204): 339.950652 > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > 7.348469): 347.299121 > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > 7.937254): 355.236375 > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > 8.485281): 363.721656 > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > 9.000000): 372.721656 > Final Result: 372.721656 > > > > As we can see in the following iterations the sqrt(1) as well as the > result is set to zero for some reason. 
>
> ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
> ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
>
> Please help me to resolve the accuracy issue! I think that it will
> be very useful for gem5 community.
>
> To be noticed, I find the correct simulated tick in which the
> application started in FS (using m5 dumpstats), and I start the
> --debug-start, but the trace file which is generated is 10x larger
> than SE mode for the same application. How can I compare them?
>
> Thank you in advance!
> Best regards,
> Nikos
>
> Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>:
>
>> Dear Jason,
>>
>> I am trying to use --debug-start but in FS mode it is very
>> difficult to find the tick on which the application is started!
>>
>> However, I am writing the following very simple c++ program:
>>
>> #include <cmath>
>> #include <stdio.h>
>>
>> int main(){
>>
>>     int dim = 4096;
>>
>>     double result;
>>
>>     for (int iter = 0; iter < 2; iter++){
>>         result = 0;
>>         for (int i = 0; i < dim; i++){
>>             for (int j = 0; j < dim; j++){
>>                 result += sqrt(i) * sqrt(j);
>>             }
>>         }
>>         printf("Result: %lf\n", result); //Result: 30530733453.127449
>>     }
>> }
>>
>> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
>> test_riscv test_riscv.cpp
>>
>>
>> While in X86 (without cross-compilation of course), QEMU-RISCV,
>> GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
>> result is different! In addition, the result is also different
>> between the 2 iterations.
>>
>> Please reproduce the error if you want in order to verify my result.
>> How can the issue be resolved?
>>
>> Thank you in advance!
>>
>> Best regards,
>> Nikos
>>
>>
>> Quoting Jason Lowe-Power <jason@lowepower.com>:
>>
>>> Hi Nikos,
>>>
>>> You can use --debug-start to start the debugging after some number of
>>> ticks. Also, I would expect that the difference should come up quickly, so
>>> no need to run the program to the end.
>>>
>>> For the FS mode one, you will want to just start the trace as the
>>> application starts. This could be a bit of a pain.
>>>
>>> I'm not really sure what fundamentally could be different. FS and SE mode
>>> use the exact same code for executing instructions, so I don't think that's
>>> the problem. Have you tried running for smaller inputs or just one
>>> iteration?
>>>
>>> Jason
>>>
>>>
>>>
>>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
>>> ntampouratzis@ece.auth.gr> wrote:
>>>
>>>> Dear Bobby,
>>>>
>>>> I am trying to add --debug-flags=Exec (building the gem5 for gem5.opt
>>>> not for gem5.fast which I had) but the debug traces exceed the 20GB
>>>> (and it is not finished yet) for less than 1 simulated second. How can
>>>> I reduce the size of the debug-flags (or set something more specific)?
>>>>
>>>> In contrast I build the HPCG benchmark with DHPCG_DEBUG flag. If you
>>>> want, you can compare these two output files
>>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
>>>> see, something goes wrong with the accuracy of calculations in FS mode
>>>> (benchmark uses double precision). You can find the files here:
>>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/
>>>>
>>>> Best regards,
>>>> Nikos
>>>>
>>>> Quoting Jason Lowe-Power <jason@lowepower.com>:
>>>>
>>>>> That's quite odd that it works in SE mode but not FS mode!
>>>>>
>>>>> I would suggest running with --debug-flags=Exec for both and then perform a
>>>>> diff to see how they differ.
>>>>>
>>>>> Cheers,
>>>>> Jason
>>>>>
>>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
>>>>> ntampouratzis@ece.auth.gr> wrote:
>>>>>
>>>>>> Dear Bobby,
>>>>>>
>>>>>> In QEMU I get the same (correct) results that I get in SE mode
>>>>>> simulation. I get invalid results in FS simulation (in both
>>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
>>>>>> hardware at this moment, however, if you want you may execute my xhpcg
>>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
>>>>>> following configuration:
>>>>>>
>>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
>>>>>>
>>>>>> Please let me know if you have any updates!
>>>>>>
>>>>>> Best regards,
>>>>>> Nikos
>>>>>>
>>>>>>
>>>>>> Quoting Jason Lowe-Power <jason@lowepower.com>:
>>>>>>
>>>>>>> Hi Nikos,
>>>>>>>
>>>>>>> I notice you said the following in your original email:
>>>>>>>
>>>>>>>> In addition, I used the RISCV Ubuntu image
>>>>>>>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>>>>>>>> I installed the gcc compiler, compile it (through qemu) and I get
>>>>>>>> wrong results too.
>>>>>>>
>>>>>>>
>>>>>>> Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
>>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU to make
>>>>>>> sure the binary works there. Another way you could test to see if the
>>>>>>> problem is your binary or gem5 would be to run it on real hardware. We have
>>>>>>> access to some RISC-V hardware here at UC Davis, if you don't have access
>>>>>>> to it.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jason
>>>>>>>
>>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
>>>>>>> ntampouratzis@ece.auth.gr> wrote:
>>>>>>>
>>>>>>>> Dear Bobby,
>>>>>>>>
>>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest gem5
>>>>>>>> release.
>>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
>>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download the
>>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
>>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop
>>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the following
>>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with executable:
>>>>>>>>
>>>>>>>> image = CustomDiskImageResource(
>>>>>>>>     local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
>>>>>>>> )
>>>>>>>>
>>>>>>>> # Set the Full System workload.
>>>>>>>> board.set_kernel_disk_workload(
>>>>>>>>     kernel=Resource("riscv-bootloader-vmlinux-5.10"),
>>>>>>>>     disk_image=image,
>>>>>>>> )
>>>>>>>>
>>>>>>>> Finally, in the gem5/src/python/gem5/components/boards/riscv_board.py
>>>>>>>> I change the last line to "return ["console=ttyS0",
>>>>>>>> "root={root_value}", "rw"]" in order to allow the write permissions in
>>>>>>>> the image.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2) The HPCG benchmark after some iterations calculates if the results
>>>>>>>> are valid or not valid. In the case of FS it gives invalid results. As
>>>>>>>> I see from the results, one (at least) problem is that produces
>>>>>>>> different results in each HPCG execution (with the same configuration).
>>>>>>>>
>>>>>>>> Here is the HPCG output and riscv-fs.py
>>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
>>>>>>>> results in the video if you use the xhpcg executable
>>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)
>>>>>>>>
>>>>>>>> Please help me in order to solve it!
>>>>>>>>
>>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode too.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Nikos
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Bobby Bruce <bbruce@ucdavis.edu>:
>>>>>>>>
>>>>>>>> > I'm going to need a bit more information to help:
>>>>>>>> >
>>>>>>>> > 1. In what way have you modified
>>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the script
>>>>>>>> > here?
>>>>>>>> > 2. What error are you getting or in what way are the results invalid?
>>>>>>>> >
>>>>>>>> > -
>>>>>>>> > Dr. Bobby R. Bruce
>>>>>>>> > Room 3050,
>>>>>>>> > Kemper Hall, UC Davis
>>>>>>>> > Davis,
>>>>>>>> > CA, 95616
>>>>>>>> >
>>>>>>>> > web: https://www.bobbybruce.net
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
>>>>>>>> > ntampouratzis@ece.auth.gr> wrote:
>>>>>>>> >
>>>>>>>> >>
>>>>>>>> >> Dear gem5 community,
>>>>>>>> >>
>>>>>>>> >> I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
>>>>>>>> >> version, without MPI and OpenMP). While it works properly in gem5 SE
>>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
>>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
>>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
>>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
>>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
>>>>>>>> >> and put it).
>>>>>>>> >>
>>>>>>>> >> Can you help me please?
>>>>>>>> >>
>>>>>>>> >> In addition, I used the RISCV Ubuntu image
>>>>>>>> >> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
>>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and I get
>>>>>>>> >> wrong results too.
>>>>>>>> >>
>>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV
>>>>>>>> >> (xhpcg), and a video that shows the results
>>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).
>>>>>>>> >>
>>>>>>>> >> P.S. I use the latest gem5 version.
>>>>>>>> >>
>>>>>>>> >> Thank you in advance! :)
>>>>>>>> >>
>>>>>>>> >> Best regards,
>>>>>>>> >> Nikos
>>>>>>>> >> _______________________________________________
>>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org
>>>>>>>> >> To unsubscribe send an email to gem5-users-leave@gem5.org
Bobby Bruce
Fri, Oct 7, 2022 1:04 AM

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
free to use this if it helps.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.

Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>:

Dear Jason, all,

I am trying to track down the accuracy problem with RISCV-FS, and I
observe that (at least in my dummy example) it arises because the
(double) variables are set to zero at random points in simulated time,
which is why I get different results across executions of the same
code. Specifically, for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 10;

    float result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                float sq_i = sqrt(i);
                float sq_j = sqrt(j);
                result += sq_i * sq_j;
                printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                       iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
            }
        }
        printf("Final Result: %lf\n", result);
    }
}
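
A host-side cross-check of the expected total is possible because the double loop sums sqrt(i)*sqrt(j) over all pairs, which factors as (sum of sqrt(i))^2. The sketch below is not from the thread: the function names loop_sum and closed_form are illustrative, and it accumulates in double rather than the float shown in the listing above.

```cpp
#include <cmath>

// Hypothetical helper (not from the thread): the same double loop as the
// dummy program, accumulated in double.
double loop_sum(int dim) {
    double result = 0.0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            result += std::sqrt(static_cast<double>(i)) *
                      std::sqrt(static_cast<double>(j));
        }
    }
    return result;
}

// Hypothetical helper: uses the identity
//   sum_i sum_j sqrt(i)*sqrt(j) == (sum_i sqrt(i))^2
// to compute the same total with a single loop.
double closed_form(int dim) {
    double s = 0.0;
    for (int i = 0; i < dim; i++) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s * s;
}
```

For dim = 10 the two functions should agree to within rounding. A total that changes from one iteration to the next, as in the FS output, cannot be explained by rounding alone, since both iterations execute the same operations in the same order.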

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i
j:
1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | ij:
1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i
j:
1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i
j:
1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | ij:
2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i
j:
2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | ij:
2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i
j:
3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | ij:
3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i
j:
3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | ij:
4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i
j:
4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | ij:
0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i
j:
1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | ij:
2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i
j:
3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | ij:
3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i
j:
3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | ij:
4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i
j:
4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | ij:
4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i
j:
5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | ij:
0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i
j:
2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | ij:
2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i
j:
3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | ij:
4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i
j:
4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | ij:
4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i
j:
5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | ij:
5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i
j:
6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | ij:
0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i
j:
2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | ij:
3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i
j:
3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | ij:
4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i
j:
5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | ij:
5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i
j:
5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | ij:
6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i
j:
6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | ij:
0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i
j:
2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | ij:
3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i
j:
4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | ij:
4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i
j:
5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | ij:
6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i
j:
6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | ij:
6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j:
1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j:
1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j:
1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j:
2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j:
2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j:
2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j:
2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j:
2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j:
3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
9.000000): 372.721656
Final Result: 372.721656

As the following iterations show, sqrt(1) as well as the accumulated
result is set to zero for some reason:

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
0.000000): 0.000000
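
One way to read these zeros together with the float variant reported in this thread (where the value becomes "nan" instead of 0.000000) is that the 64-bit floating-point register holding the value is being overwritten with zeros. This is only an assumption, not something established in the thread. The Python sketch below illustrates why the two symptoms would be consistent: an all-zero register reads as 0.0 when interpreted as a double, but under the RISC-V NaN-boxing rule a single-precision value in a 64-bit register is valid only if the upper 32 bits are all ones, so the same register reads as NaN when interpreted as a float. The helper names are hypothetical.

```python
import math
import struct


def read_as_double(reg_bits):
    """Interpret a 64-bit register image as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", reg_bits))[0]


def read_as_float(reg_bits):
    """Interpret a 64-bit register image as a NaN-boxed single.

    RISC-V rule: if the upper 32 bits are not all ones, the value is
    treated as the canonical NaN.
    """
    if (reg_bits >> 32) != 0xFFFFFFFF:
        return math.nan
    return struct.unpack("<f", struct.pack("<I", reg_bits & 0xFFFFFFFF))[0]


zeroed = 0x0000000000000000          # hypothetical clobbered register
boxed_one = 0xFFFFFFFF3F800000       # properly boxed float 1.0

print(read_as_double(zeroed))        # double view of the zeroed register
print(read_as_float(zeroed))         # float view of the zeroed register
print(read_as_float(boxed_one))      # float view of a valid boxed value
```

If this reading is right, the place to look would be wherever FP register state is written or restored in FS mode, rather than the arithmetic instructions themselves.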

Please help me resolve the accuracy issue! I think that it will
be very useful for the gem5 community.

Note that I found the simulated tick at which the application
starts in FS mode (using m5 dumpstats) and passed it to
--debug-start, but the generated trace file is 10x larger
than in SE mode for the same application. How can I compare them?
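
A possible way to make the two traces comparable is to drop everything that differs by construction (ticks, CPU object names, kernel instructions) and then look for the first mismatch. The Python sketch below does that under stated assumptions: each trace line is assumed to look like "<tick>: <cpu name>: <rest>", the first hexadecimal value in <rest> is assumed to be the PC, and the application's code is assumed to live in a PC range you pass in (for a static binary, its text segment). The function names are hypothetical, and the line format should be checked against your own trace files.

```python
import re
from itertools import zip_longest

# Assumed prefix "<tick>: <cpu name>: "; adjust if your traces differ.
TICK_AND_CPU = re.compile(r"^\s*\d+:\s*\S+:\s*")
FIRST_HEX = re.compile(r"0x[0-9a-fA-F]+")


def normalize(lines, pc_lo, pc_hi):
    """Yield lines without the tick/CPU prefix, keeping only those whose
    first hex value (assumed to be the PC) lies in [pc_lo, pc_hi)."""
    for line in lines:
        body = TICK_AND_CPU.sub("", line.rstrip("\n"), count=1)
        match = FIRST_HEX.search(body)
        if match and pc_lo <= int(match.group(), 16) < pc_hi:
            yield body


def first_divergence(trace_a, trace_b):
    """Return (index, line_a, line_b) for the first mismatch, or None."""
    for index, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a != b:
            return index, a, b
    return None


def compare_files(se_path, fs_path, pc_lo, pc_hi):
    """Stream both trace files and report where they first differ."""
    with open(se_path) as se, open(fs_path) as fs:
        return first_divergence(
            normalize(se, pc_lo, pc_hi), normalize(fs, pc_lo, pc_hi)
        )
```

Because both files are streamed, this stops at the first differing instruction instead of loading multi-gigabyte traces. Other user processes in FS mode can share the same virtual PC range, so a reported mismatch still needs a manual look.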

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start, but in FS mode it is very
difficult to find the tick at which the application starts!

In the meantime, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main() {
    const int dim = 4096;
    double result;

    // Run the same computation twice; both iterations should print the same value.
    for (int iter = 0; iter < 2; iter++) {
        result = 0;
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); // Expected result: 30530733453.127449
    }

    return 0;
}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

On x86 (natively compiled, of course), QEMU-RISCV, and
gem5 SE the result is the same (30530733453.127449), but in gem5 FS the
result is different! In addition, the result also differs
between the 2 iterations.
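
For an independent check of the expected value on the host, the double sum factors as (sum over k of sqrt(k)) squared, so it can be verified without running 4096 x 4096 iterations. The Python sketch below is only a cross-check under that identity; the function names are hypothetical, and the factored form can differ from the nested loop in the last few digits because the rounding order is different.

```python
import math


def nested_sum(dim):
    """Same loop order as the C++ test program."""
    result = 0.0
    for i in range(dim):
        for j in range(dim):
            result += math.sqrt(i) * math.sqrt(j)
    return result


def factored_sum(dim):
    """sum_i sum_j sqrt(i)*sqrt(j) == (sum_k sqrt(k))**2, up to rounding."""
    total = math.fsum(math.sqrt(k) for k in range(dim))
    return total * total


if __name__ == "__main__":
    print(f"dim=10   nested:   {nested_sum(10):.6f}")
    print(f"dim=10   factored: {factored_sum(10):.6f}")
    print(f"dim=4096 factored: {factored_sum(4096):.6f}")
```

For dim = 10 the nested form should agree with the 372.721656 quoted in this thread, and for dim = 4096 the factored form should agree with 30530733453.127449 up to rounding in the trailing digits.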

Please reproduce the error, if you want, in order to verify my result.
How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
rather than the gem5.fast I had), but the debug traces exceed 20GB
(and the run has not finished yet) for less than 1 simulated second. How can
I reduce the size of the trace (or set something more specific)?

In the meantime, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then
performing a diff to see how they differ.

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (with both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at the moment; however, if you want, you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

> In addition, I used the RISCV Ubuntu image ( ),
> I installed the gcc compiler, compile it (through qemu) and I get
> wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
or the HPCG workload, not in gem5. If not, I would test in QEMU to make
sure the binary works there. Another way you could test to see if the
problem is your binary or gem5 would be to run it on real hardware. We have
access to some RISC-V hardware here at UC Davis, if you don't have access
to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest gem5
release.
I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), put the xhpcg executable in it, and make the following
changes in riscv-fs.py to boot the riscv-disk-img with the executable:

image = CustomDiskImageResource(
    local_path="/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
I change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions in
the image.

  2. After some iterations, the HPCG benchmark checks whether the results
are valid. In the case of FS it reports invalid results. As
far as I can see, one problem (at least) is that it produces
different results in each HPCG execution (with the same configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
    ./configs/example/gem5_library/riscv-fs.py? Can you attach the script
    here?

  2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image ( ),
installed the gcc compiler, compiled the benchmark (through qemu), and I get
wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

> In my previous results, I had used double (not float) for the
> following variables: result, sq_i and sq_j. In the case of float
> instead of double I get "nan" and not 0.000000.
>
> Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>:
>
> > Dear Jason, all,
> >
> > I am trying to find the accuracy problem with RISCV-FS and I observe
> > that the problem is created (at least in my dummy example) because
> > the variables (double) are set to zero in random simulated time (for
> > this reason I get different results among executions of the same
> > code). Specifically for the following dummy code:
> >
> >
> > #include <cmath>
> > #include <stdio.h>
> >
> > int main(){
> >
> >  int dim = 10;
> >
> >  float result;
> >
> >  for (int iter = 0; iter < 2; iter++){
> >      result = 0;
> >      for (int i = 0; i < dim; i++){
> >          for (int j = 0; j < dim; j++){
> >              float sq_i = sqrt(i);
> >              float sq_j = sqrt(j);
> >              result += sq_i * sq_j;
> >              printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, result);
> >          }
> >      }
> >      printf("Final Result: %lf\n", result);
> >  }
> > }
> >
> >
> > The correct Final Result in both iterations is 372.721656.
However, > > I get the following results in FS: > > > > ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > > 1.000000): 1.000000 > > ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > > 1.414214): 2.414214 > > ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > > 1.732051): 4.146264 > > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 
1.000000 | i*j: > > 1.414214): 1.414214 > > ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > > 2.000000): 3.414214 > > ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > > 2.449490): 5.863703 > > ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > > 2.828427): 8.692130 > > ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > > 3.162278): 11.854408 > > ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > > 3.464102): 15.318510 > > ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > > 3.741657): 19.060167 > > ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > > 4.000000): 23.060167 > > ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > > 4.242641): 27.302808 > > ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: > > 0.000000): 27.302808 > > ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > > 1.732051): 29.034859 > > ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > > 2.449490): 31.484348 > > ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > > 3.000000): 34.484348 > > ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > > 3.464102): 37.948450 > > ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > > 3.872983): 41.821433 > > ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > > 4.242641): 46.064074 > > ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > > 4.582576): 50.646650 > > ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > > 4.898979): 55.545629 > > ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > > 5.196152): 60.741782 > > ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > > 0.000000): 60.741782 > > ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > > 2.000000): 62.741782 > > ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > > 2.828427): 65.570209 > > ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 
1.732051 | i*j: > > 3.464102): 69.034310 > > ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > > 4.000000): 73.034310 > > ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: > > 4.472136): 77.506446 > > ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > > 4.898979): 82.405426 > > ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > > 5.291503): 87.696928 > > ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > > 5.656854): 93.353783 > > ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > > 6.000000): 99.353783 > > ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > > 0.000000): 99.353783 > > ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > > 2.236068): 101.589851 > > ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > > 3.162278): 104.752128 > > ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > > 3.872983): 108.625112 > > ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: > > 4.472136): 113.097248 > > ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > > 5.000000): 118.097248 > > ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > > 5.477226): 123.574473 > > ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > > 5.916080): 129.490553 > > ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > > 6.324555): 135.815108 > > ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > > 6.708204): 142.523312 > > ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > > 0.000000): 142.523312 > > ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > > 2.449490): 144.972802 > > ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > > 3.464102): 148.436904 > > ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > > 4.242641): 152.679544 > > ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > > 4.898979): 157.578524 > > ITER: 0 | i: 6 | j: 5 
Result(i: 2.449490 | j: 2.236068 | i*j: > > 5.477226): 163.055749 > > ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > > 6.000000): 169.055749 > > ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > > 6.480741): 175.536490 > > ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > > 6.928203): 182.464693 > > ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > > 7.348469): 189.813162 > > ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > > 0.000000): 189.813162 > > ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > > 2.645751): 192.458914 > > ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: > > 3.741657): 196.200571 > > ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > > 4.582576): 200.783147 > > ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > > 5.291503): 206.074649 > > ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > > 5.916080): 211.990729 > > ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > > 6.480741): 218.471470 > > ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > > 7.000000): 225.471470 > > ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > > 7.483315): 232.954785 > > ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > > 7.937254): 240.892039 > > ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > > 0.000000): 240.892039 > > ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > > 2.828427): 243.720466 > > ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > > 4.000000): 247.720466 > > ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > > 4.898979): 252.619445 > > ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > > 5.656854): 258.276300 > > ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > > 6.324555): 264.600855 > > ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > > 6.928203): 271.529058 > 
> ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > > 7.483315): 279.012373 > > ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > > 8.000000): 287.012373 > > ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > > 8.485281): 295.497654 > > ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > > 0.000000): 295.497654 > > ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > > 3.000000): 298.497654 > > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > > 4.242641): 302.740295 > > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > > 5.196152): 307.936447 > > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > > 6.000000): 313.936447 > > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > > 6.708204): 320.644651 > > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > > 7.348469): 327.993120 > > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > > 7.937254): 335.930374 > > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > > 8.485281): 344.415656 > > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > > 9.000000): 353.415656 > > Final Result: 353.415656 > > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 
2.828427 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > > 1.000000): 1.000000 > > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > > 1.414214): 2.414214 > > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > > 1.732051): 4.146264 > > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: > > 2.000000): 6.146264 > > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: > > 2.236068): 8.382332 > > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: > > 2.449490): 10.831822 > > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: > > 2.645751): 13.477573 > > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: > > 2.828427): 16.306001 > > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: > > 3.000000): 19.306001 > > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > > 0.000000): 19.306001 > > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > > 1.414214): 20.720214 > > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > > 2.000000): 22.720214 > > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > > 2.449490): 25.169704 > > ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > > 2.828427): 27.998131 > > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > > 3.162278): 31.160409 > > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > > 3.464102): 34.624510 > > ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > > 3.741657): 38.366168 > > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > > 4.000000): 42.366168 > > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > > 4.242641): 46.608808 > > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 
0.000000 | i*j: > > 0.000000): 46.608808 > > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > > 1.732051): 48.340859 > > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > > 2.449490): 50.790349 > > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > > 3.000000): 53.790349 > > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > > 3.464102): 57.254450 > > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > > 3.872983): 61.127434 > > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > > 4.242641): 65.370075 > > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > > 4.582576): 69.952650 > > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > > 4.898979): 74.851630 > > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > > 5.196152): 80.047782 > > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > > 0.000000): 80.047782 > > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > > 2.000000): 82.047782 > > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > > 2.828427): 84.876209 > > ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > > 3.464102): 88.340311 > > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > > 4.000000): 92.340311 > > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: > > 4.472136): 96.812447 > > ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > > 4.898979): 101.711426 > > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > > 5.291503): 107.002929 > > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > > 5.656854): 112.659783 > > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > > 6.000000): 118.659783 > > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > > 0.000000): 118.659783 > > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > > 2.236068): 120.895851 > > ITER: 1 | i: 5 | j: 2 Result(i: 
2.236068 | j: 1.414214 | i*j: > > 3.162278): 124.058129 > > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > > 3.872983): 127.931112 > > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: > > 4.472136): 132.403248 > > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > > 5.000000): 137.403248 > > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > > 5.477226): 142.880474 > > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > > 5.916080): 148.796553 > > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > > 6.324555): 155.121109 > > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > > 6.708204): 161.829313 > > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > > 0.000000): 161.829313 > > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > > 2.449490): 164.278802 > > ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > > 3.464102): 167.742904 > > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > > 4.242641): 171.985545 > > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > > 4.898979): 176.884524 > > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: > > 5.477226): 182.361750 > > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > > 6.000000): 188.361750 > > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > > 6.480741): 194.842491 > > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > > 6.928203): 201.770694 > > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > > 7.348469): 209.119163 > > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > > 0.000000): 209.119163 > > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > > 2.645751): 211.764914 > > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: > > 3.741657): 215.506572 > > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > > 4.582576): 220.089147 > > ITER: 1 
| i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > > 5.291503): 225.380650 > > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > > 5.916080): 231.296730 > > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > > 6.480741): 237.777470 > > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > > 7.000000): 244.777470 > > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > > 7.483315): 252.260785 > > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > > 7.937254): 260.198039 > > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > > 0.000000): 260.198039 > > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > > 2.828427): 263.026466 > > ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > > 4.000000): 267.026466 > > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > > 4.898979): 271.925446 > > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > > 5.656854): 277.582300 > > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > > 6.324555): 283.906855 > > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > > 6.928203): 290.835059 > > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > > 7.483315): 298.318373 > > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > > 8.000000): 306.318373 > > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > > 8.485281): 314.803655 > > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > > 0.000000): 314.803655 > > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > > 3.000000): 317.803655 > > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > > 4.242641): 322.046295 > > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > > 5.196152): 327.242448 > > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > > 6.000000): 333.242448 > > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > > 
6.708204): 339.950652 > > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > > 7.348469): 347.299121 > > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > > 7.937254): 355.236375 > > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > > 8.485281): 363.721656 > > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > > 9.000000): 372.721656 > > Final Result: 372.721656 > > > > > > > > As we can see in the following iterations the sqrt(1) as well as the > > result is set to zero for some reason. > > > > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > > 0.000000): 0.000000 > > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > > 0.000000): 0.000000 > > > > Please help me to resolve the accuracy issue! I think that it will > > be very useful for gem5 community. > > > > To be noticed, I find the correct simulated tick in which the > > application started in FS (using m5 dumpstats), and I start the > > --debug-start, but the trace file which is generated is 10x larger > > than SE mode for the same application. How can I compare them? > > > > Thank you in advance! > > Best regards, > > Nikos > > > > Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>: > > > >> Dear Jason, > >> > >> I am trying to use --debug-start but in FS mode it is very > >> difficult to find the tick on which the application is started! 
> >> > >> However, I am writing the following very simple c++ program: > >> > >> #include <cmath> > >> #include <stdio.h> > >> > >> int main(){ > >> > >> int dim = 4096; > >> > >> double result; > >> > >> for (int iter = 0; iter < 2; iter++){ > >> result = 0; > >> for (int i = 0; i < dim; i++){ > >> for (int j = 0; j < dim; j++){ > >> result += sqrt(i) * sqrt(j); > >> } > >> } > >> printf("Result: %lf\n", result); //Result: 30530733453.127449 > >> } > >> } > >> > >> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o > >> test_riscv test_riscv.cpp > >> > >> > >> While in X86 (without cross-compilation of course), QEMU-RISCV, > >> GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the > >> result is different! In addition, the result is also different > >> between the 2 iterations. > >> > >> Please reproduce the error if you want in order to verify my result. > >> Ηow can the issue be resolved? > >> > >> Thank you in advance! > >> > >> Best regards, > >> Nikos > >> > >> > >> Quoting Jason Lowe-Power <jason@lowepower.com>: > >> > >>> Hi Nikos, > >>> > >>> You can use --debug-start to start the debugging after some number of > >>> ticks. Also, I would expect that the difference should come up > quickly, so > >>> no need to run the program to the end. > >>> > >>> For the FS mode one, you will want to just start the trace as the > >>> application starts. This could be a bit of a pain. > >>> > >>> I'm not really sure what fundamentally could be different. FS and SE > mode > >>> use the exact same code for executing instructions, so I don't think > that's > >>> the problem. Have you tried running for smaller inputs or just one > >>> iteration? 
> >>> > >>> Jason > >>> > >>> > >>> > >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής < > >>> ntampouratzis@ece.auth.gr> wrote: > >>> > >>>> Dear Bobby, > >>>> > >>>> Iam trying to add --debug-flags=Exec (building the gem5 for gem5.opt > >>>> not for gem5.fast which I had) but the debug traces exceed the 20GB > >>>> (and it is not finished yet) for less than 1 simulated second. How can > >>>> I reduce the size of the debug-flags (or set something more specific)? > >>>> > >>>> In contrast I build the HPCG benchmark with DHPCG_DEBUG flag. If you > >>>> want, you can compare these two output files > >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can > >>>> see, something goes wrong with the accuracy of calculations in FS mode > >>>> (benchmark uses double precission). You can find the files here: > >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/ > >>>> > >>>> Best regards, > >>>> Nikos > >>>> > >>>> Quoting Jason Lowe-Power <jason@lowepower.com>: > >>>> > >>>>> That's quite odd that it works in SE mode but not FS mode! > >>>>> > >>>>> I would suggest running with --debug-flags=Exec for both and then > >>>> perform a > >>>>> diff to see how they differ. > >>>>> > >>>>> Cheers, > >>>>> Jason > >>>>> > >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής < > >>>>> ntampouratzis@ece.auth.gr> wrote: > >>>>> > >>>>>> Dear Bobby, > >>>>>> > >>>>>> In QEMU I get the same (correct) results that I get in SE mode > >>>>>> simulation. I get invalid results in FS simulation (in both > >>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV > >>>>>> hardware at this moment, however, if you want you may execute my > xhpcg > >>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the > >>>>>> following configuration: > >>>>>> > >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1 > >>>>>> > >>>>>> Please let me know if you have any updates! 
> >>>>>> > >>>>>> Best regards, > >>>>>> Nikos > >>>>>> > >>>>>> > >>>>>> Quoting Jason Lowe-Power <jason@lowepower.com>: > >>>>>> > >>>>>>> Hi Nikos, > >>>>>>> > >>>>>>> I notice you said the following in your original email: > >>>>>>> > >>>>>>> In addition, I used the RISCV Ubuntu image > >>>>>>>> ( > https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu > >>>> ), > >>>>>>>> I installed the gcc compiler, compile it (through qemu) and I get > >>>>>>>> wrong results too. > >>>>>>> > >>>>>>> > >>>>>>> Is this saying you get the wrong results is QEMU? If so, the bug > is in > >>>>>> GCC > >>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU to > >>>> make > >>>>>>> sure the binary works there. Another way you could test to see if > the > >>>>>>> problem is your binary or gem5 would be to run it on real > hardware. We > >>>>>> have > >>>>>>> access to some RISC-V hardware here at UC Davis, if you don't have > >>>> access > >>>>>>> to it. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Jason > >>>>>>> > >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής < > >>>>>>> ntampouratzis@ece.auth.gr> wrote: > >>>>>>> > >>>>>>>> Dear Bobby, > >>>>>>>> > >>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest > >>>> gem5 > >>>>>>>> release. > >>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results > >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download > the > >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img. > >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop > >>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the > following > >>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with executable: > >>>>>>>> > >>>>>>>> image = CustomDiskImageResource( > >>>>>>>> local_path = "/home/cossim/.cache/gem5/riscv-disk-img", > >>>>>>>> ) > >>>>>>>> > >>>>>>>> # Set the Full System workload. 
> >>>>>>>> board.set_kernel_disk_workload( > >>>>>>>> > kernel=Resource("riscv-bootloader-vmlinux-5.10"), > >>>>>>>> disk_image=image, > >>>>>>>> ) > >>>>>>>> > >>>>>>>> Finally, in the > gem5/src/python/gem5/components/boards/riscv_board.py > >>>>>>>> I change the last line to "return ["console=ttyS0", > >>>>>>>> "root={root_value}", "rw"]" in order to allow the write > permissions > >>>> in > >>>>>>>> the image. > >>>>>>>> > >>>>>>>> > >>>>>>>> 2) The HPCG benchmark after some iterations calculates if the > results > >>>>>>>> are valid or not valid. In the case of FS it gives invalid > results. > >>>> As > >>>>>>>> I see from the results, one (at least) problem is that produces > >>>>>>>> different results in each HPCG execution (with the same > >>>> configuration). > >>>>>>>> > >>>>>>>> Here is the HPCG output and riscv-fs.py > >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce > the > >>>>>>>> results in the video if you use the xhpcg executable > >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) > >>>>>>>> > >>>>>>>> Please help me in order to solve it! > >>>>>>>> > >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode > too. > >>>>>>>> > >>>>>>>> Best regards, > >>>>>>>> Nikos > >>>>>>>> > >>>>>>>> > >>>>>>>> Quoting Bobby Bruce <bbruce@ucdavis.edu>: > >>>>>>>> > >>>>>>>> > I'm going to need a bit more information to help: > >>>>>>>> > > >>>>>>>> > 1. In what way have you modified > >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the > >>>> script > >>>>>>>> here? > >>>>>>>> > 2. What error are you getting or in what way are the results > >>>> invalid? > >>>>>>>> > > >>>>>>>> > - > >>>>>>>> > Dr. Bobby R. 
Bruce > >>>>>>>> > Room 3050, > >>>>>>>> > Kemper Hall, UC Davis > >>>>>>>> > Davis, > >>>>>>>> > CA, 95616 > >>>>>>>> > > >>>>>>>> > web: https://www.bobbybruce.net > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής < > >>>>>>>> > ntampouratzis@ece.auth.gr> wrote: > >>>>>>>> > > >>>>>>>> >> > >>>>>>>> >> Dear gem5 community, > >>>>>>>> >> > >>>>>>>> >> I have successfully cross-compile the HPCG benchmark for RISCV > >>>>>> (Serial > >>>>>>>> >> version, without MPI and OpenMP). While it working properly in > >>>> gem5 > >>>>>> SE > >>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results > >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 > >>>> --nz=16 > >>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS > >>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results > >>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the riscv > >>>> image > >>>>>>>> >> and put it). > >>>>>>>> >> > >>>>>>>> >> Can you help me please? > >>>>>>>> >> > >>>>>>>> >> In addition, I used the RISCV Ubuntu image > >>>>>>>> >> ( > >>>> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu > >>>>>> ), > >>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and I > get > >>>>>>>> >> wrong results too. > >>>>>>>> >> > >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV > >>>>>>>> >> (xhpcg), and a video that shows the results > >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/). > >>>>>>>> >> > >>>>>>>> >> P.S. I use the latest gem5 version. > >>>>>>>> >> > >>>>>>>> >> Thank you in advance! 
:) > >>>>>>>> >> > >>>>>>>> >> Best regards, > >>>>>>>> >> Nikos > >>>>>>>> >> _______________________________________________ > >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org > >>>>>>>> >> To unsubscribe send an email to gem5-users-leave@gem5.org
Νικόλαος Ταμπουρατζής
Fri, Oct 7, 2022 7:11 AM

Dear Bobby,

Thanks a lot for the effort! I looked into this in detail and observed
that the problem occurs only with float and double variables (with int
it works properly in FS mode). Specifically, float variables are set
to "nan", while double variables are set to 0.000000 (at random points
in time, probably caused by some instruction of the simulated OS?).
You may want to use a simple C/C++ example to get some traces before
moving on to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211.  Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason, all,

I am trying to track down the accuracy problem with RISCV-FS, and I
observe that (at least in my dummy example) it arises because the
variables (double) are set to zero at random points in simulated time,
which is why I get different results across executions of the same
code. Specifically, for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 10;

    float result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                float sq_i = sqrt(i);
                float sq_j = sqrt(j);
                result += sq_i * sq_j;
                printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                       iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
            }
        }
        printf("Final Result: %lf\n", result);
    }
}

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i
j:
1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | ij:
1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i
j:
1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i
j:
1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | ij:
2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i
j:
2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | ij:
2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i
j:
3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | ij:
3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i
j:
3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | ij:
4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i
j:
4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | ij:
0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i
j:
1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | ij:
2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i
j:
3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | ij:
3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i
j:
3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | ij:
4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i
j:
4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | ij:
4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i
j:
5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | ij:
0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i
j:
2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | ij:
2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i
j:
3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | ij:
4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i
j:
4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | ij:
4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i
j:
5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | ij:
5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i
j:
6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | ij:
0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i
j:
2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | ij:
3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i
j:
3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | ij:
4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i
j:
5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | ij:
5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i
j:
5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | ij:
6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i
j:
6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | ij:
0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i
j:
2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | ij:
3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i
j:
4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | ij:
4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i
j:
5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | ij:
6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i
j:
6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | ij:
6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i
j:
7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | ij:
0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i
j:
2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | ij:
3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i
j:
4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | ij:
5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i
j:
5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | ij:
6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i
j:
7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | ij:
7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i
j:
7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | ij:
0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i
j:
2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | ij:
4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i
j:
4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | ij:
5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i
j:
6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | ij:
6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i
j:
7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | ij:
8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i
j:
8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | ij:
0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i
j:
3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | ij:
4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i
j:
5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | ij:
6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i
j:
6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | ij:
7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i
j:
7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | ij:
8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i
j:
9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i
j:
1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | ij:
1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i
j:
1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | ij:
2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i
j:
2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | ij:
2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i
j:
2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | ij:
2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i
j:
3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | ij:
0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i
j:
1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | ij:
2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i
j:
2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | ij:
2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i
j:
3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | ij:
3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i
j:
3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | ij:
4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i
j:
4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | ij:
0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i
j:
1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | ij:
2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i
j:
3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | ij:
3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i
j:
3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | ij:
4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i
j:
4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As the following lines from iteration 0 show, sqrt(1) as well as the
accumulated result are suddenly set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
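
The point where the values jump to zero can be located mechanically
rather than by eye. The following is a minimal Python sketch, not part of
the original test, that scans the printed records and reports the first
one whose fields are inconsistent. It assumes one record per line in the
printf format of the dummy program, double-precision variables and six
printed decimals; the names RECORD, TOL and first_inconsistency are
hypothetical and chosen only for illustration.

```python
import math
import re

# One record per line, in the printf format of the dummy program (assumption).
RECORD = re.compile(
    r"ITER: (\d+) \| i: (\d+) \| j: (\d+) "
    r"Result\(i: (\S+) \| j: (\S+) \| i\*j: (\S+)\): (\S+)"
)
TOL = 2e-6  # values are printed with six decimals


def first_inconsistency(lines):
    """Return (iter, i, j, field) for the first suspicious record, else None."""
    prev_iter, prev_result = None, None
    for line in lines:
        m = RECORD.search(line)
        if m is None:
            continue  # not a record line
        it, i, j = (int(g) for g in m.group(1, 2, 3))
        sq_i, sq_j, prod, result = (float(g) for g in m.group(4, 5, 6, 7))
        if it != prev_iter:
            # result is reset to 0 at the start of every iteration; if the
            # log starts mid-iteration the running sum cannot be checked yet.
            prev_result = 0.0 if (i, j) == (0, 0) else None
            prev_iter = it
        checks = [
            ("sq_i", sq_i, math.sqrt(i)),
            ("sq_j", sq_j, math.sqrt(j)),
            ("product", prod, math.sqrt(i) * math.sqrt(j)),
        ]
        if prev_result is not None:
            checks.append(("running sum", result, prev_result + prod))
        for field, got, want in checks:
            # "not (... <= TOL)" also flags nan, which compares false.
            if not abs(got - want) <= TOL:
                return it, i, j, field
        prev_result = result
    return None
```

For example, first_inconsistency(open("fs_output.txt")) would return the
iteration, i, j and the failing field, where fs_output.txt is a
placeholder name for the captured console output.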

Please help me resolve this accuracy issue! I think a fix would be
very useful for the gem5 community.

Note that I found the simulated tick at which the application starts
in FS (using m5 dumpstats) and passed it to --debug-start, but the
generated trace file is still 10x larger than the SE-mode trace for
the same application. How can I compare them?
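
One way to make the two traces comparable is to drop the tick field and
any lines that do not belong to the application before diffing. The
following Python sketch illustrates the idea under stated assumptions:
that each trace line starts with a decimal tick followed by ": ", and
that the caller can supply a predicate selecting the application's lines
(the extra volume in FS presumably comes from kernel activity). The names
normalise, first_divergence and keep are hypothetical, and further
normalisation (for example of the CPU object name) may be needed.

```python
def normalise(lines, keep=lambda rest: True):
    """Yield trace lines with the leading '<tick>: ' field removed."""
    for line in lines:
        tick, sep, rest = line.partition(": ")
        # Skip anything that does not look like '<tick>: <payload>'.
        if not sep or not tick.strip().isdigit():
            continue
        rest = rest.strip()
        if keep(rest):
            yield rest


def first_divergence(trace_a, trace_b, keep=lambda rest: True):
    """Return (index, line_a, line_b) at the first mismatch, else None."""
    pairs = zip(normalise(trace_a, keep), normalise(trace_b, keep))
    for index, (a, b) in enumerate(pairs):
        if a != b:
            return index, a, b
    return None
```

Both arguments can be open file objects, so the traces are streamed
rather than loaded into memory. The comparison stops at the end of the
shorter trace, so a difference in length alone is not reported.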

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start, but in FS mode it is very
difficult to find the tick at which the application starts!

However, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 4096;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); //Result: 30530733453.127449
    }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

On x86 (without cross-compilation, of course), QEMU-RISCV and gem5 SE
mode the result is the same (30530733453.127449), but in gem5 FS mode
the result is different! In addition, the result differs between the
two iterations.
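
A reference value for any dim can be computed on the host without a
second simulator, because the double sum factorises: the sum over i and j
of sqrt(i) * sqrt(j) equals the square of the sum over k of sqrt(k). The
following Python sketch, with the hypothetical helper names pairwise_sum
and closed_form, mirrors the loop above and the factorised form. Python
floats are IEEE 754 doubles, but the two functions add in a different
order, so agreement is expected only up to a small relative rounding
error, not bit for bit.

```python
import math


def pairwise_sum(dim):
    """Same double loop as the C++ test program."""
    result = 0.0
    for i in range(dim):
        for j in range(dim):
            result += math.sqrt(i) * math.sqrt(j)
    return result


def closed_form(dim):
    """(sum of sqrt(k))**2, algebraically equal to the double loop."""
    s = math.fsum(math.sqrt(k) for k in range(dim))
    return s * s


if __name__ == "__main__":
    dim = 10
    print(f"loop:        {pairwise_sum(dim):.6f}")
    print(f"closed form: {closed_form(dim):.6f}")
```

Since both iterations of the test program compute the same sum, any
difference between the two printed results, or between either of them
and closed_form(dim), points to a problem during execution rather than
to the algorithm.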

Please feel free to reproduce the error in order to verify my result.
How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly,
so no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE
mode use the exact same code for executing instructions, so I don't
think that's the problem. Have you tried running for smaller inputs or
just one iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to run with --debug-flags=Exec (building gem5.opt instead
of the gem5.fast I had been using), but the debug trace exceeds 20GB
(and it has not finished yet) for less than 1 simulated second. How can
I reduce the size of the trace (or select something more specific)?

In the meantime, I built the HPCG benchmark with the DHPCG_DEBUG flag.
If you want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of the calculations in FS
mode (the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then
perform a diff to see how they differ.

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (with both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at the moment; however, if you want, you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image ( ),
I installed the gcc compiler, compiled it (through qemu) and I get
wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in
GCC or the HPCG workload, not in gem5. If not, I would test in QEMU to
make sure the binary works there. Another way you could test to see if
the problem is your binary or gem5 would be to run it on real hardware.
We have access to some RISC-V hardware here at UC Davis, if you don't
have access to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest gem5
release.
I ran gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mounted the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), copied the xhpcg executable into it and made the
following changes in riscv-fs.py to boot the riscv-disk-img with the
executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
I changed the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions in
the image.

  2. After some iterations, the HPCG benchmark checks whether the results
are valid. In FS mode it reports invalid results. As far as I can see
from the results, one problem (at least) is that it produces
different results in each HPCG execution (with the same configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
    ./configs/example/gem5_library/riscv-fs.py? Can you attach the
    script here?

  2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image ( ),
I installed the gcc compiler, compiled it (through qemu) and I get
wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



Dear Bobby,

Thanks a lot for the effort! I looked into this in detail and I observe
that the problem occurs only with float and double variables (with int
it works properly in FS mode). Specifically, with float the variables
are set to "nan", while with double the variables are set to 0.000000
(at random times - probably caused by some instruction of the simulated
OS?). You may use a simple C/C++ example in order to get some traces
before going to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't
get as far as you did. I wanted to find a quick way to recreate the
following: https://gem5-review.googlesource.com/c/public/gem5/+/64211.
Please feel free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before
but it's undeniably there. I'll try to spend more time looking at this
tomorrow with some traces and debug flags and see if I can narrow down
the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net
>> >>> >> >>> Jason >> >>> >> >>> >> >>> >> >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής < >> >>> ntampouratzis@ece.auth.gr> wrote: >> >>> >> >>>> Dear Bobby, >> >>>> >> >>>> Iam trying to add --debug-flags=Exec (building the gem5 for gem5.opt >> >>>> not for gem5.fast which I had) but the debug traces exceed the 20GB >> >>>> (and it is not finished yet) for less than 1 simulated second. How can >> >>>> I reduce the size of the debug-flags (or set something more specific)? >> >>>> >> >>>> In contrast I build the HPCG benchmark with DHPCG_DEBUG flag. If you >> >>>> want, you can compare these two output files >> >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can >> >>>> see, something goes wrong with the accuracy of calculations in FS mode >> >>>> (benchmark uses double precission). You can find the files here: >> >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/ >> >>>> >> >>>> Best regards, >> >>>> Nikos >> >>>> >> >>>> Quoting Jason Lowe-Power <jason@lowepower.com>: >> >>>> >> >>>>> That's quite odd that it works in SE mode but not FS mode! >> >>>>> >> >>>>> I would suggest running with --debug-flags=Exec for both and then >> >>>> perform a >> >>>>> diff to see how they differ. >> >>>>> >> >>>>> Cheers, >> >>>>> Jason >> >>>>> >> >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής < >> >>>>> ntampouratzis@ece.auth.gr> wrote: >> >>>>> >> >>>>>> Dear Bobby, >> >>>>>> >> >>>>>> In QEMU I get the same (correct) results that I get in SE mode >> >>>>>> simulation. I get invalid results in FS simulation (in both >> >>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV >> >>>>>> hardware at this moment, however, if you want you may execute my >> xhpcg >> >>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the >> >>>>>> following configuration: >> >>>>>> >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1 >> >>>>>> >> >>>>>> Please let me know if you have any updates! 
>> >>>>>> >> >>>>>> Best regards, >> >>>>>> Nikos >> >>>>>> >> >>>>>> >> >>>>>> Quoting Jason Lowe-Power <jason@lowepower.com>: >> >>>>>> >> >>>>>>> Hi Nikos, >> >>>>>>> >> >>>>>>> I notice you said the following in your original email: >> >>>>>>> >> >>>>>>> In addition, I used the RISCV Ubuntu image >> >>>>>>>> ( >> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu >> >>>> ), >> >>>>>>>> I installed the gcc compiler, compile it (through qemu) and I get >> >>>>>>>> wrong results too. >> >>>>>>> >> >>>>>>> >> >>>>>>> Is this saying you get the wrong results is QEMU? If so, the bug >> is in >> >>>>>> GCC >> >>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU to >> >>>> make >> >>>>>>> sure the binary works there. Another way you could test to see if >> the >> >>>>>>> problem is your binary or gem5 would be to run it on real >> hardware. We >> >>>>>> have >> >>>>>>> access to some RISC-V hardware here at UC Davis, if you don't have >> >>>> access >> >>>>>>> to it. >> >>>>>>> >> >>>>>>> Cheers, >> >>>>>>> Jason >> >>>>>>> >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής < >> >>>>>>> ntampouratzis@ece.auth.gr> wrote: >> >>>>>>> >> >>>>>>>> Dear Bobby, >> >>>>>>>> >> >>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest >> >>>> gem5 >> >>>>>>>> release. >> >>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to download >> the >> >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img. >> >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop >> >>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the >> following >> >>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with executable: >> >>>>>>>> >> >>>>>>>> image = CustomDiskImageResource( >> >>>>>>>> local_path = "/home/cossim/.cache/gem5/riscv-disk-img", >> >>>>>>>> ) >> >>>>>>>> >> >>>>>>>> # Set the Full System workload. 
>> >>>>>>>> board.set_kernel_disk_workload( >> >>>>>>>> >> kernel=Resource("riscv-bootloader-vmlinux-5.10"), >> >>>>>>>> disk_image=image, >> >>>>>>>> ) >> >>>>>>>> >> >>>>>>>> Finally, in the >> gem5/src/python/gem5/components/boards/riscv_board.py >> >>>>>>>> I change the last line to "return ["console=ttyS0", >> >>>>>>>> "root={root_value}", "rw"]" in order to allow the write >> permissions >> >>>> in >> >>>>>>>> the image. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> 2) The HPCG benchmark after some iterations calculates if the >> results >> >>>>>>>> are valid or not valid. In the case of FS it gives invalid >> results. >> >>>> As >> >>>>>>>> I see from the results, one (at least) problem is that produces >> >>>>>>>> different results in each HPCG execution (with the same >> >>>> configuration). >> >>>>>>>> >> >>>>>>>> Here is the HPCG output and riscv-fs.py >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce >> the >> >>>>>>>> results in the video if you use the xhpcg executable >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) >> >>>>>>>> >> >>>>>>>> Please help me in order to solve it! >> >>>>>>>> >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode >> too. >> >>>>>>>> >> >>>>>>>> Best regards, >> >>>>>>>> Nikos >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Quoting Bobby Bruce <bbruce@ucdavis.edu>: >> >>>>>>>> >> >>>>>>>> > I'm going to need a bit more information to help: >> >>>>>>>> > >> >>>>>>>> > 1. In what way have you modified >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach the >> >>>> script >> >>>>>>>> here? >> >>>>>>>> > 2. What error are you getting or in what way are the results >> >>>> invalid? >> >>>>>>>> > >> >>>>>>>> > - >> >>>>>>>> > Dr. Bobby R. 
Bruce >> >>>>>>>> > Room 3050, >> >>>>>>>> > Kemper Hall, UC Davis >> >>>>>>>> > Davis, >> >>>>>>>> > CA, 95616 >> >>>>>>>> > >> >>>>>>>> > web: https://www.bobbybruce.net >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής < >> >>>>>>>> > ntampouratzis@ece.auth.gr> wrote: >> >>>>>>>> > >> >>>>>>>> >> >> >>>>>>>> >> Dear gem5 community, >> >>>>>>>> >> >> >>>>>>>> >> I have successfully cross-compile the HPCG benchmark for RISCV >> >>>>>> (Serial >> >>>>>>>> >> version, without MPI and OpenMP). While it working properly in >> >>>> gem5 >> >>>>>> SE >> >>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 >> >>>> --nz=16 >> >>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS >> >>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results >> >>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the riscv >> >>>> image >> >>>>>>>> >> and put it). >> >>>>>>>> >> >> >>>>>>>> >> Can you help me please? >> >>>>>>>> >> >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image >> >>>>>>>> >> ( >> >>>> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu >> >>>>>> ), >> >>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and I >> get >> >>>>>>>> >> wrong results too. >> >>>>>>>> >> >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for RISCV >> >>>>>>>> >> (xhpcg), and a video that shows the results >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/). >> >>>>>>>> >> >> >>>>>>>> >> P.S. I use the latest gem5 version. >> >>>>>>>> >> >> >>>>>>>> >> Thank you in advance! 
:) >> >>>>>>>> >> >> >>>>>>>> >> Best regards, >> >>>>>>>> >> Nikos >> >>>>>>>> >> _______________________________________________ >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org >> >>>>>>>> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> >>>>>>>> >> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> _______________________________________________ >> >>>>>>>> gem5-users mailing list -- gem5-users@gem5.org >> >>>>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >>>>>>>> >> >>>>>> >> >>>>>> >> >>>>>> _______________________________________________ >> >>>>>> gem5-users mailing list -- gem5-users@gem5.org >> >>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >>>>>> >> >>>> >> >>>> >> >>>> _______________________________________________ >> >>>> gem5-users mailing list -- gem5-users@gem5.org >> >>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >>>> >> >> >> >> >> >> _______________________________________________ >> >> gem5-users mailing list -- gem5-users@gem5.org >> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> > >> > >> > _______________________________________________ >> > gem5-users mailing list -- gem5-users@gem5.org >> > To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> _______________________________________________ >> gem5-users mailing list -- gem5-users@gem5.org >> To unsubscribe send an email to gem5-users-leave@gem5.org >>
Jason Lowe-Power
Fri, Oct 7, 2022 4:01 PM

I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know if when running in SE mode and running in FS mode we are
using the same rounding mode. My hypothesis is that in FS mode the rounding
mode is set differently.
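
A minimal sketch of how the rounding mode could also be checked from inside the guest, as a complement to a breakpoint in gem5 itself. It only uses the standard <cfenv> interface; the helper names rounding_mode_name and report_fp_state are hypothetical, and the assumption is that the C library's fegetround() reflects the dynamic rounding mode the hardware (or simulator) is using.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

// Map the value returned by std::fegetround() to a readable name.
const char* rounding_mode_name(int mode) {
    switch (mode) {
        case FE_TONEAREST:  return "FE_TONEAREST";
        case FE_DOWNWARD:   return "FE_DOWNWARD";
        case FE_UPWARD:     return "FE_UPWARD";
        case FE_TOWARDZERO: return "FE_TOWARDZERO";
        default:            return "unknown";
    }
}

// Print the current rounding mode and one sqrt result with full precision,
// so the SE-mode and FS-mode outputs can be compared line by line.
void report_fp_state(const char* label) {
    volatile double two = 2.0;  // volatile keeps sqrt from being folded at compile time
    std::printf("%s: rounding=%s sqrt(2)=%.17g\n",
                label, rounding_mode_name(std::fegetround()), std::sqrt(two));
}
```

Calling report_fp_state("start") at the top of main() and again after the loops, in both SE and FS mode, would show whether the mode differs between the two or changes during the run.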

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! I looked into it in detail and observed that the
problem occurs only with float and double variables (with int it
works properly in FS mode). Specifically, float variables
are set to "nan", while double variables
are set to 0.000000 (at a random time, probably by some
instruction of the simulated OS?). You could use a simple C/C++ example
to get some traces before moving on to HPCG...
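
One possible reading of the float/double difference, offered as an assumption that is not confirmed anywhere in this thread: on RV64 with the D extension, single-precision values live NaN-boxed in 64-bit registers (upper 32 bits all ones), and a register that fails that check reads as the canonical NaN. If something zeroed the FP registers, a double would then read as 0.0 and a float as NaN, which matches the symptom. The helpers read_boxed_float and read_double below are hypothetical and only model that rule in portable C++.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Interpret a 64-bit FP register as a single-precision value under the
// RISC-V NaN-boxing rule: the upper 32 bits must all be 1, otherwise the
// value is treated as the canonical quiet NaN.
float read_boxed_float(std::uint64_t reg) {
    if ((reg >> 32) != 0xFFFFFFFFu) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    std::uint32_t low = static_cast<std::uint32_t>(reg);
    float f;
    std::memcpy(&f, &low, sizeof f);
    return f;
}

// Interpret the same 64 bits as a double-precision value.
double read_double(std::uint64_t reg) {
    double d;
    std::memcpy(&d, &reg, sizeof d);
    return d;
}
```

With an all-zero register, read_double(0) gives 0.0 while read_boxed_float(0) gives NaN; a correctly boxed 1.0f (0xFFFFFFFF3F800000) reads back as 1.0f.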

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason, all,

I am trying to track down the accuracy problem with RISCV-FS, and I observe
that the problem arises (at least in my dummy example) because
the variables (double) are set to zero at random points in simulated time
(which is why I get different results across executions of the same
code). Specifically, for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 10;

    float result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                float sq_i = sqrt(i);
                float sq_j = sqrt(j);
                result += sq_i * sq_j;
                printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                       iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
            }
        }
        printf("Final Result: %lf\n", result);
    }
}

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: 2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: 2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: 2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: 2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: 2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: 3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As we can see, in the following iterations sqrt(1) as well as the
result are set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
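
A minimal sketch of a self-checking variant of the dummy program, which may be easier to compare across SE and FS runs than the full per-iteration printout. It relies on two facts about this particular loop: every term sqrt(i) * sqrt(j) is non-negative, so the running sum should never decrease, and the double sum factors as (sum of sqrt(i)) squared, which gives an independent reference. The names CheckResult, checked_sum and reference_sum are hypothetical, and the check may not catch every kind of register corruption.

```cpp
#include <cmath>
#include <cstdio>

// Outcome of one pass over the i/j loop.
struct CheckResult {
    double sum;  // accumulated value of sqrt(i) * sqrt(j)
    int bad_i;   // first i where the running sum dropped or became NaN, else -1
    int bad_j;   // matching j, else -1
};

// Accumulate the sum and record the first step that violates monotonicity.
CheckResult checked_sum(int dim) {
    CheckResult r{0.0, -1, -1};
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            double term = std::sqrt(static_cast<double>(i)) *
                          std::sqrt(static_cast<double>(j));
            double next = r.sum + term;
            if (r.bad_i < 0 && (std::isnan(next) || next < r.sum)) {
                r.bad_i = i;
                r.bad_j = j;
            }
            r.sum = next;
        }
    }
    return r;
}

// Independent reference: sum_i sum_j sqrt(i)*sqrt(j) == (sum_i sqrt(i))^2.
double reference_sum(int dim) {
    double s = 0.0;
    for (int i = 0; i < dim; i++) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s * s;
}
```

For dim = 10 both functions should agree with the 372.721656 quoted above up to rounding; a large gap between checked_sum(dim).sum and reference_sum(dim), or a bad_i other than -1, would point at the kind of corruption shown in the output.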

Please help me resolve the accuracy issue! I think it will be very
useful for the gem5 community.

Note that I found the simulated tick at which the application starts
in FS mode (using m5 dumpstats) and passed it to --debug-start, but
the generated trace file is 10x larger than in SE mode for the same
application. How can I compare them?

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start, but in FS mode it is very
difficult to find the tick at which the application starts!

However, I have written the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 4096;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); //Result: 30530733453.127449
    }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

On x86 (without cross-compilation, of course), QEMU-RISCV, and gem5
SE mode the result is the same (30530733453.127449), but in gem5 FS
mode the result is different! In addition, the result also differs
between the 2 iterations.
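
Because the double sum over i and j factors as (sum of sqrt(i))^2, the result can also be checked inside the guest itself, without comparing against another machine. The sketch below is an assumption-laden illustration, not part of the original program: the function names and the tolerance are hypothetical.

```cpp
#include <cmath>

// Same double loop as the test program: sum over i, j of sqrt(i) * sqrt(j).
double sum_sqrt_products(int dim) {
    double result = 0.0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            result += std::sqrt(static_cast<double>(i)) *
                      std::sqrt(static_cast<double>(j));
        }
    }
    return result;
}

// Independent reference: the double sum equals (sum of sqrt(i))^2.
double reference_sum(int dim) {
    double s = 0.0;
    for (int i = 0; i < dim; i++) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s * s;
}

// True when both computations agree within a relative tolerance.
// The default tolerance is an arbitrary, deliberately loose choice.
bool sum_looks_correct(int dim, double rel_tol = 1e-6) {
    const double got = sum_sqrt_products(dim);
    const double want = reference_sum(dim);
    return std::fabs(got - want) <= rel_tol * std::fabs(want);
}
```

For dim = 10 both functions should give roughly 372.721656, the value reported as correct in this thread. A run in which sum_looks_correct returns false would point to a value being corrupted during the loop.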

Please reproduce the error, if you want, to verify my result.
How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up
quickly, so no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE
mode use the exact same code for executing instructions, so I don't
think that's the problem. Have you tried running for smaller inputs or
just one iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5.opt instead of
the gem5.fast which I had), but the debug traces exceed 20GB (and the
run is not finished yet) for less than 1 simulated second. How can I
reduce the size of the debug output (or select something more
specific)?

As an alternative, I built the HPCG benchmark with the DHPCG_DEBUG
flag. If you want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then
perform a diff to see how they differ.

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (with both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISC-V
hardware at this moment; however, if you want, you may execute my
xhpcg binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image ( ), I installed the gcc
compiler, compile it (through qemu) and I get wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in
GCC or the HPCG workload, not in gem5. If not, I would test in QEMU to
make sure the binary works there. Another way you could test to see if
the problem is your binary or gem5 would be to run it on real
hardware. We have access to some RISC-V hardware here at UC Davis, if
you don't have access to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest
gem5 release.
I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), copy the xhpcg executable into it, and make the
following changes in riscv-fs.py to boot the riscv-disk-img with the
executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
I change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions in
the image.

  2. After some iterations, the HPCG benchmark checks whether the
results are valid. In FS mode it reports invalid results. As I see
from the results, one (at least) problem is that it produces
different results in each HPCG execution (with the same
configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
    ./configs/example/gem5_library/riscv-fs.py? Can you attach the
    script here?

  2. What error are you getting or in what way are the results
    invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV
(serial version, without MPI and OpenMP). While it works properly in
gem5 SE mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image ( ), installed the gcc
compiler, compiled it (through qemu), and I get wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d
function? I would like to know if when running in SE mode and running
in FS mode we are using the same rounding mode. My hypothesis is that
in FS mode the rounding mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the
problem is created only using float and double variables (in the case
of int it is working properly in FS mode). Specifically, in the case
of float the variables are set to "nan", while in the case of double
the variables are set to 0.000000 (in random time - probably from some
instruction of simulated OS?). You may use a simple c/c++ example in
order to get some traces before going to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't
get as far as you did. I wanted to find a quick way to recreate the
following: https://gem5-review.googlesource.com/c/public/gem5/+/64211.
Please feel free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before
but it's undeniably there. I'll try to spend more time looking at this
tomorrow with some traces and debug flags and see if I can narrow down
the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.
I cannot access real RISCV > >> >>>>>> hardware at this moment, however, if you want you may execute my > >> xhpcg > >> >>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the > >> >>>>>> following configuration: > >> >>>>>> > >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1 > >> >>>>>> > >> >>>>>> Please let me know if you have any updates! > >> >>>>>> > >> >>>>>> Best regards, > >> >>>>>> Nikos > >> >>>>>> > >> >>>>>> > >> >>>>>> Quoting Jason Lowe-Power <jason@lowepower.com>: > >> >>>>>> > >> >>>>>>> Hi Nikos, > >> >>>>>>> > >> >>>>>>> I notice you said the following in your original email: > >> >>>>>>> > >> >>>>>>> In addition, I used the RISCV Ubuntu image > >> >>>>>>>> ( > >> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu > >> >>>> ), > >> >>>>>>>> I installed the gcc compiler, compile it (through qemu) and I > get > >> >>>>>>>> wrong results too. > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> Is this saying you get the wrong results is QEMU? If so, the bug > >> is in > >> >>>>>> GCC > >> >>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU > to > >> >>>> make > >> >>>>>>> sure the binary works there. Another way you could test to see > if > >> the > >> >>>>>>> problem is your binary or gem5 would be to run it on real > >> hardware. We > >> >>>>>> have > >> >>>>>>> access to some RISC-V hardware here at UC Davis, if you don't > have > >> >>>> access > >> >>>>>>> to it. > >> >>>>>>> > >> >>>>>>> Cheers, > >> >>>>>>> Jason > >> >>>>>>> > >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής < > >> >>>>>>> ntampouratzis@ece.auth.gr> wrote: > >> >>>>>>> > >> >>>>>>>> Dear Bobby, > >> >>>>>>>> > >> >>>>>>>> 1) I use the original riscv-fs.py which is provided in the > latest > >> >>>> gem5 > >> >>>>>>>> release. 
> >> >>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d > ./HPCG_FS_results > >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to > download > >> the > >> >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img. > >> >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop > >> >>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the > >> following > >> >>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with > executable: > >> >>>>>>>> > >> >>>>>>>> image = CustomDiskImageResource( > >> >>>>>>>> local_path = "/home/cossim/.cache/gem5/riscv-disk-img", > >> >>>>>>>> ) > >> >>>>>>>> > >> >>>>>>>> # Set the Full System workload. > >> >>>>>>>> board.set_kernel_disk_workload( > >> >>>>>>>> > >> kernel=Resource("riscv-bootloader-vmlinux-5.10"), > >> >>>>>>>> disk_image=image, > >> >>>>>>>> ) > >> >>>>>>>> > >> >>>>>>>> Finally, in the > >> gem5/src/python/gem5/components/boards/riscv_board.py > >> >>>>>>>> I change the last line to "return ["console=ttyS0", > >> >>>>>>>> "root={root_value}", "rw"]" in order to allow the write > >> permissions > >> >>>> in > >> >>>>>>>> the image. > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> 2) The HPCG benchmark after some iterations calculates if the > >> results > >> >>>>>>>> are valid or not valid. In the case of FS it gives invalid > >> results. > >> >>>> As > >> >>>>>>>> I see from the results, one (at least) problem is that produces > >> >>>>>>>> different results in each HPCG execution (with the same > >> >>>> configuration). > >> >>>>>>>> > >> >>>>>>>> Here is the HPCG output and riscv-fs.py > >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may > reproduce > >> the > >> >>>>>>>> results in the video if you use the xhpcg executable > >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) > >> >>>>>>>> > >> >>>>>>>> Please help me in order to solve it! > >> >>>>>>>> > >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode > >> too. 
> >> >>>>>>>> > >> >>>>>>>> Best regards, > >> >>>>>>>> Nikos > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> Quoting Bobby Bruce <bbruce@ucdavis.edu>: > >> >>>>>>>> > >> >>>>>>>> > I'm going to need a bit more information to help: > >> >>>>>>>> > > >> >>>>>>>> > 1. In what way have you modified > >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach > the > >> >>>> script > >> >>>>>>>> here? > >> >>>>>>>> > 2. What error are you getting or in what way are the results > >> >>>> invalid? > >> >>>>>>>> > > >> >>>>>>>> > - > >> >>>>>>>> > Dr. Bobby R. Bruce > >> >>>>>>>> > Room 3050, > >> >>>>>>>> > Kemper Hall, UC Davis > >> >>>>>>>> > Davis, > >> >>>>>>>> > CA, 95616 > >> >>>>>>>> > > >> >>>>>>>> > web: https://www.bobbybruce.net > >> >>>>>>>> > > >> >>>>>>>> > > >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής < > >> >>>>>>>> > ntampouratzis@ece.auth.gr> wrote: > >> >>>>>>>> > > >> >>>>>>>> >> > >> >>>>>>>> >> Dear gem5 community, > >> >>>>>>>> >> > >> >>>>>>>> >> I have successfully cross-compile the HPCG benchmark for > RISCV > >> >>>>>> (Serial > >> >>>>>>>> >> version, without MPI and OpenMP). While it working properly > in > >> >>>> gem5 > >> >>>>>> SE > >> >>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results > >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 > >> >>>> --nz=16 > >> >>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results > in FS > >> >>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d > ./HPCG_FS_results > >> >>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the > riscv > >> >>>> image > >> >>>>>>>> >> and put it). > >> >>>>>>>> >> > >> >>>>>>>> >> Can you help me please? 
> >> >>>>>>>> >> > >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image > >> >>>>>>>> >> ( > >> >>>> > https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu > >> >>>>>> ), > >> >>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and > I > >> get > >> >>>>>>>> >> wrong results too. > >> >>>>>>>> >> > >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for > RISCV > >> >>>>>>>> >> (xhpcg), and a video that shows the results > >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/). > >> >>>>>>>> >> > >> >>>>>>>> >> P.S. I use the latest gem5 version. > >> >>>>>>>> >> > >> >>>>>>>> >> Thank you in advance! :) > >> >>>>>>>> >> > >> >>>>>>>> >> Best regards, > >> >>>>>>>> >> Nikos > >> >>>>>>>> >> _______________________________________________ > >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org > >> >>>>>>>> >> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >>>>>>>> >> > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> _______________________________________________ > >> >>>>>>>> gem5-users mailing list -- gem5-users@gem5.org > >> >>>>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >>>>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> _______________________________________________ > >> >>>>>> gem5-users mailing list -- gem5-users@gem5.org > >> >>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >>>>>> > >> >>>> > >> >>>> > >> >>>> _______________________________________________ > >> >>>> gem5-users mailing list -- gem5-users@gem5.org > >> >>>> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >>>> > >> >> > >> >> > >> >> _______________________________________________ > >> >> gem5-users mailing list -- gem5-users@gem5.org > >> >> To unsubscribe send an email to gem5-users-leave@gem5.org > >> > > >> > > >> > _______________________________________________ > >> > gem5-users mailing list -- gem5-users@gem5.org > >> > To unsubscribe send an email to 
gem5-users-leave@gem5.org > >> > >> > >> _______________________________________________ > >> gem5-users mailing list -- gem5-users@gem5.org > >> To unsubscribe send an email to gem5-users-leave@gem5.org > >> > > > _______________________________________________ > gem5-users mailing list -- gem5-users@gem5.org > To unsubscribe send an email to gem5-users-leave@gem5.org >
Νικόλαος Ταμπουρατζής
Fri, Oct 7, 2022 7:47 PM

Dear Jason & Bobby,

Unfortunately, I have tried my simple example without the sqrt
function and the problem remains. Specifically, I have the following
simple code:

#include <cmath>
#include <stdio.h>

int main(){

 int dim = 1024;

 double result;

 for (int iter = 0; iter < 2; iter++){
     result = 0;
     for (int i = 0; i < dim; i++){
         for (int j = 0; j < dim; j++){
             result += i * j;
         }
     }
     printf("Final Result: %lf\n", result);
 }

}

In the above code, the correct result is 274341298176.000000 (from
RISCV-SE mode and x86), while in FS mode I sometimes get the correct
result and other times a different number.

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know if when running in SE mode and running in FS mode we are
using the same rounding mode. My hypothesis is that in FS mode the rounding
mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! Looking in more detail, I observe that the
problem occurs only with float and double variables (with int,
everything works properly in FS mode). Specifically, float variables
are set to "nan", while double variables are set to 0.000000 (at random
times, probably caused by some instruction of the simulated OS?). You
could use a simple C/C++ example to collect some traces before moving
on to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't
get as far as you did. I wanted to find a quick way to recreate the
following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211.  Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this
tomorrow with some traces and debug flags and see if I can narrow down the
problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason, all,

I am trying to track down the accuracy problem with RISCV-FS, and I
observe that it arises (at least in my dummy example) because the
variables (double) are set to zero at random points in simulated time
(which is why I get different results across executions of the same
code). Specifically, for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

 int dim = 10;

 float result;

 for (int iter = 0; iter < 2; iter++){
     result = 0;
     for (int i = 0; i < dim; i++){
         for (int j = 0; j < dim; j++){
             float sq_i = sqrt(i);
             float sq_j = sqrt(j);
             result += sq_i * sq_j;
             printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                    iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
         }
     }
     printf("Final Result: %lf\n", result);
 }
}

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: 2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: 2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: 2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: 2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: 2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: 3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As the following iterations show, both sqrt(1) and the result are set
to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
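
One way to pin down where a value is lost, without printing every step, is to check inside the inner loop that each square root still squares back to its integer. The sketch below does that for the dummy program; the function name count_corrupted_steps and the 1e-9 tolerance are illustrative choices, not part of the original test. On a correct platform it should report nothing.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: runs the same double loop as the dummy program and
// counts the steps at which sqrt(i) or sqrt(j) no longer squares back to
// i or j. The negated "<=" form also catches NaN values.
int count_corrupted_steps(int dim) {
    int bad = 0;
    double result = 0.0;
    for (int i = 0; i < dim; i++) {
        const double sq_i = std::sqrt(static_cast<double>(i));
        for (int j = 0; j < dim; j++) {
            const double sq_j = std::sqrt(static_cast<double>(j));
            result += sq_i * sq_j;
            const bool i_ok = std::fabs(sq_i * sq_i - i) <= 1e-9;
            const bool j_ok = std::fabs(sq_j * sq_j - j) <= 1e-9;
            if (!i_ok || !j_ok) {
                std::printf("corrupted at i=%d j=%d: sq_i=%f sq_j=%f result=%f\n",
                            i, j, sq_i, sq_j, result);
                ++bad;
            }
        }
    }
    return bad;
}
```

Calling count_corrupted_steps(10) from main in both SE and FS mode would show whether, and at which (i, j), the hoisted sq_i value changes under the program's feet.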

Please help me resolve this accuracy issue! I think a fix would be
very useful for the gem5 community.

Note that I found the simulated tick at which the application starts
in FS mode (using m5 dumpstats) and passed it to --debug-start, but
the trace file that is generated is 10x larger than the SE-mode trace
for the same application. How can I compare them?
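
A possible way to compare two traces of very different size is to normalise each line and report only the first point where they disagree. The sketch below assumes that every trace line starts with a tick value followed by ':' and that the lines of interest can be selected with a plain substring filter; both are assumptions about the trace format, and all function names are hypothetical. If the two configurations name the CPU object differently, that field would need to be stripped as well.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Drop everything up to and including the first ':' (assumed to be the tick).
std::string strip_tick(const std::string& line) {
    const std::size_t pos = line.find(':');
    return pos == std::string::npos ? line : line.substr(pos + 1);
}

// Read the next line containing `filter` (every line if `filter` is empty)
// and store it, without its tick prefix, in `out`.
bool next_matching(std::istream& in, const std::string& filter, std::string& out) {
    std::string line;
    while (std::getline(in, line)) {
        if (filter.empty() || line.find(filter) != std::string::npos) {
            out = strip_tick(line);
            return true;
        }
    }
    return false;
}

// 1-based index of the first pair of selected records that differ,
// or 0 if both streams agree to the end.
std::size_t first_divergence(std::istream& a, std::istream& b,
                             const std::string& filter) {
    std::string la, lb;
    std::size_t n = 0;
    for (;;) {
        const bool ha = next_matching(a, filter, la);
        const bool hb = next_matching(b, filter, lb);
        if (!ha && !hb) return 0;
        ++n;
        if (ha != hb || la != lb) return n;
    }
}
```

With two std::ifstream objects opened on the SE and FS trace files, first_divergence(se, fs, filter) gives the record number to inspect by hand. The FS trace also contains kernel and other-process instructions, so a filter that selects only the application's lines is needed before the comparison means anything.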

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start, but in FS mode it is very difficult
to find the tick at which the application starts!

However, I have written the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 4096;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); //Result: 30530733453.127449
    }

}

I cross-compile it using:
riscv64-linux-gnu-g++ -static -O3 -o test_riscv test_riscv.cpp

On x86 (without cross-compilation, of course), QEMU-RISCV and GEM5-SE
the result is the same (30530733453.127449), but in GEM5-FS the result
is different! In addition, the result also differs between the 2
iterations.

Please reproduce the error, if you want, in order to verify my result.
How can the issue be resolved?
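
Because the double sum factorises, sum over i,j of sqrt(i)*sqrt(j) equals (sum over i of sqrt(i)) squared, the program can check itself without a reference machine. The sketch below is an illustration under that identity; the function names and the tolerance are assumptions, not part of the original test.

```cpp
#include <cmath>

// Same accumulation as the test program above.
double double_loop_sum(int dim) {
    double result = 0.0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            result += std::sqrt(static_cast<double>(i)) *
                      std::sqrt(static_cast<double>(j));
        }
    }
    return result;
}

// Closed form: (sum of sqrt(i))^2, computed with only `dim` additions.
double closed_form_sum(int dim) {
    double s = 0.0;
    for (int i = 0; i < dim; i++) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s * s;
}

// True if the two values agree to within a relative tolerance.
bool sums_agree(int dim, double rel_tol) {
    const double a = double_loop_sum(dim);
    const double b = closed_form_sum(dim);
    return std::fabs(a - b) <= rel_tol * std::fabs(b);
}
```

For dim = 10 both forms give the 372.721656 quoted elsewhere in this thread. For dim = 4096 a looser tolerance such as 1e-9 is a reasonable starting point, since the two forms round differently. A mismatch far beyond rounding error points at corrupted floating-point state rather than at the arithmetic itself.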

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up
quickly, so no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE
mode use the exact same code for executing instructions, so I don't
think that's the problem. Have you tried running for smaller inputs or
just one iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
instead of the gem5.fast I had), but the debug traces exceed 20GB (and
the run has not finished yet) for less than 1 simulated second. How
can I reduce the size of the debug output (or set something more
specific)?

Meanwhile, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then
perform a diff to see how they differ.

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (in both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at this moment, however, if you want you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image ( ), I installed the gcc
compiler, compile it (through qemu) and I get wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in
GCC or the HPCG workload, not in gem5. If not, I would test in QEMU to
make sure the binary works there. Another way you could test to see if
the problem is your binary or gem5 would be to run it on real
hardware. We have access to some RISC-V hardware here at UC Davis, if
you don't have access to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest
gem5 release. I run gem5 once (./build/RISCV/gem5.fast -d
./HPCG_FS_results ./configs/example/gem5_library/riscv-fs.py) in order
to download the riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), put the xhpcg executable in it, and make the
following changes in riscv-fs.py to boot the riscv-disk-img with the
executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py I
change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions in the
image.

  2. After some iterations, the HPCG benchmark checks whether the
results are valid. In FS mode it reports invalid results. As I see
from the results, one problem (at least) is that it produces different
results in each HPCG execution (with the same configuration). Here is
the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me in order to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
    ./configs/example/gem5_library/riscv-fs.py? Can you attach the
    script here?

  2. What error are you getting or in what way are the results
    invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV
(serial version, without MPI and OpenMP). While it works properly in
gem5 SE mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image ( ), I installed the gcc
compiler, compiled the benchmark (through qemu) and I get wrong
results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



Dear Jason & Bobby,

Unfortunately, I have tried my simple example without the sqrt
function and the problem remains. Specifically, I have the following
simple code:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 1024;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += i * j;
            }
        }
        printf("Final Result: %lf\n", result);
    }
}

In the above code, the correct result is 274341298176.000000 (from
RISCV-SE mode and x86), while in FS mode I sometimes get the correct
result and other times a different number.

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d
function? I would like to know if when running in SE mode and running
in FS mode we are using the same rounding mode. My hypothesis is that
in FS mode the rounding mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the
problem occurs only with float and double variables (with int it works
properly in FS mode). Specifically, in the case of float the variables
are set to "nan", while in the case of double the variables are set to
0.000000 (at a random time, probably by some instruction of the
simulated OS?). You may use a simple C/C++ example in order to get
some traces before going to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't
get as far as you did. I wanted to find a quick way to recreate the
following: https://gem5-review.googlesource.com/c/public/gem5/+/64211.
Please feel free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before
but it's undeniably there. I'll try to spend more time looking at this
tomorrow with some traces and debug flags and see if I can narrow down
the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net
>> >> >>> >> >> >>> Jason >> >> >>> >> >> >>> >> >> >>> >> >> >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής < >> >> >>> ntampouratzis@ece.auth.gr> wrote: >> >> >>> >> >> >>>> Dear Bobby, >> >> >>>> >> >> >>>> Iam trying to add --debug-flags=Exec (building the gem5 for >> gem5.opt >> >> >>>> not for gem5.fast which I had) but the debug traces exceed the 20GB >> >> >>>> (and it is not finished yet) for less than 1 simulated second. How >> can >> >> >>>> I reduce the size of the debug-flags (or set something more >> specific)? >> >> >>>> >> >> >>>> In contrast I build the HPCG benchmark with DHPCG_DEBUG flag. If >> you >> >> >>>> want, you can compare these two output files >> >> >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you >> can >> >> >>>> see, something goes wrong with the accuracy of calculations in FS >> mode >> >> >>>> (benchmark uses double precission). You can find the files here: >> >> >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/ >> >> >>>> >> >> >>>> Best regards, >> >> >>>> Nikos >> >> >>>> >> >> >>>> Quoting Jason Lowe-Power <jason@lowepower.com>: >> >> >>>> >> >> >>>>> That's quite odd that it works in SE mode but not FS mode! >> >> >>>>> >> >> >>>>> I would suggest running with --debug-flags=Exec for both and then >> >> >>>> perform a >> >> >>>>> diff to see how they differ. >> >> >>>>> >> >> >>>>> Cheers, >> >> >>>>> Jason >> >> >>>>> >> >> >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής < >> >> >>>>> ntampouratzis@ece.auth.gr> wrote: >> >> >>>>> >> >> >>>>>> Dear Bobby, >> >> >>>>>> >> >> >>>>>> In QEMU I get the same (correct) results that I get in SE mode >> >> >>>>>> simulation. I get invalid results in FS simulation (in both >> >> >>>>>> riscv-fs.py and riscv-ubuntu-run.py). 
I cannot access real RISCV >> >> >>>>>> hardware at this moment, however, if you want you may execute my >> >> xhpcg >> >> >>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the >> >> >>>>>> following configuration: >> >> >>>>>> >> >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1 >> >> >>>>>> >> >> >>>>>> Please let me know if you have any updates! >> >> >>>>>> >> >> >>>>>> Best regards, >> >> >>>>>> Nikos >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> Quoting Jason Lowe-Power <jason@lowepower.com>: >> >> >>>>>> >> >> >>>>>>> Hi Nikos, >> >> >>>>>>> >> >> >>>>>>> I notice you said the following in your original email: >> >> >>>>>>> >> >> >>>>>>> In addition, I used the RISCV Ubuntu image >> >> >>>>>>>> ( >> >> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu >> >> >>>> ), >> >> >>>>>>>> I installed the gcc compiler, compile it (through qemu) and I >> get >> >> >>>>>>>> wrong results too. >> >> >>>>>>> >> >> >>>>>>> >> >> >>>>>>> Is this saying you get the wrong results is QEMU? If so, the bug >> >> is in >> >> >>>>>> GCC >> >> >>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU >> to >> >> >>>> make >> >> >>>>>>> sure the binary works there. Another way you could test to see >> if >> >> the >> >> >>>>>>> problem is your binary or gem5 would be to run it on real >> >> hardware. We >> >> >>>>>> have >> >> >>>>>>> access to some RISC-V hardware here at UC Davis, if you don't >> have >> >> >>>> access >> >> >>>>>>> to it. >> >> >>>>>>> >> >> >>>>>>> Cheers, >> >> >>>>>>> Jason >> >> >>>>>>> >> >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής < >> >> >>>>>>> ntampouratzis@ece.auth.gr> wrote: >> >> >>>>>>> >> >> >>>>>>>> Dear Bobby, >> >> >>>>>>>> >> >> >>>>>>>> 1) I use the original riscv-fs.py which is provided in the >> latest >> >> >>>> gem5 >> >> >>>>>>>> release. 
>> >> >>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d >> ./HPCG_FS_results >> >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to >> download >> >> the >> >> >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img. >> >> >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop >> >> >>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the >> >> following >> >> >>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with >> executable: >> >> >>>>>>>> >> >> >>>>>>>> image = CustomDiskImageResource( >> >> >>>>>>>> local_path = "/home/cossim/.cache/gem5/riscv-disk-img", >> >> >>>>>>>> ) >> >> >>>>>>>> >> >> >>>>>>>> # Set the Full System workload. >> >> >>>>>>>> board.set_kernel_disk_workload( >> >> >>>>>>>> >> >> kernel=Resource("riscv-bootloader-vmlinux-5.10"), >> >> >>>>>>>> disk_image=image, >> >> >>>>>>>> ) >> >> >>>>>>>> >> >> >>>>>>>> Finally, in the >> >> gem5/src/python/gem5/components/boards/riscv_board.py >> >> >>>>>>>> I change the last line to "return ["console=ttyS0", >> >> >>>>>>>> "root={root_value}", "rw"]" in order to allow the write >> >> permissions >> >> >>>> in >> >> >>>>>>>> the image. >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> 2) The HPCG benchmark after some iterations calculates if the >> >> results >> >> >>>>>>>> are valid or not valid. In the case of FS it gives invalid >> >> results. >> >> >>>> As >> >> >>>>>>>> I see from the results, one (at least) problem is that produces >> >> >>>>>>>> different results in each HPCG execution (with the same >> >> >>>> configuration). >> >> >>>>>>>> >> >> >>>>>>>> Here is the HPCG output and riscv-fs.py >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may >> reproduce >> >> the >> >> >>>>>>>> results in the video if you use the xhpcg executable >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) >> >> >>>>>>>> >> >> >>>>>>>> Please help me in order to solve it! 
>> >> >>>>>>>> >> >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode >> >> too. >> >> >>>>>>>> >> >> >>>>>>>> Best regards, >> >> >>>>>>>> Nikos >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> Quoting Bobby Bruce <bbruce@ucdavis.edu>: >> >> >>>>>>>> >> >> >>>>>>>> > I'm going to need a bit more information to help: >> >> >>>>>>>> > >> >> >>>>>>>> > 1. In what way have you modified >> >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach >> the >> >> >>>> script >> >> >>>>>>>> here? >> >> >>>>>>>> > 2. What error are you getting or in what way are the results >> >> >>>> invalid? >> >> >>>>>>>> > >> >> >>>>>>>> > - >> >> >>>>>>>> > Dr. Bobby R. Bruce >> >> >>>>>>>> > Room 3050, >> >> >>>>>>>> > Kemper Hall, UC Davis >> >> >>>>>>>> > Davis, >> >> >>>>>>>> > CA, 95616 >> >> >>>>>>>> > >> >> >>>>>>>> > web: https://www.bobbybruce.net >> >> >>>>>>>> > >> >> >>>>>>>> > >> >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής < >> >> >>>>>>>> > ntampouratzis@ece.auth.gr> wrote: >> >> >>>>>>>> > >> >> >>>>>>>> >> >> >> >>>>>>>> >> Dear gem5 community, >> >> >>>>>>>> >> >> >> >>>>>>>> >> I have successfully cross-compile the HPCG benchmark for >> RISCV >> >> >>>>>> (Serial >> >> >>>>>>>> >> version, without MPI and OpenMP). While it working properly >> in >> >> >>>> gem5 >> >> >>>>>> SE >> >> >>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results >> >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 >> >> >>>> --nz=16 >> >> >>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results >> in FS >> >> >>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d >> ./HPCG_FS_results >> >> >>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the >> riscv >> >> >>>> image >> >> >>>>>>>> >> and put it). >> >> >>>>>>>> >> >> >> >>>>>>>> >> Can you help me please? 
>> >> >>>>>>>> >> >> >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image >> >> >>>>>>>> >> ( >> >> >>>> >> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu >> >> >>>>>> ), >> >> >>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and >> I >> >> get >> >> >>>>>>>> >> wrong results too. >> >> >>>>>>>> >> >> >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for >> RISCV >> >> >>>>>>>> >> (xhpcg), and a video that shows the results >> >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/). >> >> >>>>>>>> >> >> >> >>>>>>>> >> P.S. I use the latest gem5 version. >> >> >>>>>>>> >> >> >> >>>>>>>> >> Thank you in advance! :) >> >> >>>>>>>> >> >> >> >>>>>>>> >> Best regards, >> >> >>>>>>>> >> Nikos >> >> >>>>>>>> >> _______________________________________________ >> >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org >> >> >>>>>>>> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >>>>>>>> >> >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> _______________________________________________ >> >> >>>>>>>> gem5-users mailing list -- gem5-users@gem5.org >> >> >>>>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >>>>>>>> >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> _______________________________________________ >> >> >>>>>> gem5-users mailing list -- gem5-users@gem5.org >> >> >>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >>>>>> >> >> >>>> >> >> >>>> >> >> >>>> _______________________________________________ >> >> >>>> gem5-users mailing list -- gem5-users@gem5.org >> >> >>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >>>> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> >> gem5-users mailing list -- gem5-users@gem5.org >> >> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> > >> >> > >> >> > _______________________________________________ >> >> > gem5-users mailing list -- 
gem5-users@gem5.org >> >> > To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >> >> >> _______________________________________________ >> >> gem5-users mailing list -- gem5-users@gem5.org >> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >> >> _______________________________________________ >> gem5-users mailing list -- gem5-users@gem5.org >> To unsubscribe send an email to gem5-users-leave@gem5.org >>
Hoa Nguyen
Sat, Oct 8, 2022 1:40 AM

Hi,

It's quite odd that both sq_i and result were zeroed out at the same
time. Does the problem appear in FS mode for other ISAs, e.g. x86 FS mode?
Can you show the objdump of the loop as well?

Regards,
Hoa Nguyen

On Thu, Oct 6, 2022, 04:06 Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>
wrote:

Dear Jason, all,

I am trying to track down the accuracy problem with RISCV-FS, and I
observe that the problem arises (at least in my dummy example) because
the (double) variables are set to zero at random points in simulated
time (which is why I get different results across executions of the same
code). Specifically, for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

  int dim = 10;

  float result;

  for (int iter = 0; iter < 2; iter++){
      result = 0;
      for (int i = 0; i < dim; i++){
          for (int j = 0; j < dim; j++){
              float sq_i = sqrt(i);
              float sq_j = sqrt(j);
              result += sq_i * sq_j;
              printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                     iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
          }
      }
      printf("Final Result: %lf\n", result);
  }
}

The correct Final Result in both iterations is 372.721656. However, I
get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: 2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: 2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: 2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: 2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: 2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: 3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As the following lines from the first iteration show, both sqrt(1) and
the result are set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
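
A possible way to find the exact step where the value is lost, without reading the whole log, is to let the program check itself. The following is only a sketch under assumptions and was not part of the original test: the helper names reference_sum and first_bad_step are hypothetical, and the idea of keeping a volatile shadow copy of the running sum in memory is an illustration, not a confirmed diagnosis of the zeroing seen above. It also uses the fact that the sum of sqrt(i)*sqrt(j) over all i and j equals the square of the sum of sqrt(i), which gives 372.721656 for dim = 10.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: reference value of the double loop.
// sum over i,j of sqrt(i)*sqrt(j) factors as (sum over i of sqrt(i))^2.
double reference_sum(int dim) {
    assert(dim >= 0);
    double s = 0.0;
    for (int i = 0; i < dim; i++) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s * s;
}

// Hypothetical helper: runs the same accumulation as the dummy program and
// returns the step index (i * dim + j) at which the running sum no longer
// matches its shadow copy, or -1 if every step is consistent.
long first_bad_step(int dim) {
    assert(dim >= 0);
    double result = 0.0;
    // volatile forces a real store and load, so the shadow copy lives in
    // memory while the running sum may be kept in a register.
    volatile double shadow = 0.0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            if (result != shadow) {
                return static_cast<long>(i) * dim + j;
            }
            result += std::sqrt(static_cast<double>(i)) *
                      std::sqrt(static_cast<double>(j));
            shadow = result;
        }
    }
    return (result != shadow) ? static_cast<long>(dim) * dim : -1;
}
```

On a platform that behaves correctly, first_bad_step(10) should return -1. A non-negative return value would identify the first (i, j) step at which the two copies of the running sum disagree.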

Please help me to resolve the accuracy issue! I think that it will be
very useful for the gem5 community.

Note that I found the simulated tick at which the application starts
in FS mode (using m5 dumpstats) and passed it to --debug-start, but
the generated trace file is 10x larger than the SE-mode trace for the
same application. How can I compare them?
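
One way to make the two traces comparable is to reduce each line to the fields that should match across SE and FS and to drop everything executed outside the application. The sketch below is only an illustration under stated assumptions: it assumes that the first 0x-prefixed number on an Exec trace line is the instruction PC and that a result, when present, is printed as "D=0x..."; neither has been checked against a specific gem5 version. The names `records`, `first_divergence`, `text_lo` and `text_hi` are hypothetical.

```python
import re

# Assumptions about the Exec trace text (not verified against a specific gem5
# version): the first 0x-prefixed number on a line is the instruction PC, and
# a result, when present, appears as "D=0x...".
PC_RE = re.compile(r"0x([0-9a-fA-F]+)")
DATA_RE = re.compile(r"\bD=0x([0-9a-fA-F]+)")


def records(lines, text_lo, text_hi, with_data=False):
    """Yield (pc, data) tuples for trace lines whose PC is in [text_lo, text_hi)."""
    for line in lines:
        pc_match = PC_RE.search(line)
        if pc_match is None:
            continue  # not an instruction record
        pc = int(pc_match.group(1), 16)
        if not (text_lo <= pc < text_hi):
            continue  # kernel, bootloader, or other code outside the binary
        data = None
        if with_data:
            data_match = DATA_RE.search(line)
            data = data_match.group(1).lower() if data_match else None
        yield (pc, data)


def first_divergence(a, b):
    """Return the index of the first differing record, or None if none differs."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    return None
```

Both functions stream their input, so they can be fed open file objects for multi-gigabyte traces. For a statically linked binary the PC bounds could be taken from its section headers (for example, the .text range printed by objdump -h). Comparing PCs alone only reveals control-flow differences; passing with_data=True also compares results, but values such as stack addresses may legitimately differ between SE and FS, so early mismatches need to be read with care.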

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start, but in FS mode it is very difficult
to find the tick at which the application starts!

However, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

 int dim = 4096;

 double result;

 for (int iter = 0; iter < 2; iter++){
     result = 0;
     for (int i = 0; i < dim; i++){
         for (int j = 0; j < dim; j++){
             result += sqrt(i) * sqrt(j);
         }
     }
     printf("Result: %lf\n", result); //Result: 30530733453.127449
 }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

While on x86 (without cross-compilation of course), QEMU-RISCV, and
gem5 SE mode the result is the same (30530733453.127449), in gem5 FS
mode the result is different! In addition, the result also differs
between the 2 iterations.

Please try to reproduce the error in order to verify my result.
How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5.opt instead of
the gem5.fast I had been using), but the debug traces exceed 20GB
(and the run is not finished yet) for less than 1 simulated second.
How can I reduce the size of the debug output (or set something more
specific)?

Instead, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then perform a
diff to see how they differ.

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (in both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at this moment; however, if you want, you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

> In addition, I used the RISCV Ubuntu image
> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
> I installed the gcc compiler, compile it (through qemu) and I get
> wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
or the HPCG workload, not in gem5. If not, I would test in QEMU to make
sure the binary works there. Another way you could test to see if the
problem is your binary or gem5 would be to run it on real hardware. We have
access to some RISC-V hardware here at UC Davis, if you don't have access
to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

1) I use the original riscv-fs.py which is provided in the latest gem5
release.
I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), put the xhpcg executable and I do the following
changes in riscv-fs.py to boot the riscv-disk-img with executable:

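# CustomDiskImageResource must also be imported (module path assumed from
# the gem5 standard library layout).
from gem5.resources.resource import CustomDiskImageResource

image = CustomDiskImageResource(
    local_path="/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)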

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
I change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions in
the image.

2) After some iterations the HPCG benchmark checks whether the results
are valid. In the case of FS it reports invalid results. As
I see from the results, one (at least) problem is that it produces
different results in each HPCG execution (with the same configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me in order to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
     ./configs/example/gem5_library/riscv-fs.py? Can you attach the script here?
  2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image
(https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
installed the gcc compiler, compiled it (through qemu), and I get
wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



Hi,

It's quite odd that both sq_i and result were zeroed out at the same time.
Does the problem appear in FS mode for other ISAs, e.g. x86 FS mode? Can you
show the objdump of the loop as well?

Regards,
Hoa Nguyen

On Thu, Oct 6, 2022, 04:06 Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr> wrote:
> >>>>> >> >> > >>>>> >> >> In addition, I used the RISCV Ubuntu image > >>>>> >> >> ( > >>> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu > >>>>> ), > >>>>> >> >> I installed the gcc compiler, compile it (through qemu) and I > get > >>>>> >> >> wrong results too. > >>>>> >> >> > >>>>> >> >> Here is the Makefile which I use, the hpcg executable for RISCV > >>>>> >> >> (xhpcg), and a video that shows the results > >>>>> >> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/). > >>>>> >> >> > >>>>> >> >> P.S. I use the latest gem5 version. > >>>>> >> >> > >>>>> >> >> Thank you in advance! :) > >>>>> >> >> > >>>>> >> >> Best regards, > >>>>> >> >> Nikos > >>>>> >> >> _______________________________________________ > >>>>> >> >> gem5-users mailing list -- gem5-users@gem5.org > >>>>> >> >> To unsubscribe send an email to gem5-users-leave@gem5.org > >>>>> >> >> > >>>>> >> > >>>>> >> > >>>>> >> _______________________________________________ > >>>>> >> gem5-users mailing list -- gem5-users@gem5.org > >>>>> >> To unsubscribe send an email to gem5-users-leave@gem5.org > >>>>> >> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> gem5-users mailing list -- gem5-users@gem5.org > >>>>> To unsubscribe send an email to gem5-users-leave@gem5.org > >>>>> > >>> > >>> > >>> _______________________________________________ > >>> gem5-users mailing list -- gem5-users@gem5.org > >>> To unsubscribe send an email to gem5-users-leave@gem5.org > >>> > > > > > > _______________________________________________ > > gem5-users mailing list -- gem5-users@gem5.org > > To unsubscribe send an email to gem5-users-leave@gem5.org > > > _______________________________________________ > gem5-users mailing list -- gem5-users@gem5.org > To unsubscribe send an email to gem5-users-leave@gem5.org >
Νικόλαος Ταμπουρατζής
Sat, Oct 8, 2022 10:20 AM

Dear Hoa, all

I have successfully ported HPCG and many simple examples to gem5
ARM-FS and they work properly. The problem appears only in RISCV-FS
when using double and float variables. Which objdump option should I use?

Best regards,
Nikos

Quoting Hoa Nguyen hoanguyen@ucdavis.edu:

Hi,

It's quite odd that both sqrt_i and result were zeroed out at the same
time. Does the problem appear in FS mode for other ISAs, e.g. x86 FS mode?
Can you show the objdump of the loop as well?

Regards,
Hoa Nguyen

On Thu, Oct 6, 2022, 04:06 Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr
wrote:

Dear Jason, all,

I am trying to track down the accuracy problem in RISCV-FS, and I observe
that (at least in my dummy example) it arises because the (double)
variables are set to zero at random points in simulated time (which is
why I get different results across executions of the same code).
Specifically, for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

  int dim = 10;

  float result;

  for (int iter = 0; iter < 2; iter++){
      result = 0;
      for (int i = 0; i < dim; i++){
          for (int j = 0; j < dim; j++){
              float sq_i = sqrt(i);
              float sq_j = sqrt(j);
              result += sq_i * sq_j;
              printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                     iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
          }
      }
      printf("Final Result: %lf\n", result);
  }
}

The correct Final Result in both iterations is 372.721656. However, I
get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: 2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: 2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: 2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: 2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: 2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: 3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As we can see in the following iterations the sqrt(1) as well as the
result is set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
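
A minimal sketch of how such corrupted records could be located automatically in a longer log, assuming one record per line in the format produced by the printf in the test program above. The names `RECORD` and `find_corrupted` and the tolerance are illustrative choices, not part of gem5 or HPCG.

```python
import math
import re

# One record as emitted by the printf in the test program (assumed unwrapped).
RECORD = re.compile(
    r"ITER: (\d+) \| i: (\d+) \| j: (\d+) "
    r"Result\(i: ([-\w.]+) \| j: ([-\w.]+) \| i\*j: ([-\w.]+)\): ([-\w.]+)"
)


def find_corrupted(lines, tol=1e-5):
    """Return (iter, i, j) for records whose printed square roots are wrong."""
    bad = []
    for line in lines:
        m = RECORD.search(line)
        if m is None:
            continue  # skip "Final Result" lines and unrelated console output
        it, i, j = (int(m.group(k)) for k in (1, 2, 3))
        sq_i, sq_j = float(m.group(4)), float(m.group(5))
        # math.isclose is False for nan, so "nan" values are flagged as well.
        ok_i = math.isclose(sq_i, math.sqrt(i), abs_tol=tol)
        ok_j = math.isclose(sq_j, math.sqrt(j), abs_tol=tol)
        if not (ok_i and ok_j):
            bad.append((it, i, j))
    return bad
```

Feeding it the captured console output (for example `find_corrupted(open("fs_output.txt"))`, where the file name is hypothetical) lists the loop indices at which the printed values stop matching sqrt(i) or sqrt(j).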

Please help me to resolve the accuracy issue! I think that it will be
very useful for the gem5 community.

Note that I found the correct simulated tick at which the application
starts in FS (using m5 dumpstats) and passed it to --debug-start, but
the generated trace file is 10x larger than in SE mode for the same
application. How can I compare them?
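
One possible way to compare two large traces without loading them into memory is sketched below. It assumes each trace line starts with a tick followed by a colon, which is an assumption about the trace layout; `strip_tick`, `first_divergence`, and the `keep` filter are hypothetical helpers, not gem5 tools. In FS mode the `keep` predicate would have to select only the lines that belong to the user program (for instance by address or symbol), since the FS trace also contains kernel instructions.

```python
from itertools import zip_longest


def strip_tick(line):
    """Drop everything up to the first ':' (assumed to be the tick field)."""
    return line.split(":", 1)[-1].strip()


def first_divergence(trace_a, trace_b, keep=lambda line: True):
    """Return (index, a_line, b_line) of the first mismatch, or None.

    trace_a and trace_b can be any iterables of lines, e.g. open files,
    so multi-gigabyte traces are streamed rather than read whole.
    """
    a = (strip_tick(l) for l in trace_a if keep(l))
    b = (strip_tick(l) for l in trace_b if keep(l))
    for n, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return n, x, y  # one side is None if that trace ended early
    return None
```

Because the tick is removed before comparison, two runs that execute the same instructions at different simulated times still compare as equal; only the first differing instruction or result is reported.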

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start, but in FS mode it is very difficult
to find the tick at which the application starts!

However, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

 int dim = 4096;

 double result;

 for (int iter = 0; iter < 2; iter++){
     result = 0;
     for (int i = 0; i < dim; i++){
         for (int j = 0; j < dim; j++){
             result += sqrt(i) * sqrt(j);
         }
     }
     printf("Result: %lf\n", result); //Result: 30530733453.127449
 }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

While in X86 (without cross-compilation of course), QEMU-RISCV,
GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
result is different! In addition, the result is also different
between the 2 iterations.
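
As an independent sanity check on the reference value quoted above, the double sum factorises as (sum of sqrt(i)) squared, so it can be evaluated without the nested loop. This is only a sketch: `expected_result` is an illustrative name, and because the summation order differs from the C++ loop, the last few digits need not match exactly.

```python
import math


def expected_result(dim):
    """sum_i sum_j sqrt(i)*sqrt(j) for 0 <= i, j < dim, via (sum_i sqrt(i))**2."""
    s = math.fsum(math.sqrt(i) for i in range(dim))
    return s * s


if __name__ == "__main__":
    # Compare against the value reported for x86, QEMU-RISCV and gem5-SE.
    print(f"{expected_result(4096):.6f}")
```

A result from gem5-FS that differs from this value in the leading digits, or that changes between iterations, points at the simulation rather than at floating-point rounding.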

Please reproduce the error, if you can, in order to verify my result.
How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
instead of the gem5.fast I had been using), but the debug traces exceed
20GB (and the run has not finished yet) for less than 1 simulated second.
How can I reduce the size of the debug output (or set something more
specific)?

In addition, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then
performing a diff to see how they differ.

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (in both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at this moment, however, if you want you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image
(https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
I installed the gcc compiler, compile it (through qemu) and I get
wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
or the HPCG workload, not in gem5. If not, I would test in QEMU to make
sure the binary works there. Another way you could test to see if the
problem is your binary or gem5 would be to run it on real hardware. We have
access to some RISC-V hardware here at UC Davis, if you don't have access
to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest gem5
release.
I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), put the xhpcg executable and I do the following
changes in riscv-fs.py to boot the riscv-disk-img with executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in the gem5/src/python/gem5/components/boards/riscv_board.py
I change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow the write permissions in
the image.

  2. The HPCG benchmark after some iterations calculates if the results
are valid or not valid. In the case of FS it gives invalid results. As
I see from the results, one (at least) problem is that it produces
different results in each HPCG execution (with the same configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me in order to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
    ./configs/example/gem5_library/riscv-fs.py? Can you attach the script
    here?
  2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image
(https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
installed the gcc compiler, compiled the benchmark (through qemu), and I
get wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org



Dear Hoa, all,

I have ported successfully HPCG and many simple examples using gem5
ARM-FS and they are working properly. The problem is only in RISCV-FS
using double and float variables.

Which option of objdump to use?

Best regards,
Nikos

Quoting Hoa Nguyen hoanguyen@ucdavis.edu:

Hi,

It's quite odd that both sqrt_i and result were zeroed out at the same
time. Does the problem appear in other ISA FS mode, e.g. x86 FS mode? Can
you show the objdump of the loop as well?

Regards,
Hoa Nguyen

BB
Bobby Bruce
Wed, Oct 12, 2022 6:33 PM

Jason and I had a theory that this may be due to the "Rounding Mode" for
floating point being set incorrectly in FS mode. That's set via a macro
here:
https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36

I manually expanded the macro here:
https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495,
inside the "fsqrt_d" definition, then compiled "build/ALL/gem5.debug". I then
used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in the
generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).

gdb build/ALL/gem5.opt
break Fsqrt_d::execute
run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`

Stepping through with gdb, I see the rounding mode is 0 for SE mode and 0
for FS mode as well. So, no luck with that theory.
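
A guest-side cross-check of the gdb observation could look like the minimal sketch below. It uses the standard C++ call std::fegetround() to report the rounding mode the program itself sees; rounding_mode_name is a hypothetical helper, not part of gem5. In the RISC-V frm encoding the value 0 means round-to-nearest, ties-to-even (RNE), so a program that has not changed the mode would be expected to report "to-nearest" in both SE and FS mode. This assumes the C library maps the frm field onto the FE_* constants.

```cpp
#include <cfenv>
#include <string>

// Hypothetical helper (not from gem5): name the rounding mode that the
// running program currently observes through the standard <cfenv> API.
std::string rounding_mode_name() {
    switch (std::fegetround()) {
        case FE_TONEAREST:  return "to-nearest";   // RISC-V frm = 0 (RNE)
        case FE_TOWARDZERO: return "toward-zero";
        case FE_DOWNWARD:   return "downward";
        case FE_UPWARD:     return "upward";
        default:            return "unknown";
    }
}
```

Printing rounding_mode_name() at the start of the test program and again after the first wrong value appears would show whether the mode changes while the program runs.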

My new theory is that this bug has something to do with thread context
switching being implemented incorrectly in RISC-V somehow. I find it
strange that the sqrt(1) works fine for a while (i.e. returns 1) then
suddenly starts returning zero after a certain point in the execution. In
addition, it's odd that the loop is not returning the same value each time
despite executing the same code. It'd make sense to me that the thread is
being stored and then resumed with some corruption of the floating point
data. This would also explain why this bug only occurs in FS mode.

I'll try to find time to figure out a good test for this. If anyone has any
other theories or ideas then let me know.
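
One possible shape for such a test is sketched below. This is only an illustration of the idea, not the test that was eventually written: count_fp_mismatches is a hypothetical name, and the sketch assumes that yielding the CPU is enough to provoke the save/restore path suspected above. It computes sqrt(1.0), yields, and then checks that the value that lived across the yield is still 1.0.

```cpp
#include <cmath>
#include <cstddef>
#include <thread>

// Hypothetical stress loop: count how often a floating-point value held
// across a yield no longer equals the value computed before the yield.
std::size_t count_fp_mismatches(std::size_t iterations) {
    volatile double one = 1.0;  // volatile stops the compiler folding sqrt(1.0)
    std::size_t mismatches = 0;
    for (std::size_t k = 0; k < iterations; ++k) {
        double before = std::sqrt(one);  // expected to be exactly 1.0
        std::this_thread::yield();       // invite the OS to switch threads
        double after = before * one;     // reuse the value after the yield
        if (after != 1.0) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

On a correct platform the function should return 0 for any iteration count. A non-zero count in FS mode only would be consistent with the context-switch theory, although it would not prove it.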

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Jason & Bobby,

Unfortunately, I have tried my simple example without the sqrt
function and the problem remains. Specifically, I have the following
simple code:

#include <cmath>
#include <stdio.h>

int main(){

  int dim = 1024;

  double result;

  for (int iter = 0; iter < 2; iter++){
      result = 0;
      for (int i = 0; i < dim; i++){
          for (int j = 0; j < dim; j++){
              result += i * j;
          }
      }
      printf("Final Result: %lf\n", result);
  }

}

In the above code, the correct result is 274341298176.000000 (from
RISCV-SE mode and x86), while in FS mode I sometimes get the correct
result and other times a different number.
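
The expected value can be checked independently of any floating-point hardware. The sketch below (helper names are illustrative, not from the thread) uses the closed form sum(i*j) = (dim*(dim-1)/2)^2 in integer arithmetic and compares it with the same double accumulation as the program above. For dim = 1024 every partial sum is an integer below 2^53, so the double result should be exact and any deviation points at the platform rather than at rounding.

```cpp
#include <cstdint>

// Closed form of the double loop: (0 + 1 + ... + (dim-1))^2.
std::uint64_t expected_sum(std::uint64_t dim) {
    std::uint64_t s = dim * (dim - 1) / 2;
    return s * s;
}

// The same accumulation as the test program, carried out in double.
double accumulated_sum(int dim) {
    double result = 0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            result += i * j;
        }
    }
    return result;
}
```

With dim = 1024 the closed form gives 523776 * 523776 = 274341298176, matching the value reported for SE mode and x86.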

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know if when running in SE mode and running in FS mode we are
using the same rounding mode. My hypothesis is that in FS mode the rounding
mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the
problem occurs only with float and double variables (with int it works
properly in FS mode). Specifically, in the case of float the variables
are set to "nan", while in the case of double the variables are set to
0.000000 (at a random time, probably by some instruction of the
simulated OS?). You may use a simple C/C++ example in
order to get some traces before going to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason, all,

I am trying to find the accuracy problem with RISCV-FS and I observe
that the problem is created (at least in my dummy example) because
the variables (double) are set to zero in random simulated time (for
this reason I get different results among executions of the same
code). Specifically for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

 int dim = 10;

 float result;

 for (int iter = 0; iter < 2; iter++){
     result = 0;
     for (int i = 0; i < dim; i++){
         for (int j = 0; j < dim; j++){
             float sq_i = sqrt(i);
             float sq_j = sqrt(j);
             result += sq_i * sq_j;
             printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                    iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
         }
     }
     printf("Final Result: %lf\n", result);
 }

}

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: 2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: 2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: 2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: 2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: 2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: 3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As we can see, in the following iterations the sqrt(1) as well as the
result is set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000

Please help me to resolve the accuracy issue! I think that it will
be very useful for gem5 community.

Note that I found the correct simulated tick at which the
application started in FS (using m5 dumpstats) and passed it to
--debug-start, but the generated trace file is 10x larger
than in SE mode for the same application. How can I compare them?

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start but in FS mode it is very
difficult to find the tick on which the application is started!

However, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 4096;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); //Result: 30530733453.127449
    }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

While in X86 (without cross-compilation of course), QEMU-RISCV,
GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
result is different! In addition, the result is also different
between the 2 iterations.

Please reproduce the error if you want in order to verify my result.

How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5.opt instead of the
gem5.fast which I had), but the debug traces exceed 20GB (and the run has
not finished yet) for less than 1 simulated second. How can I reduce the
size of the debug output (or set something more specific)?

In contrast, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then perform a
diff to see how they differ.
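
Because tick counts differ between two runs, a plain diff of the trace files tends to flag every line. The sketch below shows one way to find the first real divergence instead. It assumes each trace line starts with a tick count followed by a colon and ignores that prefix; strip_tick, first_divergence and first_divergence_text are hypothetical helpers, not gem5 tools. An FS trace also contains kernel instructions, so it would probably need filtering down to the application before a comparison like this is meaningful.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Drop everything up to and including the first ':' (assumed tick prefix).
std::string strip_tick(const std::string& line) {
    std::size_t colon = line.find(':');
    return colon == std::string::npos ? line : line.substr(colon + 1);
}

// Return the 1-based number of the first line at which the two traces
// differ once tick prefixes are ignored, or 0 if they never differ.
std::size_t first_divergence(std::istream& a, std::istream& b) {
    std::string la, lb;
    std::size_t n = 0;
    while (true) {
        bool got_a = static_cast<bool>(std::getline(a, la));
        bool got_b = static_cast<bool>(std::getline(b, lb));
        if (!got_a && !got_b) return 0;  // both ended without a difference
        ++n;
        if (got_a != got_b) return n;    // one trace is longer than the other
        if (strip_tick(la) != strip_tick(lb)) return n;
    }
}

// Convenience wrapper for in-memory traces, handy for quick experiments.
std::size_t first_divergence_text(const std::string& a, const std::string& b) {
    std::istringstream sa(a), sb(b);
    return first_divergence(sa, sb);
}
```

For real trace files, two std::ifstream objects can be passed to first_divergence directly.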

Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (in both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at this moment, however, if you want you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image
(https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
I installed the gcc compiler, compile it (through qemu) and I get
wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
or the HPCG workload, not in gem5. If not, I would test in QEMU to make
sure the binary works there. Another way you could test to see if the
problem is your binary or gem5 would be to run it on real hardware. We have
access to some RISC-V hardware here at UC Davis, if you don't have access
to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

1. I use the original riscv-fs.py which is provided in the latest gem5
release.
I ran gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), put the xhpcg executable in it and make the following
changes in riscv-fs.py to boot the riscv-disk-img with the executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
I change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions on
the image.

2. After some iterations, the HPCG benchmark checks whether the results
are valid or not. In the case of FS it reports invalid results. As
I see from the results, one (at least) problem is that it produces
different results in each HPCG execution (with the same configuration).
Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me in order to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

1. In what way have you modified
   ./configs/example/gem5_library/riscv-fs.py? Can you attach the script
   here?

2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put it).

Can you help me please?

In addition, I used the RISCV Ubuntu image
(https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
I installed the gcc compiler, compile it (through qemu) and I get
wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org

Jason and I had a theory that this may be due to the "Rounding Mode" for floating pointing being set incorrectly in FS mode. That's set via a macro here: https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36 I manually expanded the macro here: https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495, inside the "fsqrt_d" definition then compiled "build/ALL/gem5.debug". Then used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in the generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file). ``` gdb build/ALL/gem5.opt break Fsqrt_d::execute run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py` ``` Stepping through with gdb I the rounding mode is `0` for SE mode and `0` for FS mode as well. So, no luck with that theory. My new theory is that this bug has something to do with thread context switching being implemented incorrectly in RISC-V somehow. I find it strange that the sqrt(1) works fine for a while (i.e. returns `1`) then suddenly starts returning zero after a certain point in the execution. In addition, it's odd that the loop is not returning the same value each time despite executing the same code. It'd make sense to me that the thread is being stored and then resumed with some corruption of the floating point data. This would also explain why this bug only occurs in FS mode. I'll try to find time to figure out a good test for this. If anyone has any other theories or ideas then let me know. -- Dr. Bobby R. Bruce Room 3050, Kemper Hall, UC Davis Davis, CA, 95616 web: https://www.bobbybruce.net On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής < ntampouratzis@ece.auth.gr> wrote: > > Dear Jason & Boddy, > > Unfortunately, I have tried my simple example without the sqrt > function and the problem remains. 
> Specifically, I have the following
> simple code:
>
>
> #include <cmath>
> #include <stdio.h>
>
> int main(){
>
>     int dim = 1024;
>
>     double result;
>
>     for (int iter = 0; iter < 2; iter++){
>         result = 0;
>         for (int i = 0; i < dim; i++){
>             for (int j = 0; j < dim; j++){
>                 result += i * j;
>             }
>         }
>         printf("Final Result: %lf\n", result);
>     }
> }
>
>
> In the above code, the correct result is 274341298176.000000 (from
> RISCV-SE mode and x86), while in FS mode I sometimes get the correct
> result and other times a different number.
>
> Best regards,
> Nikos
>
>
> Quoting Jason Lowe-Power <jason@lowepower.com>:
>
> > I have an idea...
> >
> > Have you put a breakpoint in the implementation of the fsqrt_d function? I
> > would like to know whether we are using the same rounding mode when running
> > in SE mode and when running in FS mode. My hypothesis is that in FS mode the
> > rounding mode is set differently.
> >
> > Cheers,
> > Jason
> >
> > On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
> > ntampouratzis@ece.auth.gr> wrote:
> >
> >> Dear Bobby,
> >>
> >> Thanks a lot for the effort! I looked in detail and I observe that the
> >> problem occurs only with float and double variables (in the case
> >> of int it is working properly in FS mode). Specifically, in the case
> >> of float the variables are set to "nan", while in the case of double
> >> the variables are set to 0.000000 (at a random time - probably from some
> >> instruction of the simulated OS?). You may use a simple C/C++ example in
> >> order to get some traces before going to HPCG...
> >>
> >> Thank you in advance!!
> >> Best regards,
> >> Nikos
> >>
> >>
> >> Quoting Bobby Bruce <bbruce@ucdavis.edu>:
> >>
> >> > Hey Niko,
> >> >
> >> > Thanks for this analysis. I jumped a little into this today but didn't get
> >> > as far as you did. I wanted to find a quick way to recreate the following:
> >> > https://gem5-review.googlesource.com/c/public/gem5/+/64211.
> >> > Please feel free to use this, if it helps any.
> >> >
> >> > It's very strange to me that this bug hasn't manifested itself before but
> >> > it's undeniably there. I'll try to spend more time looking at this tomorrow
> >> > with some traces and debug flags and see if I can narrow down the problem.
> >> >
> >> > --
> >> > Dr. Bobby R. Bruce
> >> > Room 3050,
> >> > Kemper Hall, UC Davis
> >> > Davis,
> >> > CA, 95616
> >> >
> >> > web: https://www.bobbybruce.net
> >> >
> >> >
> >> > On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
> >> > ntampouratzis@ece.auth.gr> wrote:
> >> >
> >> >> In my previous results, I had used double (not float) for the
> >> >> following variables: result, sq_i and sq_j. In the case of float
> >> >> instead of double I get "nan" and not 0.000000.
> >> >>
> >> >> Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>:
> >> >>
> >> >> > Dear Jason, all,
> >> >> >
> >> >> > I am trying to find the accuracy problem with RISCV-FS and I observe
> >> >> > that the problem is created (at least in my dummy example) because
> >> >> > the variables (double) are set to zero in random simulated time (for
> >> >> > this reason I get different results among executions of the same
> >> >> > code).
Specifically for the following dummy code: > >> >> > > >> >> > > >> >> > #include <cmath> > >> >> > #include <stdio.h> > >> >> > > >> >> > int main(){ > >> >> > > >> >> > int dim = 10; > >> >> > > >> >> > float result; > >> >> > > >> >> > for (int iter = 0; iter < 2; iter++){ > >> >> > result = 0; > >> >> > for (int i = 0; i < dim; i++){ > >> >> > for (int j = 0; j < dim; j++){ > >> >> > float sq_i = sqrt(i); > >> >> > float sq_j = sqrt(j); > >> >> > result += sq_i * sq_j; > >> >> > printf("ITER: %d | i: %d | j: %d Result(i: %f | j: > >> >> > %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, result); > >> >> > } > >> >> > } > >> >> > printf("Final Result: %lf\n", result); > >> >> > } > >> >> > } > >> >> > > >> >> > > >> >> > The correct Final Result in both iterations is 372.721656. However, > >> >> > I get the following results in FS: > >> >> > > >> >> > ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > 
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > >> >> > 1.000000): 1.000000 > >> >> > ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > >> >> > 1.414214): 2.414214 > >> >> > ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > >> >> > 1.732051): 4.146264 > >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > >> >> > 1.414214): 1.414214 > >> >> > ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > >> >> > 2.000000): 3.414214 > >> >> > ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > >> >> > 2.449490): 5.863703 > >> >> > ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > >> >> > 2.828427): 8.692130 > >> >> > ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > >> >> > 3.162278): 11.854408 > >> >> > ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > >> >> > 3.464102): 15.318510 > >> >> > ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > >> >> > 3.741657): 19.060167 > >> >> > ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > >> >> > 4.000000): 23.060167 > >> >> > ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > >> >> > 4.242641): 27.302808 > >> >> > ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | 
i*j: > >> >> > 0.000000): 27.302808 > >> >> > ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > >> >> > 1.732051): 29.034859 > >> >> > ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > >> >> > 2.449490): 31.484348 > >> >> > ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > >> >> > 3.000000): 34.484348 > >> >> > ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > >> >> > 3.464102): 37.948450 > >> >> > ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > >> >> > 3.872983): 41.821433 > >> >> > ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > >> >> > 4.242641): 46.064074 > >> >> > ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > >> >> > 4.582576): 50.646650 > >> >> > ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > >> >> > 4.898979): 55.545629 > >> >> > ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > >> >> > 5.196152): 60.741782 > >> >> > ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 60.741782 > >> >> > ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > >> >> > 2.000000): 62.741782 > >> >> > ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > >> >> > 2.828427): 65.570209 > >> >> > ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > >> >> > 3.464102): 69.034310 > >> >> > ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > >> >> > 4.000000): 73.034310 > >> >> > ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: > >> >> > 4.472136): 77.506446 > >> >> > ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > >> >> > 4.898979): 82.405426 > >> >> > ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > >> >> > 5.291503): 87.696928 > >> >> > ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > >> >> > 5.656854): 93.353783 > >> >> > ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > >> >> > 6.000000): 99.353783 > >> 
>> > ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > >> >> > 0.000000): 99.353783 > >> >> > ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > >> >> > 2.236068): 101.589851 > >> >> > ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > >> >> > 3.162278): 104.752128 > >> >> > ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > >> >> > 3.872983): 108.625112 > >> >> > ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: > >> >> > 4.472136): 113.097248 > >> >> > ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > >> >> > 5.000000): 118.097248 > >> >> > ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > >> >> > 5.477226): 123.574473 > >> >> > ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > >> >> > 5.916080): 129.490553 > >> >> > ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > >> >> > 6.324555): 135.815108 > >> >> > ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > >> >> > 6.708204): 142.523312 > >> >> > ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > >> >> > 0.000000): 142.523312 > >> >> > ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > >> >> > 2.449490): 144.972802 > >> >> > ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > >> >> > 3.464102): 148.436904 > >> >> > ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > >> >> > 4.242641): 152.679544 > >> >> > ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > >> >> > 4.898979): 157.578524 > >> >> > ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: > >> >> > 5.477226): 163.055749 > >> >> > ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > >> >> > 6.000000): 169.055749 > >> >> > ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > >> >> > 6.480741): 175.536490 > >> >> > ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > >> >> > 6.928203): 182.464693 > >> >> > ITER: 0 | i: 6 | j: 
9 Result(i: 2.449490 | j: 3.000000 | i*j: > >> >> > 7.348469): 189.813162 > >> >> > ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > >> >> > 0.000000): 189.813162 > >> >> > ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > >> >> > 2.645751): 192.458914 > >> >> > ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: > >> >> > 3.741657): 196.200571 > >> >> > ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > >> >> > 4.582576): 200.783147 > >> >> > ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > >> >> > 5.291503): 206.074649 > >> >> > ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > >> >> > 5.916080): 211.990729 > >> >> > ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > >> >> > 6.480741): 218.471470 > >> >> > ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > >> >> > 7.000000): 225.471470 > >> >> > ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > >> >> > 7.483315): 232.954785 > >> >> > ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > >> >> > 7.937254): 240.892039 > >> >> > ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > >> >> > 0.000000): 240.892039 > >> >> > ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > >> >> > 2.828427): 243.720466 > >> >> > ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > >> >> > 4.000000): 247.720466 > >> >> > ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > >> >> > 4.898979): 252.619445 > >> >> > ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > >> >> > 5.656854): 258.276300 > >> >> > ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > >> >> > 6.324555): 264.600855 > >> >> > ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > >> >> > 6.928203): 271.529058 > >> >> > ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > >> >> > 7.483315): 279.012373 > >> >> > ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | 
j: 2.828427 | i*j: > >> >> > 8.000000): 287.012373 > >> >> > ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > >> >> > 8.485281): 295.497654 > >> >> > ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 295.497654 > >> >> > ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > >> >> > 3.000000): 298.497654 > >> >> > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > >> >> > 4.242641): 302.740295 > >> >> > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > >> >> > 5.196152): 307.936447 > >> >> > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > >> >> > 6.000000): 313.936447 > >> >> > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > >> >> > 6.708204): 320.644651 > >> >> > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > >> >> > 7.348469): 327.993120 > >> >> > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > >> >> > 7.937254): 335.930374 > >> >> > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > >> >> > 8.485281): 344.415656 > >> >> > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > >> >> > 9.000000): 353.415656 > >> >> > Final Result: 353.415656 > >> >> > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 
2.645751 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 0.000000 > >> >> > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > >> >> > 1.000000): 1.000000 > >> >> > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > >> >> > 1.414214): 2.414214 > >> >> > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > >> >> > 1.732051): 4.146264 > >> >> > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: > >> >> > 2.000000): 6.146264 > >> >> > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: > >> >> > 2.236068): 8.382332 > >> >> > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: > >> >> > 2.449490): 10.831822 > >> >> > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: > >> >> > 2.645751): 13.477573 > >> >> > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: > >> >> > 2.828427): 16.306001 > >> >> > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: > >> >> > 3.000000): 19.306001 > >> >> > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > >> >> > 0.000000): 19.306001 > >> >> > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > >> >> > 1.414214): 20.720214 > >> >> > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > >> >> > 2.000000): 22.720214 > >> >> > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > >> >> > 2.449490): 25.169704 > >> >> > ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > >> >> > 2.828427): 27.998131 > >> >> > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > >> >> > 3.162278): 31.160409 > >> >> > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > >> >> > 3.464102): 34.624510 > >> 
>> > ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > >> >> > 3.741657): 38.366168 > >> >> > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > >> >> > 4.000000): 42.366168 > >> >> > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > >> >> > 4.242641): 46.608808 > >> >> > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: > >> >> > 0.000000): 46.608808 > >> >> > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > >> >> > 1.732051): 48.340859 > >> >> > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > >> >> > 2.449490): 50.790349 > >> >> > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > >> >> > 3.000000): 53.790349 > >> >> > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > >> >> > 3.464102): 57.254450 > >> >> > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > >> >> > 3.872983): 61.127434 > >> >> > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > >> >> > 4.242641): 65.370075 > >> >> > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > >> >> > 4.582576): 69.952650 > >> >> > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > >> >> > 4.898979): 74.851630 > >> >> > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > >> >> > 5.196152): 80.047782 > >> >> > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 80.047782 > >> >> > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > >> >> > 2.000000): 82.047782 > >> >> > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > >> >> > 2.828427): 84.876209 > >> >> > ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > >> >> > 3.464102): 88.340311 > >> >> > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > >> >> > 4.000000): 92.340311 > >> >> > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: > >> >> > 4.472136): 96.812447 > >> >> > ITER: 1 | i: 4 | j: 6 Result(i: 
2.000000 | j: 2.449490 | i*j: > >> >> > 4.898979): 101.711426 > >> >> > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > >> >> > 5.291503): 107.002929 > >> >> > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > >> >> > 5.656854): 112.659783 > >> >> > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > >> >> > 6.000000): 118.659783 > >> >> > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > >> >> > 0.000000): 118.659783 > >> >> > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > >> >> > 2.236068): 120.895851 > >> >> > ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > >> >> > 3.162278): 124.058129 > >> >> > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > >> >> > 3.872983): 127.931112 > >> >> > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: > >> >> > 4.472136): 132.403248 > >> >> > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > >> >> > 5.000000): 137.403248 > >> >> > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > >> >> > 5.477226): 142.880474 > >> >> > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > >> >> > 5.916080): 148.796553 > >> >> > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > >> >> > 6.324555): 155.121109 > >> >> > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > >> >> > 6.708204): 161.829313 > >> >> > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > >> >> > 0.000000): 161.829313 > >> >> > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > >> >> > 2.449490): 164.278802 > >> >> > ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > >> >> > 3.464102): 167.742904 > >> >> > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > >> >> > 4.242641): 171.985545 > >> >> > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > >> >> > 4.898979): 176.884524 > >> >> > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 
| i*j: > >> >> > 5.477226): 182.361750 > >> >> > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > >> >> > 6.000000): 188.361750 > >> >> > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > >> >> > 6.480741): 194.842491 > >> >> > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > >> >> > 6.928203): 201.770694 > >> >> > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > >> >> > 7.348469): 209.119163 > >> >> > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > >> >> > 0.000000): 209.119163 > >> >> > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > >> >> > 2.645751): 211.764914 > >> >> > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: > >> >> > 3.741657): 215.506572 > >> >> > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > >> >> > 4.582576): 220.089147 > >> >> > ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > >> >> > 5.291503): 225.380650 > >> >> > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > >> >> > 5.916080): 231.296730 > >> >> > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > >> >> > 6.480741): 237.777470 > >> >> > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > >> >> > 7.000000): 244.777470 > >> >> > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > >> >> > 7.483315): 252.260785 > >> >> > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > >> >> > 7.937254): 260.198039 > >> >> > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > >> >> > 0.000000): 260.198039 > >> >> > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > >> >> > 2.828427): 263.026466 > >> >> > ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > >> >> > 4.000000): 267.026466 > >> >> > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > >> >> > 4.898979): 271.925446 > >> >> > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > >> >> > 
5.656854): 277.582300 > >> >> > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > >> >> > 6.324555): 283.906855 > >> >> > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > >> >> > 6.928203): 290.835059 > >> >> > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > >> >> > 7.483315): 298.318373 > >> >> > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > >> >> > 8.000000): 306.318373 > >> >> > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > >> >> > 8.485281): 314.803655 > >> >> > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > >> >> > 0.000000): 314.803655 > >> >> > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > >> >> > 3.000000): 317.803655 > >> >> > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > >> >> > 4.242641): 322.046295 > >> >> > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > >> >> > 5.196152): 327.242448 > >> >> > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > >> >> > 6.000000): 333.242448 > >> >> > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > >> >> > 6.708204): 339.950652 > >> >> > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > >> >> > 7.348469): 347.299121 > >> >> > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > >> >> > 7.937254): 355.236375 > >> >> > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > >> >> > 8.485281): 363.721656 > >> >> > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > >> >> > 9.000000): 372.721656 > >> >> > Final Result: 372.721656 > >> >> > > >> >> > > >> >> > > >> >> > As we can see in the following iterations the sqrt(1) as well as the > >> >> > result is set to zero for some reason. 
> >> >> >
> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
> >> >> >
> >> >> > Please help me to resolve the accuracy issue! I think that it will
> >> >> > be very useful for the gem5 community.
> >> >> >
> >> >> > Note that I found the simulated tick at which the application
> >> >> > starts in FS (using m5 dumpstats) and passed it to --debug-start,
> >> >> > but the generated trace file is 10x larger than in SE mode for
> >> >> > the same application. How can I compare them?
> >> >> >
> >> >> > Thank you in advance!
> >> >> > Best regards,
> >> >> > Nikos
> >> >> >
> >> >> > Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>:
> >> >> >
> >> >> >> Dear Jason,
> >> >> >>
> >> >> >> I am trying to use --debug-start but in FS mode it is very
> >> >> >> difficult to find the tick at which the application starts!
> >> >> >>
> >> >> >> However, I wrote the following very simple C++ program:
> >> >> >>
> >> >> >> #include <cmath>
> >> >> >> #include <stdio.h>
> >> >> >>
> >> >> >> int main(){
> >> >> >>
> >> >> >>     int dim = 4096;
> >> >> >>
> >> >> >>     double result;
> >> >> >>
> >> >> >>     for (int iter = 0; iter < 2; iter++){
> >> >> >>         result = 0;
> >> >> >>         for (int i = 0; i < dim; i++){
> >> >> >>             for (int j = 0; j < dim; j++){
> >> >> >>                 result += sqrt(i) * sqrt(j);
> >> >> >>             }
> >> >> >>         }
> >> >> >>         printf("Result: %lf\n", result); //Result: 30530733453.127449
> >> >> >>     }
> >> >> >> }
> >> >> >>
> >> >> >> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
> >> >> >> test_riscv test_riscv.cpp
> >> >> >>
> >> >> >>
> >> >> >> While in X86 (without cross-compilation of course), QEMU-RISCV, and
> >> >> >> GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
> >> >> >> result is different! In addition, the result is also different
> >> >> >> between the 2 iterations.
> >> >> >>
> >> >> >> Please reproduce the error if you want in order to verify my result.
> >> >> >> How can the issue be resolved?
> >> >> >>
> >> >> >> Thank you in advance!
> >> >> >>
> >> >> >> Best regards,
> >> >> >> Nikos
> >> >> >>
> >> >> >>
> >> >> >> Quoting Jason Lowe-Power <jason@lowepower.com>:
> >> >> >>
> >> >> >>> Hi Nikos,
> >> >> >>>
> >> >> >>> You can use --debug-start to start the debugging after some number of
> >> >> >>> ticks. Also, I would expect that the difference should come up quickly, so
> >> >> >>> no need to run the program to the end.
> >> >> >>>
> >> >> >>> For the FS mode one, you will want to just start the trace as the
> >> >> >>> application starts. This could be a bit of a pain.
> >> >> >>>
> >> >> >>> I'm not really sure what fundamentally could be different. FS and SE mode
> >> >> >>> use the exact same code for executing instructions, so I don't think that's
> >> >> >>> the problem.
> >> >> >>> Have you tried running for smaller inputs or just one
> >> >> >>> iteration?
> >> >> >>>
> >> >> >>> Jason
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
> >> >> >>> ntampouratzis@ece.auth.gr> wrote:
> >> >> >>>
> >> >> >>>> Dear Bobby,
> >> >> >>>>
> >> >> >>>> I am trying to add --debug-flags=Exec (building gem5.opt rather than
> >> >> >>>> the gem5.fast I had before), but the debug traces exceed 20GB
> >> >> >>>> (and the run has not finished yet) for less than 1 simulated second. How can
> >> >> >>>> I reduce the size of the trace (or select something more specific)?
> >> >> >>>>
> >> >> >>>> Instead, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
> >> >> >>>> want, you can compare these two output files
> >> >> >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
> >> >> >>>> see, something goes wrong with the accuracy of calculations in FS mode
> >> >> >>>> (the benchmark uses double precision). You can find the files here:
> >> >> >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/
> >> >> >>>>
> >> >> >>>> Best regards,
> >> >> >>>> Nikos
> >> >> >>>>
> >> >> >>>> Quoting Jason Lowe-Power <jason@lowepower.com>:
> >> >> >>>>
> >> >> >>>>> That's quite odd that it works in SE mode but not FS mode!
> >> >> >>>>>
> >> >> >>>>> I would suggest running with --debug-flags=Exec for both and then performing a
> >> >> >>>>> diff to see how they differ.
> >> >> >>>>>
> >> >> >>>>> Cheers,
> >> >> >>>>> Jason
> >> >> >>>>>
> >> >> >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
> >> >> >>>>> ntampouratzis@ece.auth.gr> wrote:
> >> >> >>>>>
> >> >> >>>>>> Dear Bobby,
> >> >> >>>>>>
> >> >> >>>>>> In QEMU I get the same (correct) results that I get in SE mode
> >> >> >>>>>> simulation. I get invalid results in FS simulation (in both
> >> >> >>>>>> riscv-fs.py and riscv-ubuntu-run.py).
> >> >> >>>>>> I cannot access real RISCV
> >> >> >>>>>> hardware at the moment; however, if you want you may execute my xhpcg
> >> >> >>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
> >> >> >>>>>> following configuration:
> >> >> >>>>>>
> >> >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1
> >> >> >>>>>>
> >> >> >>>>>> Please let me know if you have any updates!
> >> >> >>>>>>
> >> >> >>>>>> Best regards,
> >> >> >>>>>> Nikos
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>> Quoting Jason Lowe-Power <jason@lowepower.com>:
> >> >> >>>>>>
> >> >> >>>>>>> Hi Nikos,
> >> >> >>>>>>>
> >> >> >>>>>>> I notice you said the following in your original email:
> >> >> >>>>>>>
> >> >> >>>>>>>> In addition, I used the RISCV Ubuntu image
> >> >> >>>>>>>> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
> >> >> >>>>>>>> I installed the gcc compiler, compiled it (through qemu) and I get
> >> >> >>>>>>>> wrong results too.
> >> >> >>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>> Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
> >> >> >>>>>>> or the HPCG workload, not in gem5. If not, I would test in QEMU to make
> >> >> >>>>>>> sure the binary works there. Another way you could test to see if the
> >> >> >>>>>>> problem is your binary or gem5 would be to run it on real hardware. We have
> >> >> >>>>>>> access to some RISC-V hardware here at UC Davis, if you don't have access
> >> >> >>>>>>> to it.
> >> >> >>>>>>>
> >> >> >>>>>>> Cheers,
> >> >> >>>>>>> Jason
> >> >> >>>>>>>
> >> >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
> >> >> >>>>>>> ntampouratzis@ece.auth.gr> wrote:
> >> >> >>>>>>>
> >> >> >>>>>>>> Dear Bobby,
> >> >> >>>>>>>>
> >> >> >>>>>>>> 1) I use the original riscv-fs.py which is provided in the latest gem5
> >> >> >>>>>>>> release.
> >> >> >>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d > >> ./HPCG_FS_results > >> >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to > >> download > >> >> the > >> >> >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img. > >> >> >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop > >> >> >>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the > >> >> following > >> >> >>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with > >> executable: > >> >> >>>>>>>> > >> >> >>>>>>>> image = CustomDiskImageResource( > >> >> >>>>>>>> local_path = "/home/cossim/.cache/gem5/riscv-disk-img", > >> >> >>>>>>>> ) > >> >> >>>>>>>> > >> >> >>>>>>>> # Set the Full System workload. > >> >> >>>>>>>> board.set_kernel_disk_workload( > >> >> >>>>>>>> > >> >> kernel=Resource("riscv-bootloader-vmlinux-5.10"), > >> >> >>>>>>>> disk_image=image, > >> >> >>>>>>>> ) > >> >> >>>>>>>> > >> >> >>>>>>>> Finally, in the > >> >> gem5/src/python/gem5/components/boards/riscv_board.py > >> >> >>>>>>>> I change the last line to "return ["console=ttyS0", > >> >> >>>>>>>> "root={root_value}", "rw"]" in order to allow the write > >> >> permissions > >> >> >>>> in > >> >> >>>>>>>> the image. > >> >> >>>>>>>> > >> >> >>>>>>>> > >> >> >>>>>>>> 2) The HPCG benchmark after some iterations calculates if the > >> >> results > >> >> >>>>>>>> are valid or not valid. In the case of FS it gives invalid > >> >> results. > >> >> >>>> As > >> >> >>>>>>>> I see from the results, one (at least) problem is that produces > >> >> >>>>>>>> different results in each HPCG execution (with the same > >> >> >>>> configuration). > >> >> >>>>>>>> > >> >> >>>>>>>> Here is the HPCG output and riscv-fs.py > >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). 
You may > >> reproduce > >> >> the > >> >> >>>>>>>> results in the video if you use the xhpcg executable > >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) > >> >> >>>>>>>> > >> >> >>>>>>>> Please help me in order to solve it! > >> >> >>>>>>>> > >> >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS mode > >> >> too. > >> >> >>>>>>>> > >> >> >>>>>>>> Best regards, > >> >> >>>>>>>> Nikos > >> >> >>>>>>>> > >> >> >>>>>>>> > >> >> >>>>>>>> Quoting Bobby Bruce <bbruce@ucdavis.edu>: > >> >> >>>>>>>> > >> >> >>>>>>>> > I'm going to need a bit more information to help: > >> >> >>>>>>>> > > >> >> >>>>>>>> > 1. In what way have you modified > >> >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you attach > >> the > >> >> >>>> script > >> >> >>>>>>>> here? > >> >> >>>>>>>> > 2. What error are you getting or in what way are the results > >> >> >>>> invalid? > >> >> >>>>>>>> > > >> >> >>>>>>>> > - > >> >> >>>>>>>> > Dr. Bobby R. Bruce > >> >> >>>>>>>> > Room 3050, > >> >> >>>>>>>> > Kemper Hall, UC Davis > >> >> >>>>>>>> > Davis, > >> >> >>>>>>>> > CA, 95616 > >> >> >>>>>>>> > > >> >> >>>>>>>> > web: https://www.bobbybruce.net > >> >> >>>>>>>> > > >> >> >>>>>>>> > > >> >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής < > >> >> >>>>>>>> > ntampouratzis@ece.auth.gr> wrote: > >> >> >>>>>>>> > > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> Dear gem5 community, > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> I have successfully cross-compile the HPCG benchmark for > >> RISCV > >> >> >>>>>> (Serial > >> >> >>>>>>>> >> version, without MPI and OpenMP). 
While it working properly > >> in > >> >> >>>> gem5 > >> >> >>>>>> SE > >> >> >>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results > >> >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 > >> >> >>>> --nz=16 > >> >> >>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results > >> in FS > >> >> >>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d > >> ./HPCG_FS_results > >> >> >>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the > >> riscv > >> >> >>>> image > >> >> >>>>>>>> >> and put it). > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> Can you help me please? > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image > >> >> >>>>>>>> >> ( > >> >> >>>> > >> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu > >> >> >>>>>> ), > >> >> >>>>>>>> >> I installed the gcc compiler, compile it (through qemu) and > >> I > >> >> get > >> >> >>>>>>>> >> wrong results too. > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable for > >> RISCV > >> >> >>>>>>>> >> (xhpcg), and a video that shows the results > >> >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/). > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> P.S. I use the latest gem5 version. > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> Thank you in advance! 
:) > >> >> >>>>>>>> >> > >> >> >>>>>>>> >> Best regards, > >> >> >>>>>>>> >> Nikos > >> >> >>>>>>>> >> _______________________________________________ > >> >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org > >> >> >>>>>>>> >> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >> >>>>>>>> >> > >> >> >>>>>>>> > >> >> >>>>>>>> > >> >> >>>>>>>> _______________________________________________ > >> >> >>>>>>>> gem5-users mailing list -- gem5-users@gem5.org > >> >> >>>>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >> >>>>>>>> > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> _______________________________________________ > >> >> >>>>>> gem5-users mailing list -- gem5-users@gem5.org > >> >> >>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >> >>>>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> _______________________________________________ > >> >> >>>> gem5-users mailing list -- gem5-users@gem5.org > >> >> >>>> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >> >>>> > >> >> >> > >> >> >> > >> >> >> _______________________________________________ > >> >> >> gem5-users mailing list -- gem5-users@gem5.org > >> >> >> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > gem5-users mailing list -- gem5-users@gem5.org > >> >> > To unsubscribe send an email to gem5-users-leave@gem5.org > >> >> > >> >> > >> >> _______________________________________________ > >> >> gem5-users mailing list -- gem5-users@gem5.org > >> >> To unsubscribe send an email to gem5-users-leave@gem5.org > >> >> > >> > >> > >> _______________________________________________ > >> gem5-users mailing list -- gem5-users@gem5.org > >> To unsubscribe send an email to gem5-users-leave@gem5.org > >> > > > _______________________________________________ > gem5-users mailing list -- gem5-users@gem5.org > To unsubscribe send an email to 
gem5-users-leave@gem5.org
Νικόλαος Ταμπουρατζής
Mon, Oct 31, 2022 8:40 AM

Dear Bobby, Jason, all,

Is there any update about the accuracy of RISC-V FS?

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Jason and I had a theory that this may be due to the "Rounding Mode" for
floating point being set incorrectly in FS mode. That's set via a macro
here:
https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36

I manually expanded the macro here:
https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495,
inside the "fsqrt_d" definition then compiled "build/ALL/gem5.debug". Then
used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in the
generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).

gdb build/ALL/gem5.opt
break Fsqrt_d::execute
run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`

Stepping through with gdb, I see the rounding mode is 0 for SE mode and 0
for FS mode as well. So, no luck with that theory.

My new theory is that this bug has something to do with thread context
switching being implemented incorrectly in RISC-V somehow. I find it
strange that the sqrt(1) works fine for a while (i.e. returns 1) then
suddenly starts returning zero after a certain point in the execution. In
addition, it's odd that the loop is not returning the same value each time
despite executing the same code. It'd make sense to me that the thread is
being stored and then resumed with some corruption of the floating point
data. This would also explain why this bug only occurs in FS mode.

I'll try to find time to figure out a good test for this. If anyone has any
other theories or ideas then let me know.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Jason & Bobby,

Unfortunately, I have tried my simple example without the sqrt
function and the problem remains. Specifically, I have the following
simple code:

#include <cmath>
#include <stdio.h>

int main(){

  int dim = 1024;

  double result;

  for (int iter = 0; iter < 2; iter++){
      result = 0;
      for (int i = 0; i < dim; i++){
          for (int j = 0; j < dim; j++){
              result += i * j;
          }
      }
      printf("Final Result: %lf\n", result);
  }

}

In the above code, the correct result is 274341298176.000000 (from
RISCV-SE mode and x86), while in FS mode I sometimes get the correct
result and other times a different number.

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know if when running in SE mode and running in FS mode we are
using the same rounding mode. My hypothesis is that in FS mode the rounding
mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the
problem occurs only with float and double variables (with int it works
properly in FS mode). Specifically, in the case of float the variables
are set to "nan", while in the case of double the variables are set to
0.000000 (at random times, probably by some instruction of the simulated
OS?). You may use a simple C/C++ example in order to get some traces
before going to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
feel free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason, all,

I am trying to find the accuracy problem with RISCV-FS and I observe
that the problem arises (at least in my dummy example) because
the variables (double) are set to zero at random simulated times (for
this reason I get different results among executions of the same
code). Specifically for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

 int dim = 10;

 float result;

 for (int iter = 0; iter < 2; iter++){
     result = 0;
     for (int i = 0; i < dim; i++){
         for (int j = 0; j < dim; j++){
             float sq_i = sqrt(i);
             float sq_j = sqrt(j);
             result += sq_i * sq_j;
             printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                    iter, i, j, sq_i, sq_j, sq_i * sq_j, result);
         }
     }
     printf("Final Result: %lf\n", result);
 }

}

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: 0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: 1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: 1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: 2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: 2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: 2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: 2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: 2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: 3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: 0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: 1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: 2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: 2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: 2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: 3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: 3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: 3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: 4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: 4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: 0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: 1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: 2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: 3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: 3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: 3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: 4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: 4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: 4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: 5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: 0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: 2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: 2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: 3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: 4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: 4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: 4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: 5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: 5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: 6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: 0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: 2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: 3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: 3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: 4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: 5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: 5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: 5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: 6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: 6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: 0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: 2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: 3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: 4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: 4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: 5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: 6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: 6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: 6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: 7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: 0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: 2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: 4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: 5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: 5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: 6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: 7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: 7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: 7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: 0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: 2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: 4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: 4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: 5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: 6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: 6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: 7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: 8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: 8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: 0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: 3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: 4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: 5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: 6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: 6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: 7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: 7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: 8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: 9.000000): 372.721656
Final Result: 372.721656

As we can see in the following iterations, sqrt(1) as well as the
result is set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000

Please help me to resolve the accuracy issue! I think that it will
be very useful for gem5 community.

Note that I found the simulated tick at which the application starts
in FS (using m5 dumpstats) and passed it to --debug-start, but the
trace file which is generated is 10x larger than in SE mode for the
same application. How can I compare them?

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start but in FS mode it is very
difficult to find the tick at which the application starts!

However, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

  int dim = 4096;

  double result;

  for (int iter = 0; iter < 2; iter++){
      result = 0;
      for (int i = 0; i < dim; i++){
          for (int j = 0; j < dim; j++){
              result += sqrt(i) * sqrt(j);
          }
      }
      printf("Result: %lf\n", result); //Result: 30530733453.127449
  }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

While in X86 (without cross-compilation of course), QEMU-RISCV, and
GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
result is different! In addition, the result is also different
between the 2 iterations.

Please reproduce the error if you want in order to verify my result.

How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5 as gem5.opt
rather than the gem5.fast which I had) but the debug traces exceed 20GB
(and it is not finished yet) for less than 1 simulated second. How can
I reduce the size of the debug trace (or set something more specific)?

Instead, I built the HPCG benchmark with the -DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then
performing a diff to see how they differ.

Cheers,
Jason
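
A plain diff of two Exec traces tends to flag every line, because the tick counts differ between runs. The following is a minimal sketch of a comparison that ignores the leading field. It assumes each trace line starts with a tick count followed by `: ` (an assumption about the trace layout that should be checked against the actual files) and that both traces have already been trimmed to the application's own instructions. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads a text file into a vector of lines.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line);) lines.push_back(line);
    return lines;
}

// Drops everything up to and including the first ": ", which is assumed
// to be the tick prefix. Lines without ": " are returned unchanged.
std::string strip_tick(const std::string& line) {
    const std::size_t pos = line.find(": ");
    return pos == std::string::npos ? line : line.substr(pos + 2);
}

// Returns the index of the first line at which the normalized traces differ.
// If one trace is a strict prefix of the other, returns the shorter length.
// Returns -1 when the traces are identical after normalization.
long first_divergence(const std::vector<std::string>& a,
                      const std::vector<std::string>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t k = 0; k < n; k++) {
        if (strip_tick(a[k]) != strip_tick(b[k])) return static_cast<long>(k);
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

With two trace files on disk, `first_divergence(read_lines("se.trace"), read_lines("fs.trace"))` would give the line to inspect first; the file names here are placeholders. If the CPU object names differ between the SE and FS configurations, that field would need to be stripped as well.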

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (in both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at this moment, however, if you want you may execute my xhpcg
binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image ( ), I installed the gcc
compiler, compile it (through qemu) and I get wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
or the HPCG workload, not in gem5. If not, I would test in QEMU to make
sure the binary works there. Another way you could test to see if the
problem is your binary or gem5 would be to run it on real hardware. We have
access to some RISC-V hardware here at UC Davis, if you don't have access
to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest gem5
release.
I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), put the xhpcg executable in it, and make the
following changes in riscv-fs.py to boot the riscv-disk-img with the
executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
I change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions in
the image.

  2. After some iterations, the HPCG benchmark checks whether the results
are valid or not. In the case of FS it reports invalid results. As
I see from the results, one (at least) problem is that it produces
different results in each HPCG execution (with the same configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me in order to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
    ./configs/example/gem5_library/riscv-fs.py? Can you attach the script
    here?

  2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (Serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image ( ), I installed the gcc
compiler, compiled it (through qemu) and I get wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org

Dear Bobby, Jason, all,

Is there any update about the accuracy of RISC-V FS?

Best regards,
Nikos

Quoting Bobby Bruce <bbruce@ucdavis.edu>:

> Jason and I had a theory that this may be due to the "Rounding Mode" for
> floating point being set incorrectly in FS mode. That's set via a macro
> here:
> https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36
>
> I manually expanded the macro here:
> https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495,
> inside the "fsqrt_d" definition then compiled "build/ALL/gem5.debug". Then
> used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in the
> generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).
>
> ```
> gdb build/ALL/gem5.opt
> break Fsqrt_d::execute
> run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`
> ```
>
> Stepping through with gdb I found the rounding mode is `0` for SE mode and
> `0` for FS mode as well. So, no luck with that theory.
>
> My new theory is that this bug has something to do with thread context
> switching being implemented incorrectly in RISC-V somehow. I find it
> strange that the sqrt(1) works fine for a while (i.e. returns `1`) then
> suddenly starts returning zero after a certain point in the execution. In
> addition, it's odd that the loop is not returning the same value each time
> despite executing the same code. It'd make sense to me that the thread is
> being stored and then resumed with some corruption of the floating point
> data. This would also explain why this bug only occurs in FS mode.
>
> I'll try to find time to figure out a good test for this. If anyone has any
> other theories or ideas then let me know.
>
> --
> Dr. Bobby R. Bruce
> Room 3050,
> Kemper Hall, UC Davis
> Davis,
> CA, 95616
>
> web: https://www.bobbybruce.net
>
>
> On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής <
> ntampouratzis@ece.auth.gr> wrote:
>>
>> Dear Jason & Bobby,
>>
>> Unfortunately, I have tried my simple example without the sqrt
>> function and the problem remains. Specifically, I have the following
>> simple code:
>>
>>
>> #include <cmath>
>> #include <stdio.h>
>>
>> int main(){
>>
>>     int dim = 1024;
>>
>>     double result;
>>
>>     for (int iter = 0; iter < 2; iter++){
>>         result = 0;
>>         for (int i = 0; i < dim; i++){
>>             for (int j = 0; j < dim; j++){
>>                 result += i * j;
>>             }
>>         }
>>         printf("Final Result: %lf\n", result);
>>     }
>> }
>>
>>
>> In the above code, the correct result is 274341298176.000000 (from
>> RISCV-SE mode and x86), while in FS mode I get sometimes the correct
>> result and other times a different number.
>>
>> Best regards,
>> Nikos
>>
>>
>> Quoting Jason Lowe-Power <jason@lowepower.com>:
>>
>> > I have an idea...
>> >
>> > Have you put a breakpoint in the implementation of the fsqrt_d
>> > function? I would like to know if when running in SE mode and running
>> > in FS mode we are using the same rounding mode. My hypothesis is that
>> > in FS mode the rounding mode is set differently.
>> >
>> > Cheers,
>> > Jason
>> >
>> > On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
>> > ntampouratzis@ece.auth.gr> wrote:
>> >
>> >> Dear Bobby,
>> >>
>> >> Thanks a lot for the effort! I looked in detail and I observe that the
>> >> problem is created only using float and double variables (in the case
>> >> of int it is working properly in FS mode). Specifically, in the case
>> >> of float the variables are set to "nan", while in the case of double
>> >> the variables are set to 0.000000 (in random time - probably from some
>> >> instruction of simulated OS?). You may use a simple c/c++ example in
>> >> order to get some traces before going to HPCG...
>> >>
>> >> Thank you in advance!!
>> >>
>> >> Best regards,
>> >> Nikos
>> >>
>> >>
>> >> Quoting Bobby Bruce <bbruce@ucdavis.edu>:
>> >>
>> >> > Hey Niko,
>> >> >
>> >> > Thanks for this analysis. I jumped a little into this today but
>> >> > didn't get as far as you did. I wanted to find a quick way to
>> >> > recreate the following:
>> >> > https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please
>> >> > feel free to use this, if it helps any.
>> >> >
>> >> > It's very strange to me that this bug hasn't manifested itself
>> >> > before but it's undeniably there. I'll try to spend more time
>> >> > looking at this tomorrow with some traces and debug flags and see
>> >> > if I can narrow down the problem.
>> >> >
>> >> > --
>> >> > Dr. Bobby R. Bruce
>> >> > Room 3050,
>> >> > Kemper Hall, UC Davis
>> >> > Davis,
>> >> > CA, 95616
>> >> >
>> >> > web: https://www.bobbybruce.net
>> >> >
>> >> >
>> >> > On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
>> >> > ntampouratzis@ece.auth.gr> wrote:
>> >> >
>> >> >> In my previous results, I had used double (not float) for the
>> >> >> following variables: result, sq_i and sq_j. In the case of float
>> >> >> instead of double I get "nan" and not 0.000000.
>> >> >>
>> >> >> Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>:
>> >> >>
>> >> >> > Dear Jason, all,
>> >> >> >
>> >> >> > I am trying to find the accuracy problem with RISCV-FS and I
>> >> >> > observe that the problem is created (at least in my dummy
>> >> >> > example) because the variables (double) are set to zero in
>> >> >> > random simulated time (for this reason I get different results
>> >> >> > among executions of the same code).
Specifically for the following dummy code: >> >> >> > >> >> >> > >> >> >> > #include <cmath> >> >> >> > #include <stdio.h> >> >> >> > >> >> >> > int main(){ >> >> >> > >> >> >> > int dim = 10; >> >> >> > >> >> >> > float result; >> >> >> > >> >> >> > for (int iter = 0; iter < 2; iter++){ >> >> >> > result = 0; >> >> >> > for (int i = 0; i < dim; i++){ >> >> >> > for (int j = 0; j < dim; j++){ >> >> >> > float sq_i = sqrt(i); >> >> >> > float sq_j = sqrt(j); >> >> >> > result += sq_i * sq_j; >> >> >> > printf("ITER: %d | i: %d | j: %d Result(i: %f | j: >> >> >> > %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, > result); >> >> >> > } >> >> >> > } >> >> >> > printf("Final Result: %lf\n", result); >> >> >> > } >> >> >> > } >> >> >> > >> >> >> > >> >> >> > The correct Final Result in both iterations is 372.721656. > However, >> >> >> > I get the following results in FS: >> >> >> > >> >> >> > ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 
0.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: >> >> >> > 1.000000): 1.000000 >> >> >> > ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: >> >> >> > 1.414214): 2.414214 >> >> >> > ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: >> >> >> > 1.732051): 4.146264 >> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: >> >> >> > 1.414214): 1.414214 >> >> >> > ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: >> >> >> > 2.000000): 3.414214 >> >> >> > ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: >> >> >> > 2.449490): 5.863703 >> >> >> > ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: >> >> >> > 2.828427): 8.692130 >> >> >> > ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: >> >> >> > 3.162278): 11.854408 >> >> >> > ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: >> >> >> > 3.464102): 15.318510 >> >> >> > ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: >> >> >> > 3.741657): 19.060167 >> >> >> > ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: >> >> >> > 4.000000): 23.060167 >> >> >> > ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: >> 
>> >> > 4.242641): 27.302808 >> >> >> > ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: >> >> >> > 0.000000): 27.302808 >> >> >> > ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: >> >> >> > 1.732051): 29.034859 >> >> >> > ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: >> >> >> > 2.449490): 31.484348 >> >> >> > ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: >> >> >> > 3.000000): 34.484348 >> >> >> > ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: >> >> >> > 3.464102): 37.948450 >> >> >> > ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: >> >> >> > 3.872983): 41.821433 >> >> >> > ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: >> >> >> > 4.242641): 46.064074 >> >> >> > ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: >> >> >> > 4.582576): 50.646650 >> >> >> > ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: >> >> >> > 4.898979): 55.545629 >> >> >> > ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: >> >> >> > 5.196152): 60.741782 >> >> >> > ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: >> >> >> > 0.000000): 60.741782 >> >> >> > ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: >> >> >> > 2.000000): 62.741782 >> >> >> > ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: >> >> >> > 2.828427): 65.570209 >> >> >> > ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: >> >> >> > 3.464102): 69.034310 >> >> >> > ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: >> >> >> > 4.000000): 73.034310 >> >> >> > ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: >> >> >> > 4.472136): 77.506446 >> >> >> > ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: >> >> >> > 4.898979): 82.405426 >> >> >> > ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: >> >> >> > 5.291503): 87.696928 >> >> >> > ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: >> >> 
>> > 5.656854): 93.353783 >> >> >> > ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: >> >> >> > 6.000000): 99.353783 >> >> >> > ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: >> >> >> > 0.000000): 99.353783 >> >> >> > ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: >> >> >> > 2.236068): 101.589851 >> >> >> > ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: >> >> >> > 3.162278): 104.752128 >> >> >> > ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: >> >> >> > 3.872983): 108.625112 >> >> >> > ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: >> >> >> > 4.472136): 113.097248 >> >> >> > ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: >> >> >> > 5.000000): 118.097248 >> >> >> > ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: >> >> >> > 5.477226): 123.574473 >> >> >> > ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: >> >> >> > 5.916080): 129.490553 >> >> >> > ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: >> >> >> > 6.324555): 135.815108 >> >> >> > ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: >> >> >> > 6.708204): 142.523312 >> >> >> > ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: >> >> >> > 0.000000): 142.523312 >> >> >> > ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: >> >> >> > 2.449490): 144.972802 >> >> >> > ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: >> >> >> > 3.464102): 148.436904 >> >> >> > ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: >> >> >> > 4.242641): 152.679544 >> >> >> > ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: >> >> >> > 4.898979): 157.578524 >> >> >> > ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: >> >> >> > 5.477226): 163.055749 >> >> >> > ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: >> >> >> > 6.000000): 169.055749 >> >> >> > ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | 
i*j: >> >> >> > 6.480741): 175.536490 >> >> >> > ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: >> >> >> > 6.928203): 182.464693 >> >> >> > ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: >> >> >> > 7.348469): 189.813162 >> >> >> > ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: >> >> >> > 0.000000): 189.813162 >> >> >> > ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: >> >> >> > 2.645751): 192.458914 >> >> >> > ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: >> >> >> > 3.741657): 196.200571 >> >> >> > ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: >> >> >> > 4.582576): 200.783147 >> >> >> > ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: >> >> >> > 5.291503): 206.074649 >> >> >> > ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: >> >> >> > 5.916080): 211.990729 >> >> >> > ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: >> >> >> > 6.480741): 218.471470 >> >> >> > ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: >> >> >> > 7.000000): 225.471470 >> >> >> > ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: >> >> >> > 7.483315): 232.954785 >> >> >> > ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: >> >> >> > 7.937254): 240.892039 >> >> >> > ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: >> >> >> > 0.000000): 240.892039 >> >> >> > ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: >> >> >> > 2.828427): 243.720466 >> >> >> > ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: >> >> >> > 4.000000): 247.720466 >> >> >> > ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: >> >> >> > 4.898979): 252.619445 >> >> >> > ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: >> >> >> > 5.656854): 258.276300 >> >> >> > ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: >> >> >> > 6.324555): 264.600855 >> >> >> > ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | 
j: 2.449490 | i*j: >> >> >> > 6.928203): 271.529058 >> >> >> > ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: >> >> >> > 7.483315): 279.012373 >> >> >> > ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: >> >> >> > 8.000000): 287.012373 >> >> >> > ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: >> >> >> > 8.485281): 295.497654 >> >> >> > ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: >> >> >> > 0.000000): 295.497654 >> >> >> > ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: >> >> >> > 3.000000): 298.497654 >> >> >> > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: >> >> >> > 4.242641): 302.740295 >> >> >> > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: >> >> >> > 5.196152): 307.936447 >> >> >> > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: >> >> >> > 6.000000): 313.936447 >> >> >> > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: >> >> >> > 6.708204): 320.644651 >> >> >> > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: >> >> >> > 7.348469): 327.993120 >> >> >> > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: >> >> >> > 7.937254): 335.930374 >> >> >> > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: >> >> >> > 8.485281): 344.415656 >> >> >> > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: >> >> >> > 9.000000): 353.415656 >> >> >> > Final Result: 353.415656 >> >> >> > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > 
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: >> >> >> > 1.000000): 1.000000 >> >> >> > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: >> >> >> > 1.414214): 2.414214 >> >> >> > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: >> >> >> > 1.732051): 4.146264 >> >> >> > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: >> >> >> > 2.000000): 6.146264 >> >> >> > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: >> >> >> > 2.236068): 8.382332 >> >> >> > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: >> >> >> > 2.449490): 10.831822 >> >> >> > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: >> >> >> > 2.645751): 13.477573 >> >> >> > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: >> >> >> > 2.828427): 16.306001 >> >> >> > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: >> >> >> > 3.000000): 19.306001 >> >> >> > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: >> >> >> > 0.000000): 19.306001 >> >> >> > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: >> >> >> > 1.414214): 20.720214 >> >> >> > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: >> >> >> > 2.000000): 22.720214 >> >> >> > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: >> >> >> > 2.449490): 25.169704 >> >> >> > ITER: 1 | i: 2 
| j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: >> >> >> > 2.828427): 27.998131 >> >> >> > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: >> >> >> > 3.162278): 31.160409 >> >> >> > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: >> >> >> > 3.464102): 34.624510 >> >> >> > ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: >> >> >> > 3.741657): 38.366168 >> >> >> > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: >> >> >> > 4.000000): 42.366168 >> >> >> > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: >> >> >> > 4.242641): 46.608808 >> >> >> > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: >> >> >> > 0.000000): 46.608808 >> >> >> > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: >> >> >> > 1.732051): 48.340859 >> >> >> > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: >> >> >> > 2.449490): 50.790349 >> >> >> > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: >> >> >> > 3.000000): 53.790349 >> >> >> > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: >> >> >> > 3.464102): 57.254450 >> >> >> > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: >> >> >> > 3.872983): 61.127434 >> >> >> > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: >> >> >> > 4.242641): 65.370075 >> >> >> > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: >> >> >> > 4.582576): 69.952650 >> >> >> > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: >> >> >> > 4.898979): 74.851630 >> >> >> > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: >> >> >> > 5.196152): 80.047782 >> >> >> > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: >> >> >> > 0.000000): 80.047782 >> >> >> > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: >> >> >> > 2.000000): 82.047782 >> >> >> > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: >> >> >> > 2.828427): 84.876209 >> >> >> > ITER: 1 | i: 4 | j: 
3 Result(i: 2.000000 | j: 1.732051 | i*j: >> >> >> > 3.464102): 88.340311 >> >> >> > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: >> >> >> > 4.000000): 92.340311 >> >> >> > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: >> >> >> > 4.472136): 96.812447 >> >> >> > ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: >> >> >> > 4.898979): 101.711426 >> >> >> > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: >> >> >> > 5.291503): 107.002929 >> >> >> > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: >> >> >> > 5.656854): 112.659783 >> >> >> > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: >> >> >> > 6.000000): 118.659783 >> >> >> > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: >> >> >> > 0.000000): 118.659783 >> >> >> > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: >> >> >> > 2.236068): 120.895851 >> >> >> > ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: >> >> >> > 3.162278): 124.058129 >> >> >> > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: >> >> >> > 3.872983): 127.931112 >> >> >> > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: >> >> >> > 4.472136): 132.403248 >> >> >> > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: >> >> >> > 5.000000): 137.403248 >> >> >> > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: >> >> >> > 5.477226): 142.880474 >> >> >> > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: >> >> >> > 5.916080): 148.796553 >> >> >> > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: >> >> >> > 6.324555): 155.121109 >> >> >> > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: >> >> >> > 6.708204): 161.829313 >> >> >> > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: >> >> >> > 0.000000): 161.829313 >> >> >> > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: >> >> >> > 2.449490): 164.278802 >> >> >> > ITER: 1 
| i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: >> >> >> > 3.464102): 167.742904 >> >> >> > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: >> >> >> > 4.242641): 171.985545 >> >> >> > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: >> >> >> > 4.898979): 176.884524 >> >> >> > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: >> >> >> > 5.477226): 182.361750 >> >> >> > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: >> >> >> > 6.000000): 188.361750 >> >> >> > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: >> >> >> > 6.480741): 194.842491 >> >> >> > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: >> >> >> > 6.928203): 201.770694 >> >> >> > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: >> >> >> > 7.348469): 209.119163 >> >> >> > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: >> >> >> > 0.000000): 209.119163 >> >> >> > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: >> >> >> > 2.645751): 211.764914 >> >> >> > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: >> >> >> > 3.741657): 215.506572 >> >> >> > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: >> >> >> > 4.582576): 220.089147 >> >> >> > ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: >> >> >> > 5.291503): 225.380650 >> >> >> > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: >> >> >> > 5.916080): 231.296730 >> >> >> > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: >> >> >> > 6.480741): 237.777470 >> >> >> > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: >> >> >> > 7.000000): 244.777470 >> >> >> > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: >> >> >> > 7.483315): 252.260785 >> >> >> > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: >> >> >> > 7.937254): 260.198039 >> >> >> > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: >> >> >> > 0.000000): 260.198039 >> 
>> >> > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: >> >> >> > 2.828427): 263.026466 >> >> >> > ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: >> >> >> > 4.000000): 267.026466 >> >> >> > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: >> >> >> > 4.898979): 271.925446 >> >> >> > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: >> >> >> > 5.656854): 277.582300 >> >> >> > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: >> >> >> > 6.324555): 283.906855 >> >> >> > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: >> >> >> > 6.928203): 290.835059 >> >> >> > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: >> >> >> > 7.483315): 298.318373 >> >> >> > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: >> >> >> > 8.000000): 306.318373 >> >> >> > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: >> >> >> > 8.485281): 314.803655 >> >> >> > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: >> >> >> > 0.000000): 314.803655 >> >> >> > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: >> >> >> > 3.000000): 317.803655 >> >> >> > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: >> >> >> > 4.242641): 322.046295 >> >> >> > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: >> >> >> > 5.196152): 327.242448 >> >> >> > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: >> >> >> > 6.000000): 333.242448 >> >> >> > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: >> >> >> > 6.708204): 339.950652 >> >> >> > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: >> >> >> > 7.348469): 347.299121 >> >> >> > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: >> >> >> > 7.937254): 355.236375 >> >> >> > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: >> >> >> > 8.485281): 363.721656 >> >> >> > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: >> >> >> > 9.000000): 
372.721656 >> >> >> > Final Result: 372.721656 >> >> >> > >> >> >> > >> >> >> > >> >> >> > As we can see in the following iterations the sqrt(1) as well as > the >> >> >> > result is set to zero for some reason. >> >> >> > >> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: >> >> >> > 0.000000): 0.000000 >> >> >> > >> >> >> > Please help me to resolve the accuracy issue! I think that it will >> >> >> > be very useful for gem5 community. >> >> >> > >> >> >> > To be noticed, I find the correct simulated tick in which the >> >> >> > application started in FS (using m5 dumpstats), and I start the >> >> >> > --debug-start, but the trace file which is generated is 10x larger >> >> >> > than SE mode for the same application. How can I compare them? >> >> >> > >> >> >> > Thank you in advance! >> >> >> > Best regards, >> >> >> > Nikos >> >> >> > >> >> >> > Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>: >> >> >> > >> >> >> >> Dear Jason, >> >> >> >> >> >> >> >> I am trying to use --debug-start but in FS mode it is very >> >> >> >> difficult to find the tick on which the application is started! 
>> >> >> >> >> >> >> >> However, I am writing the following very simple c++ program: >> >> >> >> >> >> >> >> #include <cmath> >> >> >> >> #include <stdio.h> >> >> >> >> >> >> >> >> int main(){ >> >> >> >> >> >> >> >> int dim = 4096; >> >> >> >> >> >> >> >> double result; >> >> >> >> >> >> >> >> for (int iter = 0; iter < 2; iter++){ >> >> >> >> result = 0; >> >> >> >> for (int i = 0; i < dim; i++){ >> >> >> >> for (int j = 0; j < dim; j++){ >> >> >> >> result += sqrt(i) * sqrt(j); >> >> >> >> } >> >> >> >> } >> >> >> >> printf("Result: %lf\n", result); //Result: > 30530733453.127449 >> >> >> >> } >> >> >> >> } >> >> >> >> >> >> >> >> I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o >> >> >> >> test_riscv test_riscv.cpp >> >> >> >> >> >> >> >> >> >> >> >> While in X86 (without cross-compilation of course), QEMU-RISCV, >> >> >> >> GEM5-SE the result is the same (30530733453.127449), in GEM5-FS > the >> >> >> >> result is different! In addition, the result is also different >> >> >> >> between the 2 iterations. >> >> >> >> >> >> >> >> Please reproduce the error if you want in order to verify my > result. >> >> >> >> Ηow can the issue be resolved? >> >> >> >> >> >> >> >> Thank you in advance! >> >> >> >> >> >> >> >> Best regards, >> >> >> >> Nikos >> >> >> >> >> >> >> >> >> >> >> >> Quoting Jason Lowe-Power <jason@lowepower.com>: >> >> >> >> >> >> >> >>> Hi Nikos, >> >> >> >>> >> >> >> >>> You can use --debug-start to start the debugging after some > number >> >> of >> >> >> >>> ticks. Also, I would expect that the difference should come up >> >> >> quickly, so >> >> >> >>> no need to run the program to the end. >> >> >> >>> >> >> >> >>> For the FS mode one, you will want to just start the trace as > the >> >> >> >>> application starts. This could be a bit of a pain. >> >> >> >>> >> >> >> >>> I'm not really sure what fundamentally could be different. 
FS > and SE >> >> >> mode >> >> >> >>> use the exact same code for executing instructions, so I don't > think >> >> >> that's >> >> >> >>> the problem. Have you tried running for smaller inputs or just > one >> >> >> >>> iteration? >> >> >> >>> >> >> >> >>> Jason >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής < >> >> >> >>> ntampouratzis@ece.auth.gr> wrote: >> >> >> >>> >> >> >> >>>> Dear Bobby, >> >> >> >>>> >> >> >> >>>> Iam trying to add --debug-flags=Exec (building the gem5 for >> >> gem5.opt >> >> >> >>>> not for gem5.fast which I had) but the debug traces exceed the > 20GB >> >> >> >>>> (and it is not finished yet) for less than 1 simulated second. > How >> >> can >> >> >> >>>> I reduce the size of the debug-flags (or set something more >> >> specific)? >> >> >> >>>> >> >> >> >>>> In contrast I build the HPCG benchmark with DHPCG_DEBUG flag. > If >> >> you >> >> >> >>>> want, you can compare these two output files >> >> >> >>>> (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As > you >> >> can >> >> >> >>>> see, something goes wrong with the accuracy of calculations in > FS >> >> mode >> >> >> >>>> (benchmark uses double precission). You can find the files > here: >> >> >> >>>> http://kition.mhl.tuc.gr:8000/d/68d82f3533/ >> >> >> >>>> >> >> >> >>>> Best regards, >> >> >> >>>> Nikos >> >> >> >>>> >> >> >> >>>> Quoting Jason Lowe-Power <jason@lowepower.com>: >> >> >> >>>> >> >> >> >>>>> That's quite odd that it works in SE mode but not FS mode! >> >> >> >>>>> >> >> >> >>>>> I would suggest running with --debug-flags=Exec for both and > then >> >> >> >>>> perform a >> >> >> >>>>> diff to see how they differ. 
>> >> >> >>>>> >> >> >> >>>>> Cheers, >> >> >> >>>>> Jason >> >> >> >>>>> >> >> >> >>>>> On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής < >> >> >> >>>>> ntampouratzis@ece.auth.gr> wrote: >> >> >> >>>>> >> >> >> >>>>>> Dear Bobby, >> >> >> >>>>>> >> >> >> >>>>>> In QEMU I get the same (correct) results that I get in SE > mode >> >> >> >>>>>> simulation. I get invalid results in FS simulation (in both >> >> >> >>>>>> riscv-fs.py and riscv-ubuntu-run.py). I cannot access real > RISCV >> >> >> >>>>>> hardware at this moment, however, if you want you may > execute my >> >> >> xhpcg >> >> >> >>>>>> binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the >> >> >> >>>>>> following configuration: >> >> >> >>>>>> >> >> >> >>>>>> ./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 > --rt=0.1 >> >> >> >>>>>> >> >> >> >>>>>> Please let me know if you have any updates! >> >> >> >>>>>> >> >> >> >>>>>> Best regards, >> >> >> >>>>>> Nikos >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> Quoting Jason Lowe-Power <jason@lowepower.com>: >> >> >> >>>>>> >> >> >> >>>>>>> Hi Nikos, >> >> >> >>>>>>> >> >> >> >>>>>>> I notice you said the following in your original email: >> >> >> >>>>>>> >> >> >> >>>>>>> In addition, I used the RISCV Ubuntu image >> >> >> >>>>>>>> ( >> >> >> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu >> >> >> >>>> ), >> >> >> >>>>>>>> I installed the gcc compiler, compile it (through qemu) > and I >> >> get >> >> >> >>>>>>>> wrong results too. >> >> >> >>>>>>> >> >> >> >>>>>>> >> >> >> >>>>>>> Is this saying you get the wrong results is QEMU? If so, > the bug >> >> >> is in >> >> >> >>>>>> GCC >> >> >> >>>>>>> or the HPCG workload, not in gem5. If not, I would test in > QEMU >> >> to >> >> >> >>>> make >> >> >> >>>>>>> sure the binary works there. Another way you could test to > see >> >> if >> >> >> the >> >> >> >>>>>>> problem is your binary or gem5 would be to run it on real >> >> >> hardware. 
We >> >> >> >>>>>> have >> >> >> >>>>>>> access to some RISC-V hardware here at UC Davis, if you > don't >> >> have >> >> >> >>>> access >> >> >> >>>>>>> to it. >> >> >> >>>>>>> >> >> >> >>>>>>> Cheers, >> >> >> >>>>>>> Jason >> >> >> >>>>>>> >> >> >> >>>>>>> On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής < >> >> >> >>>>>>> ntampouratzis@ece.auth.gr> wrote: >> >> >> >>>>>>> >> >> >> >>>>>>>> Dear Bobby, >> >> >> >>>>>>>> >> >> >> >>>>>>>> 1) I use the original riscv-fs.py which is provided in the >> >> latest >> >> >> >>>> gem5 >> >> >> >>>>>>>> release. >> >> >> >>>>>>>> I run the gem5 once (./build/RISCV/gem5.fast -d >> >> ./HPCG_FS_results >> >> >> >>>>>>>> ./configs/example/gem5_library/riscv-fs.py) in order to >> >> download >> >> >> the >> >> >> >>>>>>>> riscv-bootloader-vmlinux-5.10 and riscv-disk-img. >> >> >> >>>>>>>> After this I mount the riscv-disk-img (sudo mount -o loop >> >> >> >>>>>>>> riscv-disk-img /mnt), put the xhpcg executable and I do the >> >> >> following >> >> >> >>>>>>>> changes in riscv-fs.py to boot the riscv-disk-img with >> >> executable: >> >> >> >>>>>>>> >> >> >> >>>>>>>> image = CustomDiskImageResource( >> >> >> >>>>>>>> local_path = > "/home/cossim/.cache/gem5/riscv-disk-img", >> >> >> >>>>>>>> ) >> >> >> >>>>>>>> >> >> >> >>>>>>>> # Set the Full System workload. >> >> >> >>>>>>>> board.set_kernel_disk_workload( >> >> >> >>>>>>>> >> >> >> kernel=Resource("riscv-bootloader-vmlinux-5.10"), >> >> >> >>>>>>>> disk_image=image, >> >> >> >>>>>>>> ) >> >> >> >>>>>>>> >> >> >> >>>>>>>> Finally, in the >> >> >> gem5/src/python/gem5/components/boards/riscv_board.py >> >> >> >>>>>>>> I change the last line to "return ["console=ttyS0", >> >> >> >>>>>>>> "root={root_value}", "rw"]" in order to allow the write >> >> >> permissions >> >> >> >>>> in >> >> >> >>>>>>>> the image. 
>> >> >> >>>>>>>> >> >> >> >>>>>>>> >> >> >> >>>>>>>> 2) The HPCG benchmark after some iterations calculates if > the >> >> >> results >> >> >> >>>>>>>> are valid or not valid. In the case of FS it gives invalid >> >> >> results. >> >> >> >>>> As >> >> >> >>>>>>>> I see from the results, one (at least) problem is that > produces >> >> >> >>>>>>>> different results in each HPCG execution (with the same >> >> >> >>>> configuration). >> >> >> >>>>>>>> >> >> >> >>>>>>>> Here is the HPCG output and riscv-fs.py >> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may >> >> reproduce >> >> >> the >> >> >> >>>>>>>> results in the video if you use the xhpcg executable >> >> >> >>>>>>>> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) >> >> >> >>>>>>>> >> >> >> >>>>>>>> Please help me in order to solve it! >> >> >> >>>>>>>> >> >> >> >>>>>>>> Finally, I get invalid results in the HPL benchmark in FS > mode >> >> >> too. >> >> >> >>>>>>>> >> >> >> >>>>>>>> Best regards, >> >> >> >>>>>>>> Nikos >> >> >> >>>>>>>> >> >> >> >>>>>>>> >> >> >> >>>>>>>> Quoting Bobby Bruce <bbruce@ucdavis.edu>: >> >> >> >>>>>>>> >> >> >> >>>>>>>> > I'm going to need a bit more information to help: >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > 1. In what way have you modified >> >> >> >>>>>>>> > ./configs/example/gem5_library/riscv-fs.py? Can you > attach >> >> the >> >> >> >>>> script >> >> >> >>>>>>>> here? >> >> >> >>>>>>>> > 2. What error are you getting or in what way are the > results >> >> >> >>>> invalid? >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > - >> >> >> >>>>>>>> > Dr. Bobby R. 
Bruce >> >> >> >>>>>>>> > Room 3050, >> >> >> >>>>>>>> > Kemper Hall, UC Davis >> >> >> >>>>>>>> > Davis, >> >> >> >>>>>>>> > CA, 95616 >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > web: https://www.bobbybruce.net >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής < >> >> >> >>>>>>>> > ntampouratzis@ece.auth.gr> wrote: >> >> >> >>>>>>>> > >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> Dear gem5 community, >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> I have successfully cross-compile the HPCG benchmark for >> >> RISCV >> >> >> >>>>>> (Serial >> >> >> >>>>>>>> >> version, without MPI and OpenMP). While it working > properly >> >> in >> >> >> >>>> gem5 >> >> >> >>>>>> SE >> >> >> >>>>>>>> >> mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results >> >> >> >>>>>>>> >> ./configs/example/se.py -c xhpcg --options '--nx=16 > --ny=16 >> >> >> >>>> --nz=16 >> >> >> >>>>>>>> >> --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid > results >> >> in FS >> >> >> >>>>>>>> >> simulation using "./build/RISCV/gem5.fast -d >> >> ./HPCG_FS_results >> >> >> >>>>>>>> >> ./configs/example/gem5_library/riscv-fs.py" (I mount the >> >> riscv >> >> >> >>>> image >> >> >> >>>>>>>> >> and put it). >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> Can you help me please? >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> In addition, I used the RISCV Ubuntu image >> >> >> >>>>>>>> >> ( >> >> >> >>>> >> >> https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu >> >> >> >>>>>> ), >> >> >> >>>>>>>> >> I installed the gcc compiler, compile it (through qemu) > and >> >> I >> >> >> get >> >> >> >>>>>>>> >> wrong results too. >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> Here is the Makefile which I use, the hpcg executable > for >> >> RISCV >> >> >> >>>>>>>> >> (xhpcg), and a video that shows the results >> >> >> >>>>>>>> >> (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/). >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> P.S. I use the latest gem5 version. 
>> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> Thank you in advance! :) >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> Best regards, >> >> >> >>>>>>>> >> Nikos >> >> >> >>>>>>>> >> _______________________________________________ >> >> >> >>>>>>>> >> gem5-users mailing list -- gem5-users@gem5.org >> >> >> >>>>>>>> >> To unsubscribe send an email to > gem5-users-leave@gem5.org >> >> >> >>>>>>>> >> >> >> >> >>>>>>>> >> >> >> >>>>>>>> >> >> >> >>>>>>>> _______________________________________________ >> >> >> >>>>>>>> gem5-users mailing list -- gem5-users@gem5.org >> >> >> >>>>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >>>>>>>> >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> _______________________________________________ >> >> >> >>>>>> gem5-users mailing list -- gem5-users@gem5.org >> >> >> >>>>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >>>>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> _______________________________________________ >> >> >> >>>> gem5-users mailing list -- gem5-users@gem5.org >> >> >> >>>> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >>>> >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> >> >> gem5-users mailing list -- gem5-users@gem5.org >> >> >> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> > >> >> >> > >> >> >> > _______________________________________________ >> >> >> > gem5-users mailing list -- gem5-users@gem5.org >> >> >> > To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> >> gem5-users mailing list -- gem5-users@gem5.org >> >> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> gem5-users mailing list -- gem5-users@gem5.org >> >> To unsubscribe send an email to gem5-users-leave@gem5.org >> >> >> >> >> 
Bobby Bruce
Mon, Oct 31, 2022 11:37 PM

You mean this bug? Unfortunately not, I've been very busy with the upcoming
gem5 release and haven't had time to investigate this further.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Oct 31, 2022 at 1:45 AM Νικόλαος Ταμπουρατζής via gem5-users <
gem5-users@gem5.org> wrote:

Dear Bobby, Jason, all,

Is there any update about the accuracy of RISC-V FS?

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Jason and I had a theory that this may be due to the "Rounding Mode" for
floating point being set incorrectly in FS mode. That's set via a macro
here:

I manually expanded the macro here:

inside the "fsqrt_d" definition then compiled "build/ALL/gem5.debug". Then
used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in the
generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).

gdb build/ALL/gem5.opt
break Fsqrt_d::execute
run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`


Stepping through with gdb, I see the rounding mode is `0` for SE mode and `0`
for FS mode as well. So, no luck with that theory.

My new theory is that this bug has something to do with thread context
switching being implemented incorrectly in RISC-V somehow. I find it
strange that the sqrt(1) works fine for a while (i.e. returns `1`) then
suddenly starts returning zero after a certain point in the execution. In
addition, it's odd that the loop is not returning the same value each time
despite executing the same code. It'd make sense to me that the thread is
being stored and then resumed with some corruption of the floating point
data. This would also explain why this bug only occurs in FS mode.

I'll try to find time to figure out a good test for this. If anyone has any
other theories or ideas then let me know.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Jason & Bobby,

Unfortunately, I have tried my simple example without the sqrt
function and the problem remains. Specifically, I have the following
simple code:

#include <cmath>
#include <stdio.h>

int main(){

  int dim = 1024;

  double result;

  for (int iter = 0; iter < 2; iter++){
      result = 0;
      for (int i = 0; i < dim; i++){
          for (int j = 0; j < dim; j++){
              result += i * j;
          }
      }
      printf("Final Result: %lf\n", result);
  }

}

In the above code, the correct result is 274341298176.000000 (from
RISCV-SE mode and x86), while in FS mode I sometimes get the correct
result and other times a different number.

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know if when running in SE mode and running in FS mode we are
using the same rounding mode. My hypothesis is that in FS mode the rounding
mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the
problem occurs only with float and double variables (with int it works
properly in FS mode). Specifically, float variables are set to "nan",
while double variables are set to 0.000000 (at a random time, probably
caused by some instruction of the simulated OS?). You may use a simple
C/C++ example in order to get some traces before going to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:

Please feel free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason, all,

I am trying to find the accuracy problem with RISCV-FS and I observe
that the problem occurs (at least in my dummy example) because the
variables (double) are set to zero at a random simulated time (for
this reason I get different results among executions of the same
code). Specifically, for the following dummy code:

#include <cmath>
#include <stdio.h>

int main(){

 int dim = 10;

 float result;

 for (int iter = 0; iter < 2; iter++){
     result = 0;
     for (int i = 0; i < dim; i++){
         for (int j = 0; j < dim; j++){
             float sq_i = sqrt(i);
             float sq_j = sqrt(j);
             result += sq_i * sq_j;
             printf("ITER: %d | i: %d | j: %d Result(i: %f | j: %f | i*j: %f): %f\n",
                    iter, i, j, sq_i, sq_j, sq_i * sq_j, result);

         }
     }
     printf("Final Result: %lf\n", result);
 }

}

The correct Final Result in both iterations is 372.721656. However,
I get the following results in FS:

ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i
j:
1.000000): 1.000000
ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | ij:
1.414214): 2.414214
ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i
j:
1.732051): 4.146264
ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i
j:
1.414214): 1.414214
ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | ij:
2.000000): 3.414214
ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i
j:
2.449490): 5.863703
ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | ij:
2.828427): 8.692130
ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i
j:
3.162278): 11.854408
ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | ij:
3.464102): 15.318510
ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i
j:
3.741657): 19.060167
ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | ij:
4.000000): 23.060167
ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i
j:
4.242641): 27.302808
ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | ij:
0.000000): 27.302808
ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i
j:
1.732051): 29.034859
ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | ij:
2.449490): 31.484348
ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i
j:
3.000000): 34.484348
ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | ij:
3.464102): 37.948450
ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i
j:
3.872983): 41.821433
ITER: 0 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | ij:
4.242641): 46.064074
ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i
j:
4.582576): 50.646650
ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | ij:
4.898979): 55.545629
ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i
j:
5.196152): 60.741782
ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | ij:
0.000000): 60.741782
ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i
j:
2.000000): 62.741782
ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | ij:
2.828427): 65.570209
ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i
j:
3.464102): 69.034310
ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | ij:
4.000000): 73.034310
ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i
j:
4.472136): 77.506446
ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | ij:
4.898979): 82.405426
ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i
j:
5.291503): 87.696928
ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | ij:
5.656854): 93.353783
ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i
j:
6.000000): 99.353783
ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | ij:
0.000000): 99.353783
ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i
j:
2.236068): 101.589851
ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | ij:
3.162278): 104.752128
ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i
j:
3.872983): 108.625112
ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | ij:
4.472136): 113.097248
ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i
j:
5.000000): 118.097248
ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | ij:
5.477226): 123.574473
ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i
j:
5.916080): 129.490553
ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | ij:
6.324555): 135.815108
ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i
j:
6.708204): 142.523312
ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | ij:
0.000000): 142.523312
ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i
j:
2.449490): 144.972802
ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | ij:
3.464102): 148.436904
ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i
j:
4.242641): 152.679544
ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | ij:
4.898979): 157.578524
ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i
j:
5.477226): 163.055749
ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | ij:
6.000000): 169.055749
ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i
j:
6.480741): 175.536490
ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | ij:
6.928203): 182.464693
ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i
j:
7.348469): 189.813162
ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | ij:
0.000000): 189.813162
ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i
j:
2.645751): 192.458914
ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | ij:
3.741657): 196.200571
ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i
j:
4.582576): 200.783147
ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | ij:
5.291503): 206.074649
ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i
j:
5.916080): 211.990729
ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | ij:
6.480741): 218.471470
ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i
j:
7.000000): 225.471470
ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | ij:
7.483315): 232.954785
ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i
j:
7.937254): 240.892039
ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | ij:
0.000000): 240.892039
ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i
j:
2.828427): 243.720466
ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | ij:
4.000000): 247.720466
ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i
j:
4.898979): 252.619445
ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | ij:
5.656854): 258.276300
ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i
j:
6.324555): 264.600855
ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | ij:
6.928203): 271.529058
ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i
j:
7.483315): 279.012373
ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | ij:
8.000000): 287.012373
ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i
j:
8.485281): 295.497654
ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | ij:
0.000000): 295.497654
ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i
j:
3.000000): 298.497654
ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | ij:
4.242641): 302.740295
ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i
j:
5.196152): 307.936447
ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | ij:
6.000000): 313.936447
ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i
j:
6.708204): 320.644651
ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | ij:
7.348469): 327.993120
ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i
j:
7.937254): 335.930374
ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | ij:
8.485281): 344.415656
ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i
j:
9.000000): 353.415656
Final Result: 353.415656
ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i
j:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | ij:
0.000000): 0.000000
ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i
j:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | ij:
0.000000): 0.000000
ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i
j:
1.000000): 1.000000
ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | ij:
1.414214): 2.414214
ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i
j:
1.732051): 4.146264
ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | ij:
2.000000): 6.146264
ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i
j:
2.236068): 8.382332
ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j:
2.449490): 10.831822
ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j:
2.645751): 13.477573
ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j:
2.828427): 16.306001
ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j:
3.000000): 19.306001
ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j:
0.000000): 19.306001
ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j:
1.414214): 20.720214
ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j:
2.000000): 22.720214
ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j:
2.449490): 25.169704
ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j:
2.828427): 27.998131
ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j:
3.162278): 31.160409
ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j:
3.464102): 34.624510
ITER: 1 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j:
3.741657): 38.366168
ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j:
4.000000): 42.366168
ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j:
4.242641): 46.608808
ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j:
0.000000): 46.608808
ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j:
1.732051): 48.340859
ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j:
2.449490): 50.790349
ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j:
3.000000): 53.790349
ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j:
3.464102): 57.254450
ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j:
3.872983): 61.127434
ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j:
4.242641): 65.370075
ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j:
4.582576): 69.952650
ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j:
4.898979): 74.851630
ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j:
5.196152): 80.047782
ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j:
0.000000): 80.047782
ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j:
2.000000): 82.047782
ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j:
2.828427): 84.876209
ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j:
3.464102): 88.340311
ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j:
4.000000): 92.340311
ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j:
4.472136): 96.812447
ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j:
4.898979): 101.711426
ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j:
5.291503): 107.002929
ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j:
5.656854): 112.659783
ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j:
6.000000): 118.659783
ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j:
0.000000): 118.659783
ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j:
2.236068): 120.895851
ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j:
3.162278): 124.058129
ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j:
3.872983): 127.931112
ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j:
4.472136): 132.403248
ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j:
5.000000): 137.403248
ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j:
5.477226): 142.880474
ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j:
5.916080): 148.796553
ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j:
6.324555): 155.121109
ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j:
6.708204): 161.829313
ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j:
0.000000): 161.829313
ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j:
2.449490): 164.278802
ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j:
3.464102): 167.742904
ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j:
4.242641): 171.985545
ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j:
4.898979): 176.884524
ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j:
5.477226): 182.361750
ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j:
6.000000): 188.361750
ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j:
6.480741): 194.842491
ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j:
6.928203): 201.770694
ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j:
7.348469): 209.119163
ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j:
0.000000): 209.119163
ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j:
2.645751): 211.764914
ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j:
3.741657): 215.506572
ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j:
4.582576): 220.089147
ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j:
5.291503): 225.380650
ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j:
5.916080): 231.296730
ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j:
6.480741): 237.777470
ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j:
7.000000): 244.777470
ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j:
7.483315): 252.260785
ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j:
7.937254): 260.198039
ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j:
0.000000): 260.198039
ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j:
2.828427): 263.026466
ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j:
4.000000): 267.026466
ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j:
4.898979): 271.925446
ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j:
5.656854): 277.582300
ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j:
6.324555): 283.906855
ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j:
6.928203): 290.835059
ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j:
7.483315): 298.318373
ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j:
8.000000): 306.318373
ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j:
8.485281): 314.803655
ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j:
0.000000): 314.803655
ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j:
3.000000): 317.803655
ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j:
4.242641): 322.046295
ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j:
5.196152): 327.242448
ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j:
6.000000): 333.242448
ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j:
6.708204): 339.950652
ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j:
7.348469): 347.299121
ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j:
7.937254): 355.236375
ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j:
8.485281): 363.721656
ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j:
9.000000): 372.721656
Final Result: 372.721656

As we can see in the following iterations, sqrt(1) as well as the
result is set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j:
0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j:
0.000000): 0.000000

Please help me to resolve the accuracy issue! I think that it will
be very useful for the gem5 community.

Note that I found the correct simulated tick at which the
application starts in FS (using m5 dumpstats) and passed it to
--debug-start, but the generated trace file is 10x larger than in
SE mode for the same application. How can I compare them?

Thank you in advance!
Best regards,
Nikos
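
A host-side check can locate the first corrupted line in a log like the one above without reading it by eye. The following is a minimal Python sketch, assuming the log keeps the printf layout of the test program (`ITER: .. | i: .. | j: .. Result(i: .. | j: ..`); the function name `first_bad_line` and the tolerance are illustrative choices, not part of gem5 or of the original program. It recomputes sqrt(i) and sqrt(j) on the host and reports the first line where the printed values disagree.

```python
import math
import re

# Matches the leading part of one printf line from the test program.
# The product and the running sum at the end of the line are ignored.
LINE_RE = re.compile(
    r"ITER: (\d+) \| i: (\d+) \| j: (\d+) Result\(i: (\S+) \| j: (\S+) \|"
)

def first_bad_line(log_text, tol=1e-5):
    """Return (iter, i, j) of the first line with a wrong square root, else None."""
    for m in LINE_RE.finditer(log_text):
        it, i, j = int(m.group(1)), int(m.group(2)), int(m.group(3))
        sq_i, sq_j = float(m.group(4)), float(m.group(5))
        # A printed "nan" fails both comparisons, so it is flagged as well.
        ok_i = abs(sq_i - math.sqrt(i)) <= tol
        ok_j = abs(sq_j - math.sqrt(j)) <= tol
        if not (ok_i and ok_j):
            return (it, i, j)
    return None

# Two lines taken from the FS output quoted in this thread.
sample = (
    "ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: 1.732051): 4.146264\n"
    "ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000\n"
)
print(first_bad_line(sample))  # (0, 1, 4)
```

Knowing the first bad (iter, i, j) narrows the window of simulated time that needs to be traced.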

Quoting Νικόλαος Ταμπουρατζής ntampouratzis@ece.auth.gr:

Dear Jason,

I am trying to use --debug-start but in FS mode it is very
difficult to find the tick on which the application is started!

However, I have written the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 4096;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); // Result: 30530733453.127449
    }

}

I cross-compile it using: riscv64-linux-gnu-g++ -static -O3 -o
test_riscv test_riscv.cpp

While on X86 (without cross-compilation of course), QEMU-RISCV and
GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the
result is different! In addition, the result also differs
between the 2 iterations.

Please reproduce the error if you want, in order to verify my result.

How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

You can use --debug-start to start the debugging after some number of
ticks. Also, I would expect that the difference should come up quickly, so
no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the
application starts. This could be a bit of a pain.

I'm not really sure what fundamentally could be different. FS and SE mode
use the exact same code for executing instructions, so I don't think that's
the problem. Have you tried running for smaller inputs or just one
iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5.opt instead of the
gem5.fast which I had), but the debug traces exceed 20GB (and the run has
not finished yet) for less than 1 simulated second. How can I reduce the
size of the debug output (or set something more specific)?

In contrast, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you
want, you can compare these two output files
(hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can
see, something goes wrong with the accuracy of calculations in FS mode
(the benchmark uses double precision). You can find the files here:

http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then perform a
diff to see how they differ.

Cheers,
Jason
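
A plain `diff` of two Exec traces tends to flag every line, because ticks and CPU object names differ between SE and FS runs. The following is a minimal Python sketch of a more tolerant comparison. It assumes each trace line starts with a tick and an object name, each terminated by a colon (the `fields=2` default); the function names, the `keep` filter, and the sample lines in the usage note are illustrative and not taken from a real gem5 run. It streams both inputs, so multi-gigabyte traces are not loaded into memory.

```python
from itertools import zip_longest

def normalized(lines, keep, fields):
    """Yield kept lines with the first `fields` colon-separated prefixes removed."""
    for line in lines:
        if keep(line):
            yield line.split(":", fields)[-1].strip()

def first_divergence(trace_a, trace_b, keep=lambda line: True, fields=2):
    """Return (index, line_a, line_b) at the first difference, else None.

    trace_a and trace_b can be any iterables of lines, including open files.
    A side that ends early is reported with None in its slot.
    """
    a = normalized(trace_a, keep, fields)
    b = normalized(trace_b, keep, fields)
    for idx, (la, lb) in enumerate(zip_longest(a, b)):
        if la != lb:
            return (idx, la, lb)
    return None
```

For example, with two hypothetical two-line traces, `first_divergence(["100: system.cpu: T0 : 0x10 : fsqrt.d", "200: system.cpu: T0 : 0x14 : fmul.d"], ["9100: board.core: T0 : 0x10 : fsqrt.d", "9300: board.core: T0 : 0x14 : fadd.d"])` reports index 1. In FS mode the `keep` argument can be used to drop kernel instructions, for instance by keeping only lines whose text contains the benchmark's symbol names.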

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode
simulation. I get invalid results in FS simulation (in both
riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV
hardware at this moment; however, if you want, you may execute the
following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power jason@lowepower.com:

Hi Nikos,

I notice you said the following in your original email:

In addition, I used the RISCV Ubuntu image ( ), I installed the gcc
compiler, compile it (through qemu) and I get wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in GCC
or the HPCG workload, not in gem5. If not, I would test in QEMU to make
sure the binary works there. Another way you could test to see if the
problem is your binary or gem5 would be to run it on real hardware. We have
access to some RISC-V hardware here at UC Davis, if you don't have access
to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

  1. I use the original riscv-fs.py which is provided in the latest gem5
release.
I run gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py) in order to download the
riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mount the riscv-disk-img (sudo mount -o loop
riscv-disk-img /mnt), put the xhpcg executable in it and make the
following changes in riscv-fs.py to boot the riscv-disk-img with the
executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py
I change the last line to "return ["console=ttyS0",
"root={root_value}", "rw"]" in order to allow write permissions in
the image.

  2. The HPCG benchmark, after some iterations, checks whether the results
are valid or not. In the case of FS it gives invalid results. As
I see from the results, one (at least) problem is that it produces
different results in each HPCG execution (with the same configuration).

Here is the HPCG output and riscv-fs.py
(http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the
results in the video if you use the xhpcg executable
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/)

Please help me in order to solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce bbruce@ucdavis.edu:

I'm going to need a bit more information to help:

  1. In what way have you modified
    ./configs/example/gem5_library/riscv-fs.py? Can you attach the script
    here?

  2. What error are you getting or in what way are the results invalid?

Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <
ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (serial
version, without MPI and OpenMP). While it works properly in gem5 SE
mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results
./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16
--npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS
simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results
./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image
and put the executable in it).

Can you help me please?

In addition, I used the RISCV Ubuntu image ( ), I installed the gcc
compiler, compiled the benchmark (through qemu) and I get wrong results
too.

Here is the Makefile which I use, the hpcg executable for RISCV
(xhpcg), and a video that shows the results
(http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos


gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org

You mean this bug? Unfortunately not. I've been very busy with the upcoming
gem5 release and haven't had time to investigate this further.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Oct 31, 2022 at 1:45 AM Νικόλαος Ταμπουρατζής via gem5-users
<gem5-users@gem5.org> wrote:

Dear Bobby, Jason, all,

Is there any update about the accuracy of RISC-V FS?

Best regards,
Nikos

Quoting Bobby Bruce <bbruce@ucdavis.edu>:

Jason and I had a theory that this may be due to the "Rounding Mode" for
floating point being set incorrectly in FS mode. That's set via a macro
here:
https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/fp_inst.hh#36

I manually expanded the macro here:
https://gem5.googlesource.com/public/gem5/+/refs/tags/v22.0.0.2/src/arch/riscv/isa/decoder.isa#1495,
inside the "fsqrt_d" definition then compiled "build/ALL/gem5.debug". Then
used gdb to add a breakpoint in the "Fsqrt_d::execute" function (in the
generated "build/ALL/arch/riscv/generated/exec-ns.cc.inc" file).

```
gdb build/ALL/gem5.opt
break Fsqrt_d::execute
run bug-recreation/se-mode-run.py # or `run bug-recreation/fs-mode-run.py`
```

Stepping through with gdb, I see the rounding mode is `0` for SE mode and
`0` for FS mode as well. So, no luck with that theory.

My new theory is that this bug has something to do with thread context
switching being implemented incorrectly in RISC-V somehow. I find it
strange that the sqrt(1) works fine for a while (i.e. returns `1`) then
suddenly starts returning zero after a certain point in the execution. In
addition, it's odd that the loop is not returning the same value each time
despite executing the same code. It'd make sense to me that the thread is
being stored and then resumed with some corruption of the floating point
data. This would also explain why this bug only occurs in FS mode.

I'll try to find time to figure out a good test for this. If anyone has any
other theories or ideas then let me know.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Fri, Oct 7, 2022 at 12:50 PM Νικόλαος Ταμπουρατζής
<ntampouratzis@ece.auth.gr> wrote:

Dear Jason & Bobby,

Unfortunately, I have tried my simple example without the sqrt
function and the problem remains. Specifically, I have the following
simple code:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 1024;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += i * j;
            }
        }
        printf("Final Result: %lf\n", result);
    }
}

In the above code, the correct result is 274341298176.000000 (from
RISCV-SE mode and x86), while in FS mode I sometimes get the correct
result and other times a different number.

Best regards,
Nikos

Quoting Jason Lowe-Power <jason@lowepower.com>:

I have an idea...

Have you put a breakpoint in the implementation of the fsqrt_d function? I
would like to know if when running in SE mode and running in FS mode we are
using the same rounding mode. My hypothesis is that in FS mode the rounding
mode is set differently.

Cheers,
Jason

On Fri, Oct 7, 2022 at 12:15 AM Νικόλαος Ταμπουρατζής
<ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

Thanks a lot for the effort! I looked in detail and I observe that the
problem is created only using float and double variables (in the case
of int it is working properly in FS mode). Specifically, in the case
of float the variables are set to "nan", while in the case of double
the variables are set to 0.000000 (in random time - probably from some
instruction of simulated OS?). You may use a simple c/c++ example in
order to get some traces before going to HPCG...

Thank you in advance!!
Best regards,
Nikos

Quoting Bobby Bruce <bbruce@ucdavis.edu>:

Hey Niko,

Thanks for this analysis. I jumped a little into this today but didn't get
as far as you did. I wanted to find a quick way to recreate the following:
https://gem5-review.googlesource.com/c/public/gem5/+/64211. Please feel
free to use this, if it helps any.

It's very strange to me that this bug hasn't manifested itself before but
it's undeniably there. I'll try to spend more time looking at this tomorrow
with some traces and debug flags and see if I can narrow down the problem.

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Wed, Oct 5, 2022 at 2:26 PM Νικόλαος Ταμπουρατζής
<ntampouratzis@ece.auth.gr> wrote:

In my previous results, I had used double (not float) for the
following variables: result, sq_i and sq_j. In the case of float
instead of double I get "nan" and not 0.000000.
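
The "nan" for float versus 0.000000 for double pattern is what one would expect if a 64-bit floating point register were overwritten with all-zero bits. The RISC-V NaN-boxing rule requires a single-precision value held in a 64-bit register to have its upper 32 bits set to all ones; otherwise the value reads as the canonical NaN. The following Python sketch only models that rule on the host. Whether gem5's FS path actually zeroes the register in the failing case is an assumption, not something confirmed in this thread, and the function names are illustrative.

```python
import struct

def read_double(reg_bits):
    """Interpret a 64-bit register image as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", reg_bits))[0]

def read_float_nan_boxed(reg_bits):
    """Interpret a 64-bit register image as a NaN-boxed single-precision value.

    If the upper 32 bits are not all ones, the value is not properly boxed
    and reads as NaN (modelled here with Python's float("nan")).
    """
    if (reg_bits >> 32) != 0xFFFFFFFF:
        return float("nan")
    return struct.unpack("<f", struct.pack("<I", reg_bits & 0xFFFFFFFF))[0]

zeroed = 0x0000000000000000      # hypothetical register image after corruption
boxed_one = 0xFFFFFFFF3F800000   # properly boxed single-precision 1.0

print(read_double(zeroed))              # 0.0
print(read_float_nan_boxed(zeroed))     # nan
print(read_float_nan_boxed(boxed_one))  # 1.0
```

If this reading is right, the float and double symptoms share one cause, which would be consistent with the register state being lost rather than with an arithmetic or rounding error.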
> >> >> >> > >> >> >> Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>: > >> >> >> > >> >> >> > Dear Jason, all, > >> >> >> > > >> >> >> > I am trying to find the accuracy problem with RISCV-FS and I > > observe > >> >> >> > that the problem is created (at least in my dummy example) > because > >> >> >> > the variables (double) are set to zero in random simulated time > > (for > >> >> >> > this reason I get different results among executions of the same > >> >> >> > code). Specifically for the following dummy code: > >> >> >> > > >> >> >> > > >> >> >> > #include <cmath> > >> >> >> > #include <stdio.h> > >> >> >> > > >> >> >> > int main(){ > >> >> >> > > >> >> >> > int dim = 10; > >> >> >> > > >> >> >> > float result; > >> >> >> > > >> >> >> > for (int iter = 0; iter < 2; iter++){ > >> >> >> > result = 0; > >> >> >> > for (int i = 0; i < dim; i++){ > >> >> >> > for (int j = 0; j < dim; j++){ > >> >> >> > float sq_i = sqrt(i); > >> >> >> > float sq_j = sqrt(j); > >> >> >> > result += sq_i * sq_j; > >> >> >> > printf("ITER: %d | i: %d | j: %d Result(i: %f | > j: > >> >> >> > %f | i*j: %f): %f\n", iter, i , j, sq_i, sq_j, sq_i * sq_j, > > result); > >> >> >> > } > >> >> >> > } > >> >> >> > printf("Final Result: %lf\n", result); > >> >> >> > } > >> >> >> > } > >> >> >> > > >> >> >> > > >> >> >> > The correct Final Result in both iterations is 372.721656. 
> > However, > >> >> >> > I get the following results in FS: > >> >> >> > > >> >> >> > ITER: 0 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > >> >> >> > 1.000000): 1.000000 > >> >> >> > ITER: 0 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > >> >> >> > 1.414214): 2.414214 > >> >> >> > ITER: 0 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > >> >> >> > 1.732051): 4.146264 > >> >> >> > ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> >> > 
0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 0 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > >> >> >> > 1.414214): 1.414214 > >> >> >> > ITER: 0 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > >> >> >> > 2.000000): 3.414214 > >> >> >> > ITER: 0 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > >> >> >> > 2.449490): 5.863703 > >> >> >> > ITER: 0 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > >> >> >> > 2.828427): 8.692130 > >> >> >> > ITER: 0 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > >> >> >> > 3.162278): 11.854408 > >> >> >> > ITER: 0 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > >> >> >> > 3.464102): 15.318510 > >> >> >> > ITER: 0 | i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > >> >> >> > 3.741657): 19.060167 > >> >> >> > ITER: 0 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > >> >> >> > 4.000000): 23.060167 > >> >> >> > ITER: 0 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > >> >> >> > 4.242641): 27.302808 > >> >> >> > ITER: 0 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 27.302808 > >> >> >> > ITER: 0 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > >> >> >> > 1.732051): 29.034859 > >> >> >> > ITER: 0 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > >> >> >> > 2.449490): 31.484348 > >> >> >> > ITER: 0 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > >> >> >> > 3.000000): 34.484348 > >> >> >> > ITER: 0 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > >> >> >> > 3.464102): 37.948450 > >> >> >> > ITER: 0 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > >> >> >> > 3.872983): 41.821433 > >> >> >> > ITER: 0 | 
i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > >> >> >> > 4.242641): 46.064074 > >> >> >> > ITER: 0 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > >> >> >> > 4.582576): 50.646650 > >> >> >> > ITER: 0 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > >> >> >> > 4.898979): 55.545629 > >> >> >> > ITER: 0 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > >> >> >> > 5.196152): 60.741782 > >> >> >> > ITER: 0 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 60.741782 > >> >> >> > ITER: 0 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > >> >> >> > 2.000000): 62.741782 > >> >> >> > ITER: 0 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > >> >> >> > 2.828427): 65.570209 > >> >> >> > ITER: 0 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > >> >> >> > 3.464102): 69.034310 > >> >> >> > ITER: 0 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > >> >> >> > 4.000000): 73.034310 > >> >> >> > ITER: 0 | i: 4 | j: 5 Result(i: 2.000000 | j: 2.236068 | i*j: > >> >> >> > 4.472136): 77.506446 > >> >> >> > ITER: 0 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > >> >> >> > 4.898979): 82.405426 > >> >> >> > ITER: 0 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > >> >> >> > 5.291503): 87.696928 > >> >> >> > ITER: 0 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > >> >> >> > 5.656854): 93.353783 > >> >> >> > ITER: 0 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > >> >> >> > 6.000000): 99.353783 > >> >> >> > ITER: 0 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 99.353783 > >> >> >> > ITER: 0 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > >> >> >> > 2.236068): 101.589851 > >> >> >> > ITER: 0 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > >> >> >> > 3.162278): 104.752128 > >> >> >> > ITER: 0 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > >> >> >> > 3.872983): 108.625112 > >> >> >> > ITER: 0 | i: 5 | j: 4 Result(i: 2.236068 | 
j: 2.000000 | i*j: > >> >> >> > 4.472136): 113.097248 > >> >> >> > ITER: 0 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > >> >> >> > 5.000000): 118.097248 > >> >> >> > ITER: 0 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > >> >> >> > 5.477226): 123.574473 > >> >> >> > ITER: 0 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > >> >> >> > 5.916080): 129.490553 > >> >> >> > ITER: 0 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > >> >> >> > 6.324555): 135.815108 > >> >> >> > ITER: 0 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > >> >> >> > 6.708204): 142.523312 > >> >> >> > ITER: 0 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 142.523312 > >> >> >> > ITER: 0 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > >> >> >> > 2.449490): 144.972802 > >> >> >> > ITER: 0 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > >> >> >> > 3.464102): 148.436904 > >> >> >> > ITER: 0 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > >> >> >> > 4.242641): 152.679544 > >> >> >> > ITER: 0 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > >> >> >> > 4.898979): 157.578524 > >> >> >> > ITER: 0 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: > >> >> >> > 5.477226): 163.055749 > >> >> >> > ITER: 0 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > >> >> >> > 6.000000): 169.055749 > >> >> >> > ITER: 0 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > >> >> >> > 6.480741): 175.536490 > >> >> >> > ITER: 0 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > >> >> >> > 6.928203): 182.464693 > >> >> >> > ITER: 0 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > >> >> >> > 7.348469): 189.813162 > >> >> >> > ITER: 0 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 189.813162 > >> >> >> > ITER: 0 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > >> >> >> > 2.645751): 192.458914 > >> >> >> > ITER: 0 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: 
> >> >> >> > 3.741657): 196.200571 > >> >> >> > ITER: 0 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > >> >> >> > 4.582576): 200.783147 > >> >> >> > ITER: 0 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > >> >> >> > 5.291503): 206.074649 > >> >> >> > ITER: 0 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > >> >> >> > 5.916080): 211.990729 > >> >> >> > ITER: 0 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > >> >> >> > 6.480741): 218.471470 > >> >> >> > ITER: 0 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > >> >> >> > 7.000000): 225.471470 > >> >> >> > ITER: 0 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > >> >> >> > 7.483315): 232.954785 > >> >> >> > ITER: 0 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > >> >> >> > 7.937254): 240.892039 > >> >> >> > ITER: 0 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 240.892039 > >> >> >> > ITER: 0 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > >> >> >> > 2.828427): 243.720466 > >> >> >> > ITER: 0 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > >> >> >> > 4.000000): 247.720466 > >> >> >> > ITER: 0 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > >> >> >> > 4.898979): 252.619445 > >> >> >> > ITER: 0 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > >> >> >> > 5.656854): 258.276300 > >> >> >> > ITER: 0 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > >> >> >> > 6.324555): 264.600855 > >> >> >> > ITER: 0 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > >> >> >> > 6.928203): 271.529058 > >> >> >> > ITER: 0 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > >> >> >> > 7.483315): 279.012373 > >> >> >> > ITER: 0 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > >> >> >> > 8.000000): 287.012373 > >> >> >> > ITER: 0 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > >> >> >> > 8.485281): 295.497654 > >> >> >> > ITER: 0 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > >> >> >> > 
0.000000): 295.497654 > >> >> >> > ITER: 0 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > >> >> >> > 3.000000): 298.497654 > >> >> >> > ITER: 0 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > >> >> >> > 4.242641): 302.740295 > >> >> >> > ITER: 0 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > >> >> >> > 5.196152): 307.936447 > >> >> >> > ITER: 0 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > >> >> >> > 6.000000): 313.936447 > >> >> >> > ITER: 0 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > >> >> >> > 6.708204): 320.644651 > >> >> >> > ITER: 0 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > >> >> >> > 7.348469): 327.993120 > >> >> >> > ITER: 0 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > >> >> >> > 7.937254): 335.930374 > >> >> >> > ITER: 0 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > >> >> >> > 8.485281): 344.415656 > >> >> >> > ITER: 0 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > >> >> >> > 9.000000): 353.415656 > >> >> >> > Final Result: 353.415656 > >> >> >> > ITER: 1 | i: 0 | j: 0 Result(i: 0.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 1 Result(i: 0.000000 | j: 1.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 2 Result(i: 0.000000 | j: 1.414214 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 3 Result(i: 0.000000 | j: 1.732051 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: > >> >> >> 
> 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 0 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 1 | j: 0 Result(i: 1.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 0.000000 > >> >> >> > ITER: 1 | i: 1 | j: 1 Result(i: 1.000000 | j: 1.000000 | i*j: > >> >> >> > 1.000000): 1.000000 > >> >> >> > ITER: 1 | i: 1 | j: 2 Result(i: 1.000000 | j: 1.414214 | i*j: > >> >> >> > 1.414214): 2.414214 > >> >> >> > ITER: 1 | i: 1 | j: 3 Result(i: 1.000000 | j: 1.732051 | i*j: > >> >> >> > 1.732051): 4.146264 > >> >> >> > ITER: 1 | i: 1 | j: 4 Result(i: 1.000000 | j: 2.000000 | i*j: > >> >> >> > 2.000000): 6.146264 > >> >> >> > ITER: 1 | i: 1 | j: 5 Result(i: 1.000000 | j: 2.236068 | i*j: > >> >> >> > 2.236068): 8.382332 > >> >> >> > ITER: 1 | i: 1 | j: 6 Result(i: 1.000000 | j: 2.449490 | i*j: > >> >> >> > 2.449490): 10.831822 > >> >> >> > ITER: 1 | i: 1 | j: 7 Result(i: 1.000000 | j: 2.645751 | i*j: > >> >> >> > 2.645751): 13.477573 > >> >> >> > ITER: 1 | i: 1 | j: 8 Result(i: 1.000000 | j: 2.828427 | i*j: > >> >> >> > 2.828427): 16.306001 > >> >> >> > ITER: 1 | i: 1 | j: 9 Result(i: 1.000000 | j: 3.000000 | i*j: > >> >> >> > 3.000000): 19.306001 > >> >> >> > ITER: 1 | i: 2 | j: 0 Result(i: 1.414214 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 19.306001 > >> >> >> > ITER: 1 | i: 2 | j: 1 Result(i: 1.414214 | j: 1.000000 | i*j: > >> >> >> > 1.414214): 20.720214 > >> >> >> > ITER: 1 | i: 2 | j: 2 Result(i: 1.414214 | j: 1.414214 | i*j: > >> >> >> > 2.000000): 22.720214 > >> >> >> > ITER: 1 | i: 2 | j: 3 Result(i: 1.414214 | j: 1.732051 | i*j: > >> >> >> > 2.449490): 25.169704 > >> >> >> > ITER: 1 | i: 2 | j: 4 Result(i: 1.414214 | j: 2.000000 | i*j: > >> >> >> > 2.828427): 27.998131 > >> >> >> > ITER: 1 | i: 2 | j: 5 Result(i: 1.414214 | j: 2.236068 | i*j: > >> >> >> > 3.162278): 31.160409 > >> >> >> > ITER: 1 | i: 2 | j: 6 Result(i: 1.414214 | j: 2.449490 | i*j: > >> >> >> > 3.464102): 34.624510 > >> >> >> > ITER: 1 | 
i: 2 | j: 7 Result(i: 1.414214 | j: 2.645751 | i*j: > >> >> >> > 3.741657): 38.366168 > >> >> >> > ITER: 1 | i: 2 | j: 8 Result(i: 1.414214 | j: 2.828427 | i*j: > >> >> >> > 4.000000): 42.366168 > >> >> >> > ITER: 1 | i: 2 | j: 9 Result(i: 1.414214 | j: 3.000000 | i*j: > >> >> >> > 4.242641): 46.608808 > >> >> >> > ITER: 1 | i: 3 | j: 0 Result(i: 1.732051 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 46.608808 > >> >> >> > ITER: 1 | i: 3 | j: 1 Result(i: 1.732051 | j: 1.000000 | i*j: > >> >> >> > 1.732051): 48.340859 > >> >> >> > ITER: 1 | i: 3 | j: 2 Result(i: 1.732051 | j: 1.414214 | i*j: > >> >> >> > 2.449490): 50.790349 > >> >> >> > ITER: 1 | i: 3 | j: 3 Result(i: 1.732051 | j: 1.732051 | i*j: > >> >> >> > 3.000000): 53.790349 > >> >> >> > ITER: 1 | i: 3 | j: 4 Result(i: 1.732051 | j: 2.000000 | i*j: > >> >> >> > 3.464102): 57.254450 > >> >> >> > ITER: 1 | i: 3 | j: 5 Result(i: 1.732051 | j: 2.236068 | i*j: > >> >> >> > 3.872983): 61.127434 > >> >> >> > ITER: 1 | i: 3 | j: 6 Result(i: 1.732051 | j: 2.449490 | i*j: > >> >> >> > 4.242641): 65.370075 > >> >> >> > ITER: 1 | i: 3 | j: 7 Result(i: 1.732051 | j: 2.645751 | i*j: > >> >> >> > 4.582576): 69.952650 > >> >> >> > ITER: 1 | i: 3 | j: 8 Result(i: 1.732051 | j: 2.828427 | i*j: > >> >> >> > 4.898979): 74.851630 > >> >> >> > ITER: 1 | i: 3 | j: 9 Result(i: 1.732051 | j: 3.000000 | i*j: > >> >> >> > 5.196152): 80.047782 > >> >> >> > ITER: 1 | i: 4 | j: 0 Result(i: 2.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 80.047782 > >> >> >> > ITER: 1 | i: 4 | j: 1 Result(i: 2.000000 | j: 1.000000 | i*j: > >> >> >> > 2.000000): 82.047782 > >> >> >> > ITER: 1 | i: 4 | j: 2 Result(i: 2.000000 | j: 1.414214 | i*j: > >> >> >> > 2.828427): 84.876209 > >> >> >> > ITER: 1 | i: 4 | j: 3 Result(i: 2.000000 | j: 1.732051 | i*j: > >> >> >> > 3.464102): 88.340311 > >> >> >> > ITER: 1 | i: 4 | j: 4 Result(i: 2.000000 | j: 2.000000 | i*j: > >> >> >> > 4.000000): 92.340311 > >> >> >> > ITER: 1 | i: 4 | j: 5 Result(i: 2.000000 | j: 
2.236068 | i*j: > >> >> >> > 4.472136): 96.812447 > >> >> >> > ITER: 1 | i: 4 | j: 6 Result(i: 2.000000 | j: 2.449490 | i*j: > >> >> >> > 4.898979): 101.711426 > >> >> >> > ITER: 1 | i: 4 | j: 7 Result(i: 2.000000 | j: 2.645751 | i*j: > >> >> >> > 5.291503): 107.002929 > >> >> >> > ITER: 1 | i: 4 | j: 8 Result(i: 2.000000 | j: 2.828427 | i*j: > >> >> >> > 5.656854): 112.659783 > >> >> >> > ITER: 1 | i: 4 | j: 9 Result(i: 2.000000 | j: 3.000000 | i*j: > >> >> >> > 6.000000): 118.659783 > >> >> >> > ITER: 1 | i: 5 | j: 0 Result(i: 2.236068 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 118.659783 > >> >> >> > ITER: 1 | i: 5 | j: 1 Result(i: 2.236068 | j: 1.000000 | i*j: > >> >> >> > 2.236068): 120.895851 > >> >> >> > ITER: 1 | i: 5 | j: 2 Result(i: 2.236068 | j: 1.414214 | i*j: > >> >> >> > 3.162278): 124.058129 > >> >> >> > ITER: 1 | i: 5 | j: 3 Result(i: 2.236068 | j: 1.732051 | i*j: > >> >> >> > 3.872983): 127.931112 > >> >> >> > ITER: 1 | i: 5 | j: 4 Result(i: 2.236068 | j: 2.000000 | i*j: > >> >> >> > 4.472136): 132.403248 > >> >> >> > ITER: 1 | i: 5 | j: 5 Result(i: 2.236068 | j: 2.236068 | i*j: > >> >> >> > 5.000000): 137.403248 > >> >> >> > ITER: 1 | i: 5 | j: 6 Result(i: 2.236068 | j: 2.449490 | i*j: > >> >> >> > 5.477226): 142.880474 > >> >> >> > ITER: 1 | i: 5 | j: 7 Result(i: 2.236068 | j: 2.645751 | i*j: > >> >> >> > 5.916080): 148.796553 > >> >> >> > ITER: 1 | i: 5 | j: 8 Result(i: 2.236068 | j: 2.828427 | i*j: > >> >> >> > 6.324555): 155.121109 > >> >> >> > ITER: 1 | i: 5 | j: 9 Result(i: 2.236068 | j: 3.000000 | i*j: > >> >> >> > 6.708204): 161.829313 > >> >> >> > ITER: 1 | i: 6 | j: 0 Result(i: 2.449490 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 161.829313 > >> >> >> > ITER: 1 | i: 6 | j: 1 Result(i: 2.449490 | j: 1.000000 | i*j: > >> >> >> > 2.449490): 164.278802 > >> >> >> > ITER: 1 | i: 6 | j: 2 Result(i: 2.449490 | j: 1.414214 | i*j: > >> >> >> > 3.464102): 167.742904 > >> >> >> > ITER: 1 | i: 6 | j: 3 Result(i: 2.449490 | j: 1.732051 | i*j: > >> 
>> >> > 4.242641): 171.985545 > >> >> >> > ITER: 1 | i: 6 | j: 4 Result(i: 2.449490 | j: 2.000000 | i*j: > >> >> >> > 4.898979): 176.884524 > >> >> >> > ITER: 1 | i: 6 | j: 5 Result(i: 2.449490 | j: 2.236068 | i*j: > >> >> >> > 5.477226): 182.361750 > >> >> >> > ITER: 1 | i: 6 | j: 6 Result(i: 2.449490 | j: 2.449490 | i*j: > >> >> >> > 6.000000): 188.361750 > >> >> >> > ITER: 1 | i: 6 | j: 7 Result(i: 2.449490 | j: 2.645751 | i*j: > >> >> >> > 6.480741): 194.842491 > >> >> >> > ITER: 1 | i: 6 | j: 8 Result(i: 2.449490 | j: 2.828427 | i*j: > >> >> >> > 6.928203): 201.770694 > >> >> >> > ITER: 1 | i: 6 | j: 9 Result(i: 2.449490 | j: 3.000000 | i*j: > >> >> >> > 7.348469): 209.119163 > >> >> >> > ITER: 1 | i: 7 | j: 0 Result(i: 2.645751 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 209.119163 > >> >> >> > ITER: 1 | i: 7 | j: 1 Result(i: 2.645751 | j: 1.000000 | i*j: > >> >> >> > 2.645751): 211.764914 > >> >> >> > ITER: 1 | i: 7 | j: 2 Result(i: 2.645751 | j: 1.414214 | i*j: > >> >> >> > 3.741657): 215.506572 > >> >> >> > ITER: 1 | i: 7 | j: 3 Result(i: 2.645751 | j: 1.732051 | i*j: > >> >> >> > 4.582576): 220.089147 > >> >> >> > ITER: 1 | i: 7 | j: 4 Result(i: 2.645751 | j: 2.000000 | i*j: > >> >> >> > 5.291503): 225.380650 > >> >> >> > ITER: 1 | i: 7 | j: 5 Result(i: 2.645751 | j: 2.236068 | i*j: > >> >> >> > 5.916080): 231.296730 > >> >> >> > ITER: 1 | i: 7 | j: 6 Result(i: 2.645751 | j: 2.449490 | i*j: > >> >> >> > 6.480741): 237.777470 > >> >> >> > ITER: 1 | i: 7 | j: 7 Result(i: 2.645751 | j: 2.645751 | i*j: > >> >> >> > 7.000000): 244.777470 > >> >> >> > ITER: 1 | i: 7 | j: 8 Result(i: 2.645751 | j: 2.828427 | i*j: > >> >> >> > 7.483315): 252.260785 > >> >> >> > ITER: 1 | i: 7 | j: 9 Result(i: 2.645751 | j: 3.000000 | i*j: > >> >> >> > 7.937254): 260.198039 > >> >> >> > ITER: 1 | i: 8 | j: 0 Result(i: 2.828427 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 260.198039 > >> >> >> > ITER: 1 | i: 8 | j: 1 Result(i: 2.828427 | j: 1.000000 | i*j: > >> >> >> > 2.828427): 
263.026466 > >> >> >> > ITER: 1 | i: 8 | j: 2 Result(i: 2.828427 | j: 1.414214 | i*j: > >> >> >> > 4.000000): 267.026466 > >> >> >> > ITER: 1 | i: 8 | j: 3 Result(i: 2.828427 | j: 1.732051 | i*j: > >> >> >> > 4.898979): 271.925446 > >> >> >> > ITER: 1 | i: 8 | j: 4 Result(i: 2.828427 | j: 2.000000 | i*j: > >> >> >> > 5.656854): 277.582300 > >> >> >> > ITER: 1 | i: 8 | j: 5 Result(i: 2.828427 | j: 2.236068 | i*j: > >> >> >> > 6.324555): 283.906855 > >> >> >> > ITER: 1 | i: 8 | j: 6 Result(i: 2.828427 | j: 2.449490 | i*j: > >> >> >> > 6.928203): 290.835059 > >> >> >> > ITER: 1 | i: 8 | j: 7 Result(i: 2.828427 | j: 2.645751 | i*j: > >> >> >> > 7.483315): 298.318373 > >> >> >> > ITER: 1 | i: 8 | j: 8 Result(i: 2.828427 | j: 2.828427 | i*j: > >> >> >> > 8.000000): 306.318373 > >> >> >> > ITER: 1 | i: 8 | j: 9 Result(i: 2.828427 | j: 3.000000 | i*j: > >> >> >> > 8.485281): 314.803655 > >> >> >> > ITER: 1 | i: 9 | j: 0 Result(i: 3.000000 | j: 0.000000 | i*j: > >> >> >> > 0.000000): 314.803655 > >> >> >> > ITER: 1 | i: 9 | j: 1 Result(i: 3.000000 | j: 1.000000 | i*j: > >> >> >> > 3.000000): 317.803655 > >> >> >> > ITER: 1 | i: 9 | j: 2 Result(i: 3.000000 | j: 1.414214 | i*j: > >> >> >> > 4.242641): 322.046295 > >> >> >> > ITER: 1 | i: 9 | j: 3 Result(i: 3.000000 | j: 1.732051 | i*j: > >> >> >> > 5.196152): 327.242448 > >> >> >> > ITER: 1 | i: 9 | j: 4 Result(i: 3.000000 | j: 2.000000 | i*j: > >> >> >> > 6.000000): 333.242448 > >> >> >> > ITER: 1 | i: 9 | j: 5 Result(i: 3.000000 | j: 2.236068 | i*j: > >> >> >> > 6.708204): 339.950652 > >> >> >> > ITER: 1 | i: 9 | j: 6 Result(i: 3.000000 | j: 2.449490 | i*j: > >> >> >> > 7.348469): 347.299121 > >> >> >> > ITER: 1 | i: 9 | j: 7 Result(i: 3.000000 | j: 2.645751 | i*j: > >> >> >> > 7.937254): 355.236375 > >> >> >> > ITER: 1 | i: 9 | j: 8 Result(i: 3.000000 | j: 2.828427 | i*j: > >> >> >> > 8.485281): 363.721656 > >> >> >> > ITER: 1 | i: 9 | j: 9 Result(i: 3.000000 | j: 3.000000 | i*j: > >> >> >> > 9.000000): 372.721656 > >> >> 
Final Result: 372.721656

As we can see, in the following iterations sqrt(1), as well as the result, is set to zero for some reason.

ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 5 Result(i: 0.000000 | j: 2.236068 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 6 Result(i: 0.000000 | j: 2.449490 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 7 Result(i: 0.000000 | j: 2.645751 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 8 Result(i: 0.000000 | j: 2.828427 | i*j: 0.000000): 0.000000
ITER: 0 | i: 1 | j: 9 Result(i: 0.000000 | j: 3.000000 | i*j: 0.000000): 0.000000

Please help me resolve the accuracy issue! I think it will be very useful for the gem5 community.

Note that I found the simulated tick at which the application starts in FS (using m5 dumpstats) and I use it with --debug-start, but the generated trace file is 10x larger than in SE mode for the same application. How can I compare them?

Thank you in advance!
Best regards,
Nikos

Quoting Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr>:

Dear Jason,

I am trying to use --debug-start, but in FS mode it is very difficult to find the tick at which the application starts!
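One way to locate the affected records in the printed output quoted above is to check every printed square root against the value it should have. The following is a minimal Python sketch, assuming the program's output was captured unwrapped (one printf record per line). The function name suspicious_lines and the tolerance are illustrative choices and are not part of the original program.

```python
import math
import re

# Matches one record printed by the test program, e.g.
# "ITER: 0 | i: 1 | j: 4 Result(i: 0.000000 | j: 2.000000 | i*j: 0.000000): 0.000000"
LINE_RE = re.compile(
    r"ITER: (\d+) \| i: (\d+) \| j: (\d+) "
    r"Result\(i: (\S+) \| j: (\S+) \| i\*j: (\S+)\): (\S+)"
)


def suspicious_lines(log_text, tol=1e-5):
    """Return (iter, i, j, printed_sq_i, printed_sq_j) for every record
    whose printed square roots do not match sqrt(i) and sqrt(j)."""
    bad = []
    for m in LINE_RE.finditer(log_text):
        it, i, j = (int(m.group(k)) for k in (1, 2, 3))
        sq_i, sq_j = float(m.group(4)), float(m.group(5))
        # "nan" parses as a float and never compares close, so it is flagged.
        ok = (math.isclose(sq_i, math.sqrt(i), abs_tol=tol)
              and math.isclose(sq_j, math.sqrt(j), abs_tol=tol))
        if not ok:
            bad.append((it, i, j, sq_i, sq_j))
    return bad
```

Running this over the captured output of several FS executions would show whether the zeroed values appear at the same (i, j) positions each time or move around between runs.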
However, I wrote the following very simple C++ program:

#include <cmath>
#include <stdio.h>

int main(){

    int dim = 4096;

    double result;

    for (int iter = 0; iter < 2; iter++){
        result = 0;
        for (int i = 0; i < dim; i++){
            for (int j = 0; j < dim; j++){
                result += sqrt(i) * sqrt(j);
            }
        }
        printf("Result: %lf\n", result); //Result: 30530733453.127449
    }
}

I cross-compile it using:

riscv64-linux-gnu-g++ -static -O3 -o test_riscv test_riscv.cpp

While on X86 (without cross-compilation, of course), QEMU-RISCV and GEM5-SE the result is the same (30530733453.127449), in GEM5-FS the result is different! In addition, the result also differs between the 2 iterations.

Please reproduce the error if you want, in order to verify my result. How can the issue be resolved?

Thank you in advance!

Best regards,
Nikos

Quoting Jason Lowe-Power <jason@lowepower.com>:

Hi Nikos,

You can use --debug-start to start the debugging after some number of ticks. Also, I would expect that the difference should come up quickly, so no need to run the program to the end.

For the FS mode one, you will want to just start the trace as the application starts. This could be a bit of a pain.
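As a cross-check that does not depend on any simulator, note that the double sum in the test program quoted above factorizes: the sum over i and j of sqrt(i) * sqrt(j) equals (sum over i of sqrt(i)) squared. Below is a minimal Python sketch of that reference value. The helper name reference_result is hypothetical, and because it uses math.fsum rather than the loop's accumulation order, the last printed digits may differ slightly from the C++ output.

```python
import math


def reference_result(dim):
    """Reference value for the nested loop
        result += sqrt(i) * sqrt(j)   for 0 <= i, j < dim
    using the identity sum_i sum_j sqrt(i)*sqrt(j) == (sum_i sqrt(i))**2."""
    root_sum = math.fsum(math.sqrt(i) for i in range(dim))
    return root_sum * root_sum


if __name__ == "__main__":
    for dim in (10, 4096):
        print(f"dim={dim}: {reference_result(dim):f}")
```

A simulated result that deviates from this value by much more than floating-point rounding, or that changes between two iterations of the same loop, points at lost floating-point state rather than at ordinary rounding differences.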
I'm not really sure what fundamentally could be different. FS and SE mode use the exact same code for executing instructions, so I don't think that's the problem. Have you tried running for smaller inputs or just one iteration?

Jason

On Wed, Sep 21, 2022 at 9:04 AM Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

I am trying to add --debug-flags=Exec (building gem5.opt instead of the gem5.fast I had), but the debug traces exceed 20GB (and the run has not finished yet) for less than 1 simulated second. How can I reduce the size of the debug output (or select something more specific)?

In contrast, I built the HPCG benchmark with the DHPCG_DEBUG flag. If you want, you can compare these two output files (hpcg20010909T014640_SE_Mode & HPCG-Benchmark_3.1_FS_Mode). As you can see, something goes wrong with the accuracy of the calculations in FS mode (the benchmark uses double precision). You can find the files here:
http://kition.mhl.tuc.gr:8000/d/68d82f3533/

Best regards,
Nikos

Quoting Jason Lowe-Power <jason@lowepower.com>:

That's quite odd that it works in SE mode but not FS mode!

I would suggest running with --debug-flags=Exec for both and then performing a diff to see how they differ.
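A plain diff of the two traces is hard to read because the FS trace presumably also contains kernel instructions and both traces carry different tick values and CPU names. The following is a minimal Python sketch of a filtered comparison. It rests on two assumptions that should be checked against the actual trace files: that the first hexadecimal token on each trace line is the program counter, and that the user-space address range [lo, hi) of the statically linked binary is known. The function names and the address range used in the usage note are hypothetical.

```python
import re
from itertools import zip_longest

# Assumption: the first "0x..." token on a trace line is the PC.
PC_RE = re.compile(r"\b0x([0-9a-fA-F]+)\b")


def user_records(lines, lo, hi):
    """Yield (pc, text_after_pc) for lines whose PC lies in [lo, hi).
    Everything before the PC (tick, CPU name) is ignored."""
    for line in lines:
        m = PC_RE.search(line)
        if m is None:
            continue
        pc = int(m.group(1), 16)
        if lo <= pc < hi:
            yield pc, line[m.end():].strip()


def first_divergence(se_lines, fs_lines, lo, hi):
    """Return (index, se_record, fs_record) for the first user-space
    instruction at which the two traces differ, or None if they agree."""
    pairs = zip_longest(user_records(se_lines, lo, hi),
                        user_records(fs_lines, lo, hi))
    for index, (se, fs) in enumerate(pairs):
        if se != fs:
            return index, se, fs
    return None
```

Both arguments can be open file objects, so multi-gigabyte traces are read line by line rather than loaded into memory, for example first_divergence(open("se_trace.txt"), open("fs_trace.txt"), 0x10000, 0x100000), where the file names and address bounds are placeholders.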
Cheers,
Jason

On Tue, Sep 20, 2022 at 2:45 PM Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

In QEMU I get the same (correct) results that I get in SE mode simulation. I get invalid results in FS simulation (with both riscv-fs.py and riscv-ubuntu-run.py). I cannot access real RISCV hardware at the moment; however, if you want, you may execute my xhpcg binary (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/) with the following configuration:

./xhpcg --nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1

Please let me know if you have any updates!

Best regards,
Nikos

Quoting Jason Lowe-Power <jason@lowepower.com>:

Hi Nikos,

I notice you said the following in your original email:

> In addition, I used the RISCV Ubuntu image
> (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu),
> I installed the gcc compiler, compile it (through qemu) and I get
> wrong results too.

Is this saying you get the wrong results in QEMU? If so, the bug is in GCC or the HPCG workload, not in gem5. If not, I would test in QEMU to make sure the binary works there.
Another way you could test to see if the problem is your binary or gem5 would be to run it on real hardware. We have access to some RISC-V hardware here at UC Davis, if you don't have access to it.

Cheers,
Jason

On Tue, Sep 20, 2022 at 12:58 AM Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr> wrote:

Dear Bobby,

1) I use the original riscv-fs.py which is provided in the latest gem5 release.
I ran gem5 once (./build/RISCV/gem5.fast -d ./HPCG_FS_results ./configs/example/gem5_library/riscv-fs.py) in order to download the riscv-bootloader-vmlinux-5.10 and riscv-disk-img.
After this I mounted the riscv-disk-img (sudo mount -o loop riscv-disk-img /mnt), copied the xhpcg executable into it, and made the following changes in riscv-fs.py to boot the riscv-disk-img with the executable:

image = CustomDiskImageResource(
    local_path = "/home/cossim/.cache/gem5/riscv-disk-img",
)

# Set the Full System workload.
board.set_kernel_disk_workload(
    kernel=Resource("riscv-bootloader-vmlinux-5.10"),
    disk_image=image,
)

Finally, in gem5/src/python/gem5/components/boards/riscv_board.py I changed the last line to "return ["console=ttyS0", "root={root_value}", "rw"]" in order to allow write permissions in the image.

2) After some iterations, the HPCG benchmark checks whether the results are valid. In the case of FS it reports invalid results. As I see from the results, one problem (at least) is that it produces different results in each HPCG execution (with the same configuration).

Here is the HPCG output and riscv-fs.py (http://kition.mhl.tuc.gr:8000/d/68d82f3533/). You may reproduce the results in the video if you use the xhpcg executable (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

Please help me solve it!

Finally, I get invalid results in the HPL benchmark in FS mode too.

Best regards,
Nikos

Quoting Bobby Bruce <bbruce@ucdavis.edu>:

I'm going to need a bit more information to help:

1. In what way have you modified ./configs/example/gem5_library/riscv-fs.py?
Can you attach the script here?
2. What error are you getting or in what way are the results invalid?

-
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net

On Mon, Sep 19, 2022 at 1:43 PM Νικόλαος Ταμπουρατζής <ntampouratzis@ece.auth.gr> wrote:

Dear gem5 community,

I have successfully cross-compiled the HPCG benchmark for RISCV (serial version, without MPI and OpenMP). While it works properly in gem5 SE mode (./build/RISCV/gem5.fast -d ./HPCG_SE_results ./configs/example/se.py -c xhpcg --options '--nx=16 --ny=16 --nz=16 --npx=1 --npy=1 --npz=1 --rt=0.1'), I get invalid results in FS simulation using "./build/RISCV/gem5.fast -d ./HPCG_FS_results ./configs/example/gem5_library/riscv-fs.py" (I mount the riscv image and put the executable in it).

Can you help me please?
In addition, I used the RISCV Ubuntu image (https://github.com/gem5/gem5-resources/tree/stable/src/riscv-ubuntu), installed the gcc compiler, compiled it (through qemu), and I get wrong results too.

Here is the Makefile which I use, the hpcg executable for RISCV (xhpcg), and a video that shows the results (http://kition.mhl.tuc.gr:8000/f/4ca25fdd3c/).

P.S. I use the latest gem5 version.

Thank you in advance! :)

Best regards,
Nikos
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-leave@gem5.org