By the way, I have also tried skipping the checkpoint entirely: if I just boot with the KVM CPU and run lulesh to completion, it succeeds:
/mydata/gem5/build/X86/gem5.opt --outdir=m5out_p16 -r -e fs.py --disk-image=./x86-64-system/disks/base.img --kernel=./x86-64-system/binaries/vmlinux-5.4.49 --num-cpus=16 --cpu-type=X86KvmCPU --mem-size=8GB --script=./test.rcS
test.rcS:
/bin/lulesh2.0 -p -s 100
m5 exit
According to the output, lulesh2.0 runs to completion and gem5 exits normally.
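For reference, completion can also be checked from the simulated serial console log, since lulesh2.0 prints a "Run completed" summary when it finishes (just a sketch; the grep pattern may need adjusting to the exact output):
# check the UART log for lulesh's end-of-run summary
grep -A 5 "Run completed" m5out_p16/system.pc.com_1.device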
------ Original ------
From: "YongjieHuang via gem5-users" <gem5-users@gem5.org>
Date: 2024/8/16 15:57
To: "gem5-users" <gem5-users@gem5.org>
Cc: "YongjieHuang" <876167080@qq.com>
Subject: [gem5-users] BUG: kernel NULL pointer dereference occurs when restoring a checkpoint generated by KVM core in FS mode
Dear all,
I want to use the KVM core to write checkpoints and the O3 core to restore them, but I am hitting a kernel BUG.
My gem5 version is v23.0.0.1. The image and kernel were downloaded from https://www.gem5.org/project/2020/03/09/boot-tests.html.
The kernel is 'vmlinux-5.4.49' and the image is 'disk image (GZIPPED)'.
I used the X86KvmCPU to write a checkpoint while lulesh2.0 was running with OpenMP. Below is the command for booting the system and taking a checkpoint when hitting 9 billion instructions.
/mydata/gem5/build/X86/gem5.opt --outdir=m5out_p16 -r -e fs.py --disk-image=./x86-64-system/disks/disk.img --kernel=./x86-64-system/binaries/vmlinux-5.4.49 --num-cpus=16 --cpu-type=X86KvmCPU --mem-size=8GB --checkpoint-dir=ckptest --at-instruction --take-checkpoints 9000000000 --script=./test.rcS
test.rcS is the script for running lulesh2.0, which I had already copied into /bin of the disk image manually via sudo mount:
/bin/lulesh2.0 -p -s 100
m5 exit
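(For completeness, a variant of test.rcS that takes the checkpoint from inside the guest via the m5 utility, rather than counting instructions on the host side. This is just a sketch, not what I actually ran; it assumes the m5 binary is installed in the image and that fs.py honours guest-initiated checkpoint requests:)
# sketch only: request the checkpoint from inside the guest instead of --at-instruction
m5 checkpoint
/bin/lulesh2.0 -p -s 100
m5 exit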
However, when I use the O3 CPU to restore the checkpoint written above with the command line below, I see a kernel BUG in the system.pc.com_1.device file instead of the lulesh process continuing.
command line: /mydata/gem5/build/X86/gem5.opt --outdir=m5out_p16 -r -e fs.py --disk-image=./x86-64-system/disks/disk.img --kernel=./x86-64-system/binaries/vmlinux-5.4.49 --num-cpus=16 --cpu-type=X86O3CPU --caches --cpu-clock=2.4GHz --l1i_size=32kB --l1i_assoc=8 --l1d_size=64kB --l1d_assoc=8 --l2cache --l2_size=1MB --l2_assoc=16 --l3cache --l3_size=16MB --l3_assoc=16 --mem-size=8GB --checkpoint-dir=ckptest -r 1
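(For comparison, the same checkpoint can also be restored with the KVM CPU itself, to check whether the checkpoint is intact before any CPU-model switch is involved. This is a sketch using only the flags above; the output directory name is arbitrary:)
# sanity check: restore checkpoint 1 with the same KVM CPU that wrote it
/mydata/gem5/build/X86/gem5.opt --outdir=m5out_restore_kvm fs.py --disk-image=./x86-64-system/disks/disk.img --kernel=./x86-64-system/binaries/vmlinux-5.4.49 --num-cpus=16 --cpu-type=X86KvmCPU --mem-size=8GB --checkpoint-dir=ckptest -r 1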
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.4.49 #8
Hardware name: , BIOS 06/08/2008
Workqueue: 0x0 (events)
RIP: 0010:set_next_entity+0x9/0x65
Code: 48 89 df 5b 5d 41 5c e9 fb a0 ff ff 59 48 89 df 31 d2 5b 5d 41 5c e9 35 a4 ff ff 58 5b 5d 41 5c c3 41 55 41 54 55 53 48 89 fd <83> 7e 40 00 48 89 f3 74 35 4c 8d 66 18 4c 3b 67 40 4c 8d 6f 38 75
RSP: 0018:ffffc90000037e30 EFLAGS: 0000006e
RAX: 0000000000000000 RBX: ffff888238a26440 RCX: ffffffff81a19d80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888238a26480
RBP: ffff888238a26480 R08: 00000003e5663c00 R09: 00000000000000ff
R10: 00000000fffbad80 R11: 0000000000000800 R12: 0000000000000000
R13: ffff888238a26480 R14: ffff8882379749b0 R15: 0000000000000000
FS: 00007faa47824700(0000) GS:ffff888238a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000b0 CR3: 00000002358e8000 CR4: 00000000000006f0
Call Trace:
pick_next_task_fair+0xe5/0x18c
__schedule+0x1e3/0x40a
? do_raw_spin_lock+0x2b/0x52
? create_worker+0x16a/0x16a
schedule+0x75/0x9f
worker_thread+0x1e7/0x22f
kthread+0xf0/0xf5
? kthread_destroy_worker+0x39/0x39
ret_from_fork+0x22/0x40
Modules linked in:
CR2: 0000000000000040
---[ end trace 25f0872c331972c4 ]---
BUG: kernel NULL pointer dereference, address: 0000000000000040
RIP: 0010:set_next_entity+0x9/0x65
In addition, I am sure that during the checkpoint-generating process, lulesh2.0 runs successfully in the guest system, according to the output produced by lulesh's -p parameter.
Can anyone tell me what I should do?
I really appreciate your help!
Best,
Yongjie