gem5-users@gem5.org

The gem5 Users mailing list

Bypassing L2 Cache for DMA memory requests

Zehan Gao
Thu, Jul 7, 2022 9:17 PM

Hi all,

I'm working on a project about the memory interference caused by DMA memory requests, and I want DMA requests to bypass the L2 cache.
I use SimpleSSD to simulate an NVMe SSD that accesses memory via DMA, and I start gem5 with the following command:

./build/ARM/gem5.fast ./configs/example/fs.py --fast-forward=300000000 --kernel=vmlinux --dtb-file=armv8_gem5_v1_4cpu.dtb --machine-type=VExpress_GEM5_V1 --num-cpu=4 --cpu-clock=3GHz --caches --l2cache --cpu-type=TimingSimpleCPU --mem-size=4GB --mem-type=LPDDR2_S4_1066_1x32 --ssd-interface=nvme --ssd-config=./src/dev/storage/simplessd/config/linaro.cfg --disk-image=linaro-aarch64-linux.img --root-device=/dev/nvme0n1p1 --disable-ide

I observed that the DMA traffic goes through the L2 cache, and I want to remove the L2 cache from the path of DMA packets. This seems like a common problem, but unfortunately I couldn't find an answer by searching.

I noticed that the DMA traffic is first received by the IOCache, so I tried removing the IOCache in fs.py and adding the memory address ranges to the IOBridge:

    gicv2m_range = AddrRange(0x2c1c0000, 0x2c1d0000 - 1)

    if options.caches or options.l2cache:
        # By default the IOCache runs at the system clock
        # test_sys.iocache = IOCache(addr_ranges = test_sys.mem_ranges)
        # test_sys.iocache.cpu_side = test_sys.iobus.master
        # test_sys.iocache.mem_side = test_sys.membus.slave


        if buildEnv['TARGET_ISA'] in "arm":
            if options.machine_type == "VExpress_GEM5_V1":
                test_sys.iobridge = Bridge( #delay='50ns',
                                           ranges=test_sys.mem_ranges + [gicv2m_range])

                test_sys.iobridge.slave = test_sys.iobus.master
                test_sys.iobridge.master = test_sys.membus.slave

I receive the following error after a memory request from the NVMe device:

panic: panic condition !invalidate && !pkt->hasSharers() occurred: system.cpu0.dcache is passing a Modified line through WriteReq [17b3de000:17b3de03f] ES, but keeping the block

Removing the IOCache is probably not the proper way to keep DMA memory accesses out of the L2 cache. Any advice on how to achieve that?

Thanks,
Zehan

Zehan Gao
Mon, Jul 11, 2022 5:52 AM

After some research, I found the following points related to the problem.

In the ARM architecture, cache coherency can be managed by hardware. This incurs an overhead that depends on the design, and that overhead is what I observed in the gem5 simulation. Reference: https://developer.arm.com/documentation/den0024/a/Memory-Ordering/Memory-attributes/Cacheable-and-shareable-memory-attributes

In gem5, the IOCache exists to support cache coherency, which the classic memory model implements with snoop requests. Removing the IOCache therefore causes coherency problems. This is mentioned in a previous post: https://www.mail-archive.com/gem5-users@gem5.org/msg08779.html
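
For reference, putting the IOCache back amounts to restoring the three lines commented out in the fs.py snippet from the first message. The helper below is only a sketch: the name attach_iocache is hypothetical, the IOCache class is passed in as a parameter so the function has no gem5 import of its own, and the port names (cpu_side, mem_side, master, slave) are the ones used in that snippet and may be named differently in other gem5 versions.

```python
def attach_iocache(system, iocache_cls):
    """Restore the default IOCache wiring shown (commented out) in the post.

    `system` is expected to provide mem_ranges, iobus and membus, as
    test_sys does in fs.py; `iocache_cls` stands in for gem5's IOCache.
    """
    # The IOCache covers the memory ranges, so DMA traffic to DRAM
    # enters the coherent domain through a cache.
    system.iocache = iocache_cls(addr_ranges=system.mem_ranges)
    # Device side: the I/O bus feeds the cache.
    system.iocache.cpu_side = system.iobus.master
    # Memory side: the cache sits on the coherent memory bus.
    system.iocache.mem_side = system.membus.slave
    return system.iocache
```

Inside fs.py this would be called as attach_iocache(test_sys, IOCache) in place of the commented-out lines.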

In conclusion, DMA memory requests do not go through the L2 cache. Instead, they go into the IOCache, and the IOCache generates snoop requests that are processed by the L1 and L2 caches. Eliminating this behavior would require modifying both the gem5 hardware model and the kernel, and the result would not reflect typical ARM systems.
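
To illustrate why the snoops matter, here is a toy Python model. It is not gem5 code and does not reproduce gem5's coherence protocol: ToyCache, ToyMemBus and run are made-up names, the 64-byte line size is an assumption, and the DMA write is assumed to overwrite a whole line, so a dirty copy can simply be dropped. It only shows that a DMA write which snoops the caches costs one snoop per cache, while a DMA write that skips snooping leaves the CPU reading stale data.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # bytes per cache line (assumed for this toy model)


def line_addr(addr: int) -> int:
    """Align an address down to the start of its cache line."""
    return addr - (addr % LINE_SIZE)


@dataclass
class ToyCache:
    name: str
    lines: dict = field(default_factory=dict)  # line address -> (state, data)

    def cpu_write(self, addr: int, data: str) -> None:
        # A CPU store leaves the line in the Modified ("M") state.
        self.lines[line_addr(addr)] = ("M", data)

    def snoop_invalidate(self, addr: int) -> bool:
        # Drop our copy of the line; report whether we held one.
        return self.lines.pop(line_addr(addr), None) is not None


class ToyMemBus:
    def __init__(self, caches):
        self.caches = list(caches)
        self.memory = {}
        self.snoops = 0

    def dma_write_line(self, addr: int, data: str, coherent: bool) -> None:
        if coherent:
            # Every cache on the bus is snooped; this is the overhead.
            for cache in self.caches:
                self.snoops += 1
                cache.snoop_invalidate(addr)
        self.memory[line_addr(addr)] = data

    def cpu_read(self, cache: ToyCache, addr: int):
        hit = cache.lines.get(line_addr(addr))
        return hit[1] if hit else self.memory.get(line_addr(addr))


def run(coherent: bool):
    l1 = ToyCache("cpu0.dcache")
    l2 = ToyCache("l2")
    bus = ToyMemBus([l1, l2])
    l1.cpu_write(0x17B3DE000, "cpu data")  # line is now Modified in L1
    bus.dma_write_line(0x17B3DE000, "dma data", coherent)
    return bus.cpu_read(l1, 0x17B3DE000), bus.snoops
```

In this model, run(True) returns the DMA data at the cost of two snoops, while run(False) sends no snoops but the CPU still reads its old cached copy.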

Please correct me if I got anything wrong.
