[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

Sergio Gonzalez Monroy sergio.gonzalez.monroy at intel.com
Fri May 20 10:01:20 CEST 2016


On 20/05/2016 04:03, Chao Zhu wrote:
> Bruce,
>
> Recently, we found some bugs with mmap on PowerLinux: mmap doesn't
> respect the address hint. In get_virtual_area() in eal_memory.c, mmap
> is used to find a free virtual address range, which is then passed as
> the address hint. However, when mapping the real memory in
> rte_eal_hugepage_init(), mmap doesn't return the requested address,
> even though /proc/<pid>/maps shows that the requested range is free.
> Because of this bug, pre-allocating free virtual address space doesn't work.
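A standalone check along the lines Chao describes could first let the kernel report a free range, release it, and then ask for exactly that range as a hint. The sketch below is a hypothetical test program, not DPDK code: check_mmap_hint() is an illustrative name, and it uses anonymous mappings instead of MAP_SHARED mappings of the /mnt/huge/rtemap_* files, which is an assumption; an anonymous test may therefore not reproduce the PPC64 hugepage behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if mmap placed the mapping at the free
 * address we hinted, 0 if it chose another address, -1 on mmap failure.
 * The requested and returned addresses are stored in *requested / *got. */
static int check_mmap_hint(size_t len, void **requested, void **got)
{
	assert(len > 0); /* mmap rejects zero-length mappings */

	/* Let the kernel pick a free range (roughly what get_virtual_area()
	 * is described as doing), then release it so the range is free. */
	void *probe = mmap(NULL, len, PROT_NONE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (probe == MAP_FAILED)
		return -1;
	munmap(probe, len);

	/* Ask for exactly that range, as a hint only (no MAP_FIXED). */
	void *p = mmap(probe, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	*requested = probe;
	*got = p;
	munmap(p, len);
	return p == probe;
}
```

A result of 0 corresponds to the situation in Chao's log, where a request for 0x3fffa7000000 came back as 0x3efff0000000.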

Hi Chao,

If I understand you correctly, the issue you are describing would cause DPDK to
fail initialization even with the reverse ordering that you are doing for PPC.

Basically (just showing the relevant initialization steps):
1. map_all_hugepages(..., orig=1)
     - map all hugepages
2. find the physical address of each hugepage
3. sort by physical address
4. map_all_hugepages(..., orig=0)
     - First we try to get a big chunk of virtual address space for a block
       of contiguous hugepages, so we know that chunk is available.
     - Then we try to remap each page of that block of contiguous pages
       into that virtual address chunk.

So the issue you are describing would make step 4 fail regardless of the
different ordering that PPC does.
I'm probably missing something; would you care to elaborate?

Sergio


> We're trying to create a test case and report it as a bug to the kernel
> community.
>
> Here's some logs:
> ===============================
> EAL: Ask a virtual area of 0x10000000 bytes
> EAL: Virtual area found at 0x3fffa7000000 (size = 0x10000000)
> EAL: map_all_hugepages, /mnt/huge/rtemap_52,paddr 0x3ca6000000  requested addr: 0x3fffa7000000  mmaped addr: 0x3efff0000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_53,paddr 0x3ca5000000  requested addr: 0x3fffa8000000  mmaped addr: 0x3effef000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_54,paddr 0x3ca4000000  requested addr: 0x3fffa9000000  mmaped addr: 0x3effee000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_55,paddr 0x3ca3000000  requested addr: 0x3fffaa000000  mmaped addr: 0x3effed000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_56,paddr 0x3ca2000000  requested addr: 0x3fffab000000  mmaped addr: 0x3effec000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_57,paddr 0x3ca1000000  requested addr: 0x3fffac000000  mmaped addr: 0x3effeb000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_58,paddr 0x3ca0000000  requested addr: 0x3fffad000000  mmaped addr: 0x3effea000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_59,paddr 0x3c9f000000  requested addr: 0x3fffae000000  mmaped addr: 0x3effe9000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_60,paddr 0x3c9e000000  requested addr: 0x3fffaf000000  mmaped addr: 0x3effe8000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_61,paddr 0x3c9d000000  requested addr: 0x3fffb0000000  mmaped addr: 0x3effe7000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_62, paddr 0x3c9c000000 requested addr:  0x3fffb1000000 mmaped addr:  0x3effe6000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_63, paddr 0x3c9b000000 requested addr:  0x3fffb2000000 mmaped addr:  0x3effe5000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_51, paddr 0x3c9a000000 requested addr:  0x3fffb3000000 mmaped addr:  0x3effe4000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_50, paddr 0x3c99000000 requested addr:  0x3fffb4000000 mmaped addr:  0x3effe3000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_49, paddr 0x3c98000000 requested addr:  0x3fffb5000000 mmaped addr:  0x3effe2000000
> EAL: map_all_hugepages, /mnt/huge/rtemap_48, paddr 0x3c97000000 requested addr:  0x3fffb6000000 mmaped addr:  0x3effe1000000
>
> # cat /proc/143765/maps
> 01000000-02000000 rw-s 00000000 00:27 61162550 /mnt/huge/rtemap_14
> 02000000-03000000 rw-s 00000000 00:27 61162536 /mnt/huge/rtemap_0
> 03000000-04000000 rw-s 00000000 00:27 61162537 /mnt/huge/rtemap_1
> 04000000-05000000 rw-s 00000000 00:27 61162538 /mnt/huge/rtemap_2
> 05000000-06000000 rw-s 00000000 00:27 61162539 /mnt/huge/rtemap_3
> 06000000-07000000 rw-s 00000000 00:27 61162540 /mnt/huge/rtemap_4
> 07000000-08000000 rw-s 00000000 00:27 61162541 /mnt/huge/rtemap_5
> 08000000-09000000 rw-s 00000000 00:27 61162542 /mnt/huge/rtemap_6
> 09000000-0a000000 rw-s 00000000 00:27 61162543 /mnt/huge/rtemap_7
> 0a000000-0b000000 rw-s 00000000 00:27 61162544 /mnt/huge/rtemap_8
> 0b000000-0c000000 rw-s 00000000 00:27 61162545 /mnt/huge/rtemap_9
> 0c000000-0d000000 rw-s 00000000 00:27 61162546 /mnt/huge/rtemap_10
> 0d000000-0e000000 rw-s 00000000 00:27 61162547 /mnt/huge/rtemap_11
> 0e000000-0f000000 rw-s 00000000 00:27 61162548 /mnt/huge/rtemap_12
> 0f000000-10000000 rw-s 00000000 00:27 61162549 /mnt/huge/rtemap_13
> 10000000-101f0000 r-xp 00000000 08:32 6040458 /home/dpdk/build/app/test
> 101f0000-10220000 rw-p 001f0000 08:32 6040458 /home/dpdk/build/app/test
> 10220000-15c20000 rw-p 00000000 00:00 0 [heap]
> 20000000-21000000 rw-s 00000000 00:27 61162566 /mnt/huge/rtemap_30
> 21000000-22000000 rw-s 00000000 00:27 61162567 /mnt/huge/rtemap_31
> 22000000-23000000 rw-s 00000000 00:27 61162568 /mnt/huge/rtemap_32
> 23000000-24000000 rw-s 00000000 00:27 61162569 /mnt/huge/rtemap_33
> 24000000-25000000 rw-s 00000000 00:27 61162570 /mnt/huge/rtemap_34
> 25000000-26000000 rw-s 00000000 00:27 61162571 /mnt/huge/rtemap_35
> 26000000-27000000 rw-s 00000000 00:27 61162572 /mnt/huge/rtemap_36
> 27000000-28000000 rw-s 00000000 00:27 61162573 /mnt/huge/rtemap_37
> 28000000-29000000 rw-s 00000000 00:27 61162574 /mnt/huge/rtemap_38
> 29000000-2a000000 rw-s 00000000 00:27 61162575 /mnt/huge/rtemap_39
> 2a000000-2b000000 rw-s 00000000 00:27 61162576 /mnt/huge/rtemap_40
> 2b000000-2c000000 rw-s 00000000 00:27 61162577 /mnt/huge/rtemap_41
> 2c000000-2d000000 rw-s 00000000 00:27 61162578 /mnt/huge/rtemap_42
> 2d000000-2e000000 rw-s 00000000 00:27 61162579 /mnt/huge/rtemap_43
> 2e000000-2f000000 rw-s 00000000 00:27 61162580 /mnt/huge/rtemap_44
> 2f000000-30000000 rw-s 00000000 00:27 61162581 /mnt/huge/rtemap_45
> 30000000-31000000 rw-s 00000000 00:27 61162582 /mnt/huge/rtemap_46
> 31000000-32000000 rw-s 00000000 00:27 61162583 /mnt/huge/rtemap_47
> 32000000-33000000 rw-s 00000000 00:27 61162584 /mnt/huge/rtemap_48
> 33000000-34000000 rw-s 00000000 00:27 61162585 /mnt/huge/rtemap_49
> 34000000-35000000 rw-s 00000000 00:27 61162586 /mnt/huge/rtemap_50
> 35000000-36000000 rw-s 00000000 00:27 61162587 /mnt/huge/rtemap_51
> 36000000-37000000 rw-s 00000000 00:27 61162588 /mnt/huge/rtemap_52
> 37000000-38000000 rw-s 00000000 00:27 61162589 /mnt/huge/rtemap_53
> 38000000-39000000 rw-s 00000000 00:27 61162590 /mnt/huge/rtemap_54
> 39000000-3a000000 rw-s 00000000 00:27 61162591 /mnt/huge/rtemap_55
> 3a000000-3b000000 rw-s 00000000 00:27 61162592 /mnt/huge/rtemap_56
> 3b000000-3c000000 rw-s 00000000 00:27 61162593 /mnt/huge/rtemap_57
> 3c000000-3d000000 rw-s 00000000 00:27 61162594 /mnt/huge/rtemap_58
> 3d000000-3e000000 rw-s 00000000 00:27 61162595 /mnt/huge/rtemap_59
> 3e000000-3f000000 rw-s 00000000 00:27 61162596 /mnt/huge/rtemap_60
> 3f000000-40000000 rw-s 00000000 00:27 61162597 /mnt/huge/rtemap_61
> 40000000-41000000 rw-s 00000000 00:27 61162598 /mnt/huge/rtemap_62
> 41000000-42000000 rw-s 00000000 00:27 61162599 /mnt/huge/rtemap_63
> 3effb1000000-3effb2000000 rw-s 00000000 00:27 61162541 /mnt/huge/rtemap_5
> 3effb2000000-3effb3000000 rw-s 00000000 00:27 61162540 /mnt/huge/rtemap_4
> 3effb3000000-3effb4000000 rw-s 00000000 00:27 61162551 /mnt/huge/rtemap_15
> 3effb4000000-3effb5000000 rw-s 00000000 00:27 61162538 /mnt/huge/rtemap_2
> 3effb5000000-3effb6000000 rw-s 00000000 00:27 61162549 /mnt/huge/rtemap_13
> 3effb6000000-3effb7000000 rw-s 00000000 00:27 61162544 /mnt/huge/rtemap_8
> 3effb7000000-3effb8000000 rw-s 00000000 00:27 61162543 /mnt/huge/rtemap_7
> 3effb8000000-3effb9000000 rw-s 00000000 00:27 61162548 /mnt/huge/rtemap_12
> 3effb9000000-3effba000000 rw-s 00000000 00:27 61162537 /mnt/huge/rtemap_1
> 3effba000000-3effbb000000 rw-s 00000000 00:27 61162550 /mnt/huge/rtemap_14
> 3effbb000000-3effbc000000 rw-s 00000000 00:27 61162545 /mnt/huge/rtemap_9
> 3effbc000000-3effbd000000 rw-s 00000000 00:27 61162546 /mnt/huge/rtemap_10
> 3effbd000000-3effbe000000 rw-s 00000000 00:27 61162547 /mnt/huge/rtemap_11
> 3effbe000000-3effbf000000 rw-s 00000000 00:27 61162539 /mnt/huge/rtemap_3
> 3effbf000000-3effc0000000 rw-s 00000000 00:27 61162542 /mnt/huge/rtemap_6
> 3effc0000000-3effc1000000 rw-s 00000000 00:27 61162536 /mnt/huge/rtemap_0
> 3effc1000000-3effc2000000 rw-s 00000000 00:27 61162556 /mnt/huge/rtemap_20
> 3effc2000000-3effc3000000 rw-s 00000000 00:27 61162552 /mnt/huge/rtemap_16
> 3effc3000000-3effc4000000 rw-s 00000000 00:27 61162553 /mnt/huge/rtemap_17
> 3effc4000000-3effc5000000 rw-s 00000000 00:27 61162554 /mnt/huge/rtemap_18
> 3effc5000000-3effc6000000 rw-s 00000000 00:27 61162555 /mnt/huge/rtemap_19
> 3effc6000000-3effc7000000 rw-s 00000000 00:27 61162567 /mnt/huge/rtemap_31
> 3effc7000000-3effc8000000 rw-s 00000000 00:27 61162566 /mnt/huge/rtemap_30
> 3effc8000000-3effc9000000 rw-s 00000000 00:27 61162558 /mnt/huge/rtemap_22
> 3effc9000000-3effca000000 rw-s 00000000 00:27 61162557 /mnt/huge/rtemap_21
> 3effca000000-3effcb000000 rw-s 00000000 00:27 61162560 /mnt/huge/rtemap_24
> 3effcb000000-3effcc000000 rw-s 00000000 00:27 61162561 /mnt/huge/rtemap_25
> 3effcc000000-3effcd000000 rw-s 00000000 00:27 61162564 /mnt/huge/rtemap_28
> 3effcd000000-3effce000000 rw-s 00000000 00:27 61162559 /mnt/huge/rtemap_23
> 3effce000000-3effcf000000 rw-s 00000000 00:27 61162563 /mnt/huge/rtemap_27
> 3effcf000000-3effd0000000 rw-s 00000000 00:27 61162562 /mnt/huge/rtemap_26
> 3effd0000000-3effd1000000 rw-s 00000000 00:27 61162565 /mnt/huge/rtemap_29
> 3effd1000000-3effd2000000 rw-s 00000000 00:27 61162572 /mnt/huge/rtemap_36
> 3effd2000000-3effd3000000 rw-s 00000000 00:27 61162568 /mnt/huge/rtemap_32
> 3effd3000000-3effd4000000 rw-s 00000000 00:27 61162569 /mnt/huge/rtemap_33
> 3effd4000000-3effd5000000 rw-s 00000000 00:27 61162570 /mnt/huge/rtemap_34
> 3effd5000000-3effd6000000 rw-s 00000000 00:27 61162571 /mnt/huge/rtemap_35
> 3effd6000000-3effd7000000 rw-s 00000000 00:27 61162583 /mnt/huge/rtemap_47
> 3effd7000000-3effd8000000 rw-s 00000000 00:27 61162582 /mnt/huge/rtemap_46
> 3effd8000000-3effd9000000 rw-s 00000000 00:27 61162581 /mnt/huge/rtemap_45
> 3effd9000000-3effda000000 rw-s 00000000 00:27 61162580 /mnt/huge/rtemap_44
> 3effda000000-3effdb000000 rw-s 00000000 00:27 61162579 /mnt/huge/rtemap_43
> 3effdb000000-3effdc000000 rw-s 00000000 00:27 61162578 /mnt/huge/rtemap_42
> 3effdc000000-3effdd000000 rw-s 00000000 00:27 61162577 /mnt/huge/rtemap_41
> 3effdd000000-3effde000000 rw-s 00000000 00:27 61162574 /mnt/huge/rtemap_38
> 3effde000000-3effdf000000 rw-s 00000000 00:27 61162573 /mnt/huge/rtemap_37
> 3effdf000000-3effe0000000 rw-s 00000000 00:27 61162575 /mnt/huge/rtemap_39
> 3effe0000000-3effe1000000 rw-s 00000000 00:27 61162576 /mnt/huge/rtemap_40
> 3effe1000000-3effe2000000 rw-s 00000000 00:27 61162584 /mnt/huge/rtemap_48
> 3effe2000000-3effe3000000 rw-s 00000000 00:27 61162585 /mnt/huge/rtemap_49
> 3effe3000000-3effe4000000 rw-s 00000000 00:27 61162586 /mnt/huge/rtemap_50
> 3effe4000000-3effe5000000 rw-s 00000000 00:27 61162587 /mnt/huge/rtemap_51
> 3effe5000000-3effe6000000 rw-s 00000000 00:27 61162599 /mnt/huge/rtemap_63
> 3effe6000000-3effe7000000 rw-s 00000000 00:27 61162598 /mnt/huge/rtemap_62
> 3effe7000000-3effe8000000 rw-s 00000000 00:27 61162597 /mnt/huge/rtemap_61
> 3effe8000000-3effe9000000 rw-s 00000000 00:27 61162596 /mnt/huge/rtemap_60
> 3effe9000000-3effea000000 rw-s 00000000 00:27 61162595 /mnt/huge/rtemap_59
> 3effea000000-3effeb000000 rw-s 00000000 00:27 61162594 /mnt/huge/rtemap_58
> 3effeb000000-3effec000000 rw-s 00000000 00:27 61162593 /mnt/huge/rtemap_57
> 3effec000000-3effed000000 rw-s 00000000 00:27 61162592 /mnt/huge/rtemap_56
> 3effed000000-3effee000000 rw-s 00000000 00:27 61162591 /mnt/huge/rtemap_55
> 3effee000000-3effef000000 rw-s 00000000 00:27 61162590 /mnt/huge/rtemap_54
> 3effef000000-3efff0000000 rw-s 00000000 00:27 61162589 /mnt/huge/rtemap_53
> 3efff0000000-3efff1000000 rw-s 00000000 00:27 61162588 /mnt/huge/rtemap_52
> 3efff1000000-3efff2000000 rw-s 00000000 00:27 61162565 /mnt/huge/rtemap_29
> 3efff2000000-3efff3000000 rw-s 00000000 00:27 61162564 /mnt/huge/rtemap_28
> 3efff3000000-3efff4000000 rw-s 00000000 00:27 61162563 /mnt/huge/rtemap_27
> 3efff4000000-3efff5000000 rw-s 00000000 00:27 61162562 /mnt/huge/rtemap_26
> 3efff5000000-3efff6000000 rw-s 00000000 00:27 61162561 /mnt/huge/rtemap_25
> 3efff6000000-3efff7000000 rw-s 00000000 00:27 61162560 /mnt/huge/rtemap_24
> 3efff7000000-3efff8000000 rw-s 00000000 00:27 61162559 /mnt/huge/rtemap_23
> 3efff8000000-3efff9000000 rw-s 00000000 00:27 61162558 /mnt/huge/rtemap_22
> 3efff9000000-3efffa000000 rw-s 00000000 00:27 61162557 /mnt/huge/rtemap_21
> 3efffa000000-3efffb000000 rw-s 00000000 00:27 61162556 /mnt/huge/rtemap_20
> 3efffb000000-3efffc000000 rw-s 00000000 00:27 61162555 /mnt/huge/rtemap_19
> 3efffc000000-3efffd000000 rw-s 00000000 00:27 61162554 /mnt/huge/rtemap_18
> 3efffd000000-3efffe000000 rw-s 00000000 00:27 61162553 /mnt/huge/rtemap_17
> 3efffe000000-3effff000000 rw-s 00000000 00:27 61162552 /mnt/huge/rtemap_16
> 3effff000000-3f0000000000 rw-s 00000000 00:27 61162551 /mnt/huge/rtemap_15
> 3fffb7bc0000-3fffb7c10000 rw-p 00000000 00:00 0
> 3fffb7c10000-3fffb7c50000 rw-s 00000000 00:12 3926240 /run/.rte_config
> 3fffb7c50000-3fffb7c70000 rw-p 00000000 00:00 0
> 3fffb7c70000-3fffb7e20000 r-xp 00000000 08:32 7090531 /opt/at7.1/lib64/power8/libc-2.19.so
> 3fffb7e20000-3fffb7e30000 rw-p 001a0000 08:32 7090531 /opt/at7.1/lib64/power8/libc-2.19.so
> 3fffb7e30000-3fffb7e50000 rw-p 00000000 00:00 0
> 3fffb7e50000-3fffb7e70000 r-xp 00000000 08:32 7090563 /opt/at7.1/lib64/power8/libpthread-2.19.so
> 3fffb7e70000-3fffb7e80000 rw-p 00010000 08:32 7090563 /opt/at7.1/lib64/power8/libpthread-2.19.so
> 3fffb7e80000-3fffb7e90000 r-xp 00000000 08:32 7090210 /opt/at7.1/lib64/libdl-2.19.so
> 3fffb7e90000-3fffb7ea0000 rw-p 00000000 08:32 7090210 /opt/at7.1/lib64/libdl-2.19.so
> 3fffb7ea0000-3fffb7ec0000 r-xp 00000000 08:32 7090533 /opt/at7.1/lib64/power8/libz.so.1.2.6
> 3fffb7ec0000-3fffb7ed0000 rw-p 00010000 08:32 7090533 /opt/at7.1/lib64/power8/libz.so.1.2.6
> 3fffb7ed0000-3fffb7f90000 r-xp 00000000 08:32 7090568 /opt/at7.1/lib64/power8/libm-2.19.so
> 3fffb7f90000-3fffb7fa0000 rw-p 000b0000 08:32 7090568 /opt/at7.1/lib64/power8/libm-2.19.so
> 3fffb7fa0000-3fffb7fc0000 r-xp 00000000 00:00 0 [vdso]
> 3fffb7fc0000-3fffb7ff0000 r-xp 00000000 08:32 7090048 /opt/at7.1/lib64/ld-2.19.so
> 3fffb7ff0000-3fffb8000000 rw-p 00020000 08:32 7090048 /opt/at7.1/lib64/ld-2.19.so
> 3ffffffd0000-400000000000 rw-p 00000000 00:00 0 [stack]
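Chao's statement that the requested range is free in /proc/<pid>/maps can be checked mechanically. The sketch below uses a hypothetical helper, range_is_mapped(), that parses only the leading "start-end" hexadecimal field of each maps line and tests it for overlap with a half-open range. In the dump above, the range requested in the log (0x3fffa7000000 plus 0x10000000 bytes, ending at 0x3fffb7000000) lies between the rtemap_15 mapping ending at 3f0000000000 and the anonymous mapping starting at 3fffb7bc0000.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* fmemopen() for in-memory testing */
#endif
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns true if [start, end) overlaps any mapping
 * listed in /proc/<pid>/maps-formatted text read from f. Only the
 * leading "lo-hi" hexadecimal field of each line is parsed. */
static bool range_is_mapped(FILE *f, uint64_t start, uint64_t end)
{
	char line[8192];

	assert(start < end);
	while (fgets(line, sizeof(line), f) != NULL) {
		uint64_t lo, hi;

		if (sscanf(line, "%" SCNx64 "-%" SCNx64, &lo, &hi) != 2)
			continue; /* not a mapping line */
		if (lo < end && start < hi)
			return true; /* half-open intervals overlap */
	}
	return false;
}
```

On a live system the same helper could be applied to fopen("/proc/self/maps", "r") right before the hinted mmap call.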
>
>
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: March 23, 2016 1:11
> To: Sergio Gonzalez Monroy <sergio.gonzalez.monroy at intel.com>
> Cc: Gowrishankar <gowrishankar.m at linux.vnet.ibm.com>; dev at dpdk.org; chaozhu at linux.vnet.ibm.com; David Marchand <david.marchand at 6wind.com>
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
>
> On Tue, Mar 22, 2016 at 04:35:32PM +0000, Sergio Gonzalez Monroy wrote:
>> First of all, forgive my ignorance regarding ppc64, and forgive me if the
>> questions are naive, but after having a look at the existing ppc64 code
>> and at this patch, why are we doing this reverse mapping at all?
>>
>> I guess the question revolves around the comment at lines 1316-1318 of
>> eal_memory.c:
>>
>>         /* On PPC64 architecture, the mmap always starts from higher
>>          * virtual address to lower address. Here, both the physical
>>          * address and virtual address are in descending order */
>>
>> From looking at the code, for ppc64 we qsort in reverse order, and
>> everything after that appears to be done to account for the reverse
>> sorting.
>>
>> CC: Chao Zhu and David Marchand as the original author and reviewer of the code.
>> Sergio
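The reasoning in the PPC64 comment can be checked against the addresses in Chao's log: if the kernel hands out virtual addresses top-down, sorting pages by descending physical address keeps the virtual-minus-physical offset constant, so the virtual layout mirrors the physical one. A minimal sketch with a hypothetical struct page_map and helper names (not DPDK's struct hugepage_file or its comparator):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-page record. */
struct page_map {
	uint64_t physaddr;
	uint64_t virtaddr;
};

/* Descending order by physical address, like the reverse qsort the
 * PPC64 comment refers to. */
static int cmp_physaddr_desc(const void *a, const void *b)
{
	uint64_t x = ((const struct page_map *)a)->physaddr;
	uint64_t y = ((const struct page_map *)b)->physaddr;
	return (x < y) - (x > y);
}

/* True if every page keeps the same virt - phys offset as the first, so
 * the virtual layout mirrors the physical one in the same direction.
 * Unsigned wrap-around keeps the comparison valid even if
 * virtaddr < physaddr. */
static bool same_offset_run(const struct page_map *p, size_t n)
{
	assert(p != NULL || n == 0);
	for (size_t i = 1; i < n; i++)
		if (p[i].virtaddr - p[i].physaddr !=
		    p[0].virtaddr - p[0].physaddr)
			return false;
	return true;
}
```

With the (paddr, mmaped addr) pairs for rtemap_52 to rtemap_55 from the log, the descending sort gives a constant offset; pairing the same physical addresses with the ascending requested addresses does not.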
>>
> Just to add my 2c here. At one point, with some i686 installs, I believe
> (I don't remember the specific OS/kernel), we found that the mmap calls were
> returning the highest free address first and then working downwards, much
> like what seems to be described here. To fix this, we changed the mmap code
> from assuming that addresses are mapped upwards to explicitly requesting
> a large free block of memory (mmap of /dev/zero) to find a free address
> space range of the correct size, and then explicitly mmapping each
> individual page to the appropriate place in that free range. With this
> scheme it didn't matter whether the OS tried to mmap the pages from the
> highest or lowest address, because we always told the OS where to put the
> page (and we knew the slot was free from the earlier block mmap).
> Would this scheme not also work for PPC in a similar way? (Again, forgive my
> unfamiliarity with PPC! :-) )
>
> /Bruce
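Bruce's scheme can be sketched in one hedged variant: keep the /dev/zero reservation in place and use MAP_FIXED to put each page into its slot, so the kernel cannot substitute a different address. Whether the DPDK code of that time kept the reservation or released it and relied on the hint is not stated in this thread, so MAP_FIXED is an assumption of this sketch. place_pages() is a hypothetical name, anonymous pages stand in for hugetlbfs file pages, and for real hugepages the slots would also likely need hugepage-aligned addresses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve npages * page_sz bytes of free address
 * space by mapping /dev/zero, then replace each slot of the reservation
 * with its own mapping via MAP_FIXED. Returns the base or NULL. */
static void *place_pages(size_t npages, size_t page_sz)
{
	assert(npages > 0 && page_sz > 0);
	size_t len = npages * page_sz;

	int zfd = open("/dev/zero", O_RDONLY);
	if (zfd < 0)
		return NULL;
	void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, zfd, 0);
	close(zfd); /* the mapping stays valid after close */
	if (base == MAP_FAILED)
		return NULL;

	for (size_t i = 0; i < npages; i++) {
		char *slot = (char *)base + i * page_sz;
		/* Stand-in for mapping hugepage file i at this slot;
		 * MAP_FIXED atomically replaces the reserved pages. */
		void *p = mmap(slot, page_sz, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (p != slot) {
			munmap(base, len); /* drop partial placements too */
			return NULL;
		}
	}
	return base;
}
```

The caller releases everything with munmap(base, npages * page_sz).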
>
>> On 07/03/2016 14:13, Gowrishankar wrote:
>>> From: Gowri Shankar <gowrishankar.m at linux.vnet.ibm.com>
>>>
>>> For a secondary process to map the hugepages of every segment of the
>>> primary process into its address space, the hugepage_file entries have
>>> to be mapped in reverse order from the list that the primary process
>>> updated for each segment. This is because, on ppc64, hugepages are
>>> sorted in decreasing address order.
>>>
>>> Signed-off-by: Gowrishankar <gowrishankar.m at linux.vnet.ibm.com>
>>> ---
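The patch's reasoning can be illustrated with a walk over one segment's entries: if the primary recorded them in decreasing address order, as described for ppc64, a secondary that maps pages at increasing offsets from the segment base has to walk the list from the end. The struct and helper below are hypothetical, not the patch's code; the addresses in the usage note are those of rtemap_52 to rtemap_55 from Chao's log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of where the primary mapped one hugepage. */
struct hp_entry {
	uint64_t final_va;
};

/* Visit the n entries of one segment so that the k-th visited page is
 * expected at base + k * page_sz. If the primary stored the list in
 * decreasing address order, walk it backwards. Returns true if every
 * visited entry sits at its expected offset from base. */
static bool walk_segment(const struct hp_entry *e, size_t n,
			 bool descending, uint64_t base, uint64_t page_sz)
{
	assert(e != NULL || n == 0);
	for (size_t k = 0; k < n; k++) {
		size_t idx = descending ? n - 1 - k : k;

		if (e[idx].final_va != base + k * page_sz)
			return false;
	}
	return true;
}
```

For entries at 0x3efff0000000, 0x3effef000000, 0x3effee000000 and 0x3effed000000 with base 0x3effed000000 and 16 MB pages, only the reverse walk matches.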
