[dpdk-users] Larger number of hugepages causes bus error.
Sushil Adhikari
sushil446 at gmail.com
Thu Feb 23 18:02:59 CET 2017
Thank you, Keith and Monroy. With your help I was able to track down the
problem: my /var/run was too small to hold the hugepage information, so when
I increased its size, it worked. Thank you so much.
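
For readers hitting the same bus error, the check that resolved this thread
can be sketched in C. This is a minimal sketch, not DPDK code:
hugepage_table_bytes() and dir_has_room() are hypothetical helper names, the
4144-byte entry size is the sizeof(struct hugepage_file) reported further down
in this thread for that particular build, and using statvfs() on /var/run is
an assumption about how one might measure the free space.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Bytes needed for the hugepage table: one entry per hugepage file. */
static uint64_t hugepage_table_bytes(uint64_t nr_hugefiles, uint64_t entry_size)
{
    return nr_hugefiles * entry_size;
}

/*
 * Hypothetical check: does the filesystem that holds `dir` have at least
 * `needed` bytes available?
 * Returns 1 if yes, 0 if no, -1 if statvfs() fails.
 * f_bavail counts blocks available to unprivileged users; a process running
 * as root may be able to use slightly more (f_bfree).
 */
static int dir_has_room(const char *dir, uint64_t needed)
{
    struct statvfs vfs;

    if (statvfs(dir, &vfs) != 0)
        return -1;

    uint64_t avail = (uint64_t)vfs.f_bavail * (uint64_t)vfs.f_frsize;
    return avail >= needed ? 1 : 0;
}
```

With 512 entries of 4144 bytes the table needs 2121728 bytes, while 256
entries need 1060864 bytes. Calling
dir_has_room("/var/run", hugepage_table_bytes(512, 4144)) before starting the
application reports whether that filesystem can hold the table.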
On Thu, Feb 23, 2017 at 10:35 AM, Sergio Gonzalez Monroy <
sergio.gonzalez.monroy at intel.com> wrote:
> As Keith suggested, gdb is probably your best bet now.
> You could also do 'strace' to see if something shows up there.
>
> If you are running as root, the application opens a file in /var/run
> to store some hugepage information, then memsets it to 0.
>
> What distro and kernel are you running on?
>
>
>
> On 23/02/2017 16:19, Sushil Adhikari wrote:
>
>> I didn't understand what you mean by hugepage value. If you mean the number
>> of hugepages, here's what it looks like:
>> [~]$ grep -ri hugepages /proc/meminfo
>> AnonHugePages: 0 kB
>> HugePages_Total: 512
>> HugePages_Free: 512
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 2048 kB
>>
>> And the linux version is 4.4.20.
>>
>> On Thu, Feb 23, 2017 at 9:17 AM, Wiles, Keith <keith.wiles at intel.com>
>> wrote:
>>
>>> On Feb 22, 2017, at 7:18 PM, Sushil Adhikari <sushil446 at gmail.com> wrote:
>>>
>>>> Thank you Keith for the response,
>>>>
>>>> Yes, it should be line 1142, not 1405. I was using 16.11 and now I'm using
>>>> 17.02 and still getting the same error.
>>>
>>> Not sure what to say here; it looks like some type of system configuration
>>> issue, as I do not see it on my machine.
>>>
>>> Can you tell whether the hugepage pointer has a value, and whether it is
>>> sane? The next thing is to see where in that memory it is failing: start,
>>> end, or somewhere in the middle.
>>> Use GDB and compile the code with 'make install
>>> T=x86_64-native-linuxapp-gcc EXTRA_CFLAGS="-g -O0"', then set a breakpoint
>>> with 'b eal_memory.c:1142' and inspect the memory pointer hugepage. I do not
>>> think it is an overrun error, that is, the memset size being different than
>>> what was allocated and just stepping off the end.
>>>
>>> Also, you did not tell me the Linux version you are using.
>>>
>>>> On Wed, Feb 22, 2017 at 8:46 PM, Wiles, Keith <keith.wiles at intel.com> wrote:
>>>>
>>>>> On Feb 22, 2017, at 6:43 PM, Wiles, Keith <keith.wiles at intel.com> wrote:
>>>>>
>>>>>> On Feb 22, 2017, at 6:30 PM, Sushil Adhikari <sushil446 at gmail.com> wrote:
>>>>>>
>>>>>> I used the basic command line option "dpdkTimer -c 0xf -n 4".
>>>>>> To update on my findings so far, I have narrowed it down to this line (1405)
>>>>>> memset(hugepage, 0, nr_hugefiles * sizeof(struct hugepage_file));
>>>>>> of function rte_eal_hugepage_init() in file
>>>>>> dpdk\lib\librte_eal\linuxapp\eal\eal_memory.c
>>>>>
>>>>> What version of DPDK are you using? I was looking at the file at line 1405
>>>>> and I do not see a memset() call.
>>>>
>>>> I found the memset call at line 1142 in my 17.05-rc0 code. Please try the
>>>> latest version and see if you get the same problem.
>>>>
>>>>>> Yes, I have hugepages of size 2MB (2048 kB), and when I calculate the
>>>>>> memory this memset call is trying to set, it comes out to
>>>>>> 512 (nr_hugefiles) * 4144 (sizeof(struct hugepage_file)) = 2121728 bytes,
>>>>>> which is larger than 2MB. So my suspicion is that the hugepages I have
>>>>>> allocated (512 * 2MB) are not one contiguous 1GB region and that it is
>>>>>> trying to access memory that is not part of a hugepage. Is that a
>>>>>> possibility, even though I am setting up the hugepages at boot time
>>>>>> through a kernel option?
>>>
>>>>
>>>>>> On Wed, Feb 22, 2017 at 8:05 PM, Wiles, Keith <keith.wiles at intel.com> wrote:
>>>>>>
>>>>>>> On Feb 22, 2017, at 3:05 PM, Sushil Adhikari <sushil446 at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was trying to run the DPDK timer app with 512 2MB hugepages, but the
>>>>>>> application crashed with the following error:
>>>>>>> EAL: Detected 4 lcore(s)
>>>>>>> EAL: Probing VFIO support...
>>>>>>> Bus error (core dumped)
>>>>>>>
>>>>>>> If I reduce the number of hugepages to 256 it works fine. I am wondering
>>>>>>> what could be the problem here. Here's my cpu info
>>>>>>
>>>>>> I normally run with 2048 x 2 or 2048 per socket on my machine. What
>>>>>> is the command line you are using to start the application?
>>>>>>
>>>>>>> processor : 0
>>>>>>> vendor_id : GenuineIntel
>>>>>>> cpu family : 6
>>>>>>> model : 26
>>>>>>> model name : Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>>>>>>> stepping : 5
>>>>>>> microcode : 0x11
>>>>>>> cpu MHz : 2794.000
>>>>>>> cache size : 8192 KB
>>>>>>> physical id : 0
>>>>>>> siblings : 4
>>>>>>> core id : 0
>>>>>>> cpu cores : 4
>>>>>>> apicid : 0
>>>>>>> initial apicid : 0
>>>>>>> fpu : yes
>>>>>>> fpu_exception : yes
>>>>>>> cpuid level : 11
>>>>>>> wp : yes
>>>>>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>>>>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>>>>>>> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
>>>>>>> nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
>>>>>>> xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi
>>>>>>> flexpriority ept vpid
>>>>>>> bugs :
>>>>>>> bogomips : 5600.00
>>>>>>> clflush size : 64
>>>>>>> cache_alignment : 64
>>>>>>> address sizes : 36 bits physical, 48 bits virtual
>>>>>>> power management:
>>>>>>>
>>>>>>> processor : 1
>>>>>>> vendor_id : GenuineIntel
>>>>>>> cpu family : 6
>>>>>>> model : 26
>>>>>>> model name : Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>>>>>>> stepping : 5
>>>>>>> microcode : 0x11
>>>>>>> cpu MHz : 2794.000
>>>>>>> cache size : 8192 KB
>>>>>>> physical id : 0
>>>>>>> siblings : 4
>>>>>>> core id : 1
>>>>>>> cpu cores : 4
>>>>>>> apicid : 2
>>>>>>> initial apicid : 2
>>>>>>> fpu : yes
>>>>>>> fpu_exception : yes
>>>>>>> cpuid level : 11
>>>>>>> wp : yes
>>>>>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>>>>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>>>>>>> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
>>>>>>> nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
>>>>>>> xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi
>>>>>>> flexpriority ept vpid
>>>>>>> bugs :
>>>>>>> bogomips : 5600.00
>>>>>>> clflush size : 64
>>>>>>> cache_alignment : 64
>>>>>>> address sizes : 36 bits physical, 48 bits virtual
>>>>>>> power management:......
>>>>>>>
>>>>>>> And Here's my meminfo
>>>>>>>
>>>>>>> MemTotal: 24679608 kB
>>>>>>> MemFree: 24014156 kB
>>>>>>> MemAvailable: 23950600 kB
>>>>>>> Buffers: 3540 kB
>>>>>>> Cached: 31436 kB
>>>>>>> SwapCached: 0 kB
>>>>>>> Active: 21980 kB
>>>>>>> Inactive: 22256 kB
>>>>>>> Active(anon): 10760 kB
>>>>>>> Inactive(anon): 2940 kB
>>>>>>> Active(file): 11220 kB
>>>>>>> Inactive(file): 19316 kB
>>>>>>> Unevictable: 0 kB
>>>>>>> Mlocked: 0 kB
>>>>>>> SwapTotal: 0 kB
>>>>>>> SwapFree: 0 kB
>>>>>>> Dirty: 32 kB
>>>>>>> Writeback: 0 kB
>>>>>>> AnonPages: 9252 kB
>>>>>>> Mapped: 11912 kB
>>>>>>> Shmem: 4448 kB
>>>>>>> Slab: 27712 kB
>>>>>>> SReclaimable: 11276 kB
>>>>>>> SUnreclaim: 16436 kB
>>>>>>> KernelStack: 2672 kB
>>>>>>> PageTables: 1000 kB
>>>>>>> NFS_Unstable: 0 kB
>>>>>>> Bounce: 0 kB
>>>>>>> WritebackTmp: 0 kB
>>>>>>> CommitLimit: 12077660 kB
>>>>>>> Committed_AS: 137792 kB
>>>>>>> VmallocTotal: 34359738367 kB
>>>>>>> VmallocUsed: 0 kB
>>>>>>> VmallocChunk: 0 kB
>>>>>>> HardwareCorrupted: 0 kB
>>>>>>> AnonHugePages: 2048 kB
>>>>>>> CmaTotal: 0 kB
>>>>>>> CmaFree: 0 kB
>>>>>>> HugePages_Total: 256
>>>>>>> HugePages_Free: 0
>>>>>>> HugePages_Rsvd: 0
>>>>>>> HugePages_Surp: 0
>>>>>>> Hugepagesize: 2048 kB
>>>>>>> DirectMap4k: 22000 kB
>>>>>>> DirectMap2M: 25133056 kB
>>>>>>>
>>>>>> Regards,
>>>>>> Keith
>>>>>
>>>>> Regards,
>>>>> Keith
>>>>
>>>> Regards,
>>>> Keith
>>>
>>> Regards,
>>> Keith
>>>
>
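
A note on why a full /var/run shows up as a bus error rather than a clean
failure: a page of a shared file mapping that the filesystem cannot back
typically raises SIGBUS when it is first written, which is consistent with the
crash at the memset described above. The sketch below is a hypothetical
helper, map_zeroed_file(), not the DPDK implementation. It assumes the file is
created, sized, mapped, and zeroed in that order, and it swaps in
posix_fallocate() so that a lack of space is reported as an error before the
memory is touched.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not DPDK code): create `path`, reserve `size` bytes
 * of real storage for it, map it shared, and zero the mapping.
 * Returns the mapping, or NULL if any step fails.
 */
static void *map_zeroed_file(const char *path, size_t size)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    /*
     * posix_fallocate() returns an error number (for example ENOSPC) when
     * the filesystem cannot hold the file, instead of leaving unbacked
     * pages that would fault later inside memset().
     */
    if (posix_fallocate(fd, 0, (off_t)size) != 0) {
        close(fd);
        unlink(path); /* remove the file we could not size */
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        unlink(path);
        return NULL;
    }

    memset(p, 0, size); /* safe now: every page is already backed */
    return p;
}
```

A caller would release the result with munmap(p, size). Reserving the space
up front trades a crash for a NULL return that can be logged with a message
pointing at the undersized filesystem.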