Secondary process stuck in rte_eal_memory_init
Anna Tauzzi
admin at argonnetech.net
Wed Aug 24 12:11:28 CEST 2022
Using the lslocks command on Linux, I see that the primary holds a lock on
/mnt/huge2M and the secondary is waiting for a lock on the same directory.
SECONDARY 2416270 FLOCK WRITE* 0 0 0 /mnt/huge2M... 2416174
PRIMARY 2416174 FLOCK WRITE 0 0 0 /mnt/huge2M...
Is the primary supposed to hold a permanent lock on /mnt/huge2M?
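
For anyone who wants to reproduce the lslocks picture outside of DPDK, here is a
minimal sketch of the same contention using plain flock(). It is not DPDK code:
the function names hold_dir_lock and dir_lock_is_busy are made up for this
example, and it only assumes that both processes take an exclusive flock() on
the hugepage directory, which is what the lslocks output suggests. The probe
uses LOCK_NB so it reports the contention instead of hanging the way the
secondary does.

```c
/* Illustrative sketch only, not DPDK code. Function names are hypothetical. */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open a directory and take an exclusive flock() on it.
 * Returns the fd that holds the lock, or -1 on error.
 * The lock is released when the fd is closed. */
int hold_dir_lock(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Probe the directory lock without blocking.
 * Returns 1 if another open file description holds the lock,
 * 0 if the lock was free, and -1 on any other error. */
int dir_lock_is_busy(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    int busy;

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        /* We got the lock, so nobody else held it. Give it back. */
        flock(fd, LOCK_UN);
        busy = 0;
    } else {
        /* EWOULDBLOCK means the lock is held elsewhere. */
        busy = (errno == EWOULDBLOCK) ? 1 : -1;
    }
    close(fd);
    return busy;
}
```

Calling hold_dir_lock("/mnt/huge2M") in one process and dir_lock_is_busy("/mnt/huge2M")
in another should return 1 for as long as the first process keeps its descriptor
open. Because flock() locks belong to the open file description, the same
result shows up even inside a single process when the directory is opened twice.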
On Wed, Aug 24, 2022 at 11:18 AM Anna Tauzzi <admin at argonnetech.net>
wrote:
> I already tried the first suggestion with no luck; the secondary still gets
> stuck:
>
> #0 0x00007fc6d3eb05ab in flock () at ../sysdeps/unix/syscall-template.S:78
> #1 0x00007fc6d3ba1343 in sync_walk () from /usr/local/lib/librte_eal.so.22
> #2 0x00007fc6d3b8402b in rte_memseg_list_walk_thread_unsafe () from
> /usr/local/lib/librte_eal.so.22
> #3 0x00007fc6d3ba18bf in eal_memalloc_sync_with_primary () from
> /usr/local/lib/librte_eal.so.22
> #4 0x00007fc6d3ba24b5 in rte_eal_hugepage_attach () from
> /usr/local/lib/librte_eal.so.22
> #5 0x00007fc6d3b848f1 in rte_eal_memory_init () from
> /usr/local/lib/librte_eal.so.22
> #6 0x00007fc6d3b782aa in rte_eal_init.cold () from
> /usr/local/lib/librte_eal.so.22
>
> As for the second question:
> if I prevent the primary from allocating on the NUMA node where the
> secondary is running, the secondary doesn't get stuck.
>
>
>
>
> On Wed, Aug 24, 2022 at 11:14 AM Antonio Di Bacco <
> a.dibacco.ks at gmail.com> wrote:
>
>> Can you try launching the secondary with some delay in order not to
>> overlap with memory allocations done in the primary?
>> Is your primary allocating memory on NUMA 0 where the secondary is
>> running?
>>
>> On Tue, Aug 23, 2022 at 4:54 PM Anna Tauzzi <admin at argonnetech.net>
>> wrote:
>> >
>> > I have a primary process that spawns a secondary process. The primary
>> > is on NUMA 1 while the secondary is on NUMA 0.
>> > The secondary process starts up, but when calling rte_eal_init it gets
>> > stuck with this backtrace:
>> >
>> > flock()
>> > sync_walk()
>> > rte_memseg_list_walk_thread_unsafe()
>> > eal_memalloc_sync_with_primary()
>> > rte_eal_hugepage_attach()
>> > rte_eal_memory_init()
>> > rte_eal_init.cold()
>> >
>> > While the secondary is starting, it is possible that the primary is
>> > allocating memory on different NUMA nodes. I'm saying this because if I
>> > replace the DPDK memory allocation function (rte_zalloc...) in the
>> > primary with a plain memalign, I don't get this problem.
>> >
>> >
>> >
>>
>