[dpdk-dev] [PATCH v4] eal: make hugetlb initialization more robust

Sergio Gonzalez Monroy sergio.gonzalez.monroy at intel.com
Wed May 18 09:56:13 CEST 2016


On 17/05/2016 17:39, David Marchand wrote:
> Hello Jianfeng,
>
> On Thu, May 12, 2016 at 2:44 AM, Jianfeng Tan <jianfeng.tan at intel.com> wrote:
> >> This patch adds an option, --huge-trybest, that enables a recovery
> >> mechanism for the case where fewer hugepages are usable than sysfs
> >> declares. It relies on a memory access to fault in each hugepage; if the
> >> access fails with SIGBUS, it recovers to the previously saved stack
> >> environment with siglongjmp().
>>
> >> Besides, this solution fixes an issue when hugetlbfs is mounted with a
> >> size option. Currently DPDK does not respect the quota of a hugetlbfs
> >> mount. It fails to init the EAL because it tries to map the number of free
> >> hugepages in the system rather than the number specified in the quota
> >> for that mount.
>>
> >> This is still an open issue with CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. In
> >> that case (such as the IVSHMEM target), hugetlbfs mounts with a quota will
> >> fail to remap hugepages, as the remapping relies on having mapped all free
> >> hugepages in the system.
> For such a case, maybe having some warning log message when it
> fails would help the user.
> + a known issue in the release notes ?
>
>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index 5b9132c..8c77010 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -417,12 +434,33 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>>                          hugepg_tbl[i].final_va = virtaddr;
>>                  }
>>
>> +               if (orig && internal_config.huge_trybest) {
>> +                       /* In linux, hugetlb limitations, like cgroup, are
>> +                        * enforced at fault time instead of mmap(), even
>> +                        * with the option of MAP_POPULATE. Kernel will send
>> +                        * a SIGBUS signal. To avoid to be killed, save stack
>> +                        * environment here, if SIGBUS happens, we can jump
>> +                        * back here.
>> +                        */
>> +                       if (wrap_sigsetjmp()) {
>> +                               RTE_LOG(DEBUG, EAL, "SIGBUS: Cannot mmap more "
>> +                                       "hugepages of size %u MB\n",
>> +                                       (unsigned)(hugepage_sz / 0x100000));
>> +                               munmap(virtaddr, hugepage_sz);
>> +                               close(fd);
>> +                               unlink(hugepg_tbl[i].filepath);
>> +                               return i;
>> +                       }
>> +                       *(int *)virtaddr = 0;
>> +               }
>> +
>> +
>>                  /* set shared flock on the file. */
>>                  if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
>> -                       RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
>> +                       RTE_LOG(DEBUG, EAL, "%s(): Locking file failed:%s \n",
>>                                  __func__, strerror(errno));
>>                          close(fd);
>> -                       return -1;
>> +                       return i;
>>                  }
>>
>>                  close(fd);
> Maybe I missed something, but we are writing into some hugepage before
> the flock has been called.
> Are we sure there is nobody else using this hugepage ?
>
> Especially, can't this cause trouble to a primary process running if
> we start the exact same primary process ?
>

We lock the hugepage directory during eal_hugepage_info_init(), and we
do not unlock until we have finished eal_memory_init.

I think that takes care of that case.
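
As a rough illustration of why a directory lock covers that window, here
is a sketch assuming a blocking exclusive flock() on the directory fd.
The helper names lock_dir(), unlock_dir() and dir_is_locked() are made up
for this example and are not EAL functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open the directory and take an exclusive, blocking lock on it.
 * Returns the fd holding the lock, or -1 on error.
 */
static int lock_dir(const char *dir)
{
	int fd = open(dir, O_RDONLY);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Drop the lock taken by lock_dir() and close the fd. */
static void unlock_dir(int fd)
{
	flock(fd, LOCK_UN);
	close(fd);
}

/* Probe without blocking: 1 if another open file description holds the
 * lock, 0 if the directory is free, -1 on any other error.
 */
static int dir_is_locked(const char *dir)
{
	int fd = open(dir, O_RDONLY);
	int busy;

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
		flock(fd, LOCK_UN);
		busy = 0;
	} else {
		busy = (errno == EWOULDBLOCK) ? 1 : -1;
	}
	close(fd);
	return busy;
}
```

A second primary calling lock_dir() on the same directory would block
until the first one calls unlock_dir(), so it cannot touch the hugepage
files while the first is still faulting them in.
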

Sergio

