[dpdk-dev] no hugepage with UIO poll-mode driver

Younghwan Go yhwan at ndsl.kaist.edu
Thu Nov 26 05:47:03 CET 2015


Hello,

Thank you all for helping us understand the issues with the no hugepage option.

As Konstantin mentioned at the end, I tried using VFIO module instead of 
IGB UIO module. I enabled all necessary parameters (IOMMU, 
virtualization, vfio-pci, VFIO permission) and ran my code with no 
hugepage option.

At first, it seemed to receive packets fine, but after a while, it 
stopped receiving packets. I could temporarily work around this issue by 
not calling rte_eth_tx_burst(). Also, when I looked at the received 
packets, they all contained 0s instead of actual data. Was there 
anything that I missed in running with VFIO? I'm curious whether DPDK 
with the no hugepage option was confirmed to run with VFIO.

Thank you,
Younghwan

On 2015-11-25 11:12 PM, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez Monroy
>> Sent: Wednesday, November 25, 2015 1:44 PM
>> To: Thomas Monjalon
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] no hugepage with UIO poll-mode driver
>>
>> On 25/11/2015 13:22, Thomas Monjalon wrote:
>>> 2015-11-25 12:02, Bruce Richardson:
>>>> On Wed, Nov 25, 2015 at 12:03:05PM +0100, Thomas Monjalon wrote:
>>>>> 2015-11-25 11:00, Bruce Richardson:
>>>>>> On Wed, Nov 25, 2015 at 11:23:57AM +0100, Thomas Monjalon wrote:
>>>>>>> 2015-11-25 10:08, Bruce Richardson:
>>>>>>>> On Wed, Nov 25, 2015 at 03:39:17PM +0900, Younghwan Go wrote:
>>>>>>>>> Hi Jianfeng,
>>>>>>>>>
>>>>>>>>> Thanks for the email. rte mempool was successfully created without any
>>>>>>>>> error. Now the next problem is that rte_eth_rx_burst() is always returning 0
>>>>>>>>> as if there was no packet to receive... Do you have any suggestion on what
>>>>>>>>> might be causing this issue? In the meantime, I will be digging through
>>>>>>>>> ixgbe driver code to see what's going on.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Younghwan
>>>>>>>>>
>>>>>>>> The problem is that with --no-huge we don't have the physical address of the memory
>>>>>>>> to write to the network card. That's why it's marked as for testing only.
>>>>>>> Even with rte_mem_virt2phy() + rte_mem_lock_page() ?
>>>>>>>
>>>>>> With no-huge, we just set up a single memory segment at startup and set its
>>>>>> "physaddr" to be the virtual address.
>>>>>>
>>>>>>           /* hugetlbfs can be disabled */
>>>>>>           if (internal_config.no_hugetlbfs) {
>>>>>>                   addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
>>>>>>                                   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
>>>>>>                   if (addr == MAP_FAILED) {
>>>>>>                           RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
>>>>>>                                           strerror(errno));
>>>>>>                           return -1;
>>>>>>                   }
>>>>>>                   mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
>>>>> rte_mem_virt2phy() does not use memseg.phys_addr but /proc/self/pagemap:
>>>>>
>>>>>       /*
>>>>>        * the pfn (page frame number) are bits 0-54 (see
>>>>>        * pagemap.txt in linux Documentation)
>>>>>        */
>>>>>       physaddr = ((page & 0x7fffffffffffffULL) * page_size)
>>>>>           + ((unsigned long)virtaddr % page_size);
>>>>>
>>>> Yes, you are right. I was not aware that that function was used as part of the
>>>> mempool init, but now I see that "rte_mempool_virt2phy()" does indeed call that
>>>> function if hugepages are disabled, so my bad.
>>> Do you think we could move --no-huge in the main section (not only for testing)?
>> Hi,
>>
>> I think the main issue is going to be the HW descriptor queues.
>> AFAIK drivers now call rte_eth_dma_zone_reserve, which is basically a
>> wrapper around rte_memzone_reserve, to get a chunk of physical memory,
>> and in the case of --no-huge that memory is not going to be really
>> physically contiguous.
>>
>> Ideally we would move and expand the functionality of the
>> dma_zone_reserve API into the EAL, so we could detect what page size
>> we have and set the boundary for that page size.
>> dma_zone_reserve does something similar to work on the Xen target by
>> reserving memzones on a 2MB boundary.
> With Xen we have a special kernel driver that allocates physically contiguous
> chunks of memory for us.
> So we can guarantee that each such chunk will be at least 2MB long.
> That's enough to allocate HW rings (the max HW ring size for, say, ixgbe is ~64KB).
> Here there is absolutely no guarantee that memory allocated by the kernel will be physically contiguous.
> Of course, you can search through all the pages you allocated, and most likely you'll find a contiguous
> chunk big enough for that.
> Another problem - mbufs.
> You need to be sure that each mbuf doesn't cross a page boundary
> (in case the next page is not adjacent to the current one).
> So you'll probably need to use rte_mempool_xmem_create() to allocate mbufs without hugepages.
> BTW, as I remember, with vfio in place you should be able to do IO with the no-hugepages option, no?
> As it relies on vfio's ability to set up IOMMU tables for you.
> Konstantin
>
>> Sergio


