[dpdk-dev] no hugepage with UIO poll-mode driver

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Nov 25 15:12:20 CET 2015



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez Monroy
> Sent: Wednesday, November 25, 2015 1:44 PM
> To: Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] no hugepage with UIO poll-mode driver
> 
> On 25/11/2015 13:22, Thomas Monjalon wrote:
> > 2015-11-25 12:02, Bruce Richardson:
> >> On Wed, Nov 25, 2015 at 12:03:05PM +0100, Thomas Monjalon wrote:
> >>> 2015-11-25 11:00, Bruce Richardson:
> >>>> On Wed, Nov 25, 2015 at 11:23:57AM +0100, Thomas Monjalon wrote:
> >>>>> 2015-11-25 10:08, Bruce Richardson:
> >>>>>> On Wed, Nov 25, 2015 at 03:39:17PM +0900, Younghwan Go wrote:
> >>>>>>> Hi Jianfeng,
> >>>>>>>
> >>>>>>> Thanks for the email. rte mempool was successfully created without any
> >>>>>>> error. Now the next problem is that rte_eth_rx_burst() is always returning 0
> >>>>>>> as if there was no packet to receive... Do you have any suggestion on what
> >>>>>>> might be causing this issue? In the meantime, I will be digging through
> >>>>>>> ixgbe driver code to see what's going on.
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Younghwan
> >>>>>>>
> >>>>>> The problem is that with --no-huge we don't have the physical address of the memory
> >>>>>> to write to the network card. That's why it's marked as for testing only.
> >>>>> Even with rte_mem_virt2phy() + rte_mem_lock_page() ?
> >>>>>
> >>>> With no-huge, we just set up a single memory segment at startup and set its
> >>>> "physaddr" to be the virtual address.
> >>>>
> >>>>          /* hugetlbfs can be disabled */
> >>>>          if (internal_config.no_hugetlbfs) {
> >>>>                  addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> >>>>                                  MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> >>>>                  if (addr == MAP_FAILED) {
> >>>>                          RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> >>>>                                          strerror(errno));
> >>>>                          return -1;
> >>>>                  }
> >>>>                  mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
> >>> rte_mem_virt2phy() does not use memseg.phys_addr but /proc/self/pagemap:
> >>>
> >>>      /*
> >>>       * the pfn (page frame number) are bits 0-54 (see
> >>>       * pagemap.txt in linux Documentation)
> >>>       */
> >>>      physaddr = ((page & 0x7fffffffffffffULL) * page_size)
> >>>          + ((unsigned long)virtaddr % page_size);
> >>>
> >> Yes, you are right. I was not aware that that function was used as part of the
> >> mempool init, but now I see that "rte_mempool_virt2phy()" does indeed call that
> >> function if hugepages are disabled, so my bad.
> > Do you think we could move --no-huge in the main section (not only for testing)?
> Hi,
> 
> I think the main issue is going to be the HW descriptors queues.
> AFAIK drivers now call rte_eth_dma_zone_reserve, which is basically a
> wrapper around
> rte_memzone_reserve, to get a chunk of phys memory, and in the case of
> --no-huge is
> not going to be really phys contiguous.
> 
> Ideally we would move and expand the functionality of dma_zone reserve
> API to the EAL,
> so we could detect what page size we have and set the boundary for such
> page size.
> dma_zone_reserve does something similar to work on Xen target by
> reserving memzones
> on 2MB boundary.

With Xen we have a special kernel driver that allocates physically contiguous
chunks of memory for us.
So we can guarantee that each such chunk is at least 2MB long.
That's enough to allocate HW rings (the max HW ring size for, say, ixgbe is ~64KB).
Here (with --no-huge) there is absolutely no guarantee that memory allocated by the kernel will be physically contiguous.
Of course you can search through all the pages that you allocated, and most likely you'll find a contiguous
chunk big enough for that.
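A minimal sketch of that search, assuming the physical address of every page
of the buffer has already been looked up (for example through /proc/self/pagemap,
as rte_mem_virt2phy() does). find_phys_contig_run() is a hypothetical helper
written for illustration, not an existing DPDK function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * phys[i] is the physical address of page i of a virtually contiguous
 * buffer. Find the first run of pages that is also physically contiguous
 * and covers at least `want` bytes.
 * Returns the index of the first page of the run, or -1 if there is none.
 */
static long
find_phys_contig_run(const uint64_t *phys, size_t n_pages,
                     uint64_t page_size, uint64_t want)
{
        size_t start = 0;
        size_t i;

        assert(page_size != 0);
        if (n_pages == 0 || want == 0)
                return -1;

        for (i = 0; i < n_pages; i++) {
                /* a break in physical adjacency restarts the run at page i */
                if (i > start && phys[i] != phys[i - 1] + page_size)
                        start = i;
                if ((uint64_t)(i - start + 1) * page_size >= want)
                        return (long)start;
        }
        return -1;
}
```

With 4KB pages, a ~64KB ring would need a run of 16 such pages, so whether
the search succeeds depends entirely on what the kernel happened to hand out.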
Another problem is the mbufs.
You need to be sure that no mbuf crosses a page boundary
(in case the next page is not physically adjacent to the current one).
So you'll probably need to use rte_mempool_xmem_create() to allocate mbufs without hugepages.
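To illustrate the page-boundary condition, here is a minimal sketch. Both
helpers are hypothetical (they are not part of DPDK or of
rte_mempool_xmem_create()), and the page size is assumed to be a power of two:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if [addr, addr + len) touches more than one page, else 0. */
static int
obj_crosses_page(uintptr_t addr, size_t len, size_t page_size)
{
        /* assumption: page_size is a non-zero power of two */
        assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
        if (len == 0)
                return 0;
        return (addr / page_size) != ((addr + len - 1) / page_size);
}

/*
 * Return the first address >= addr at which an object of `len` bytes
 * fits inside a single page. The object must not be larger than a page.
 */
static uintptr_t
obj_place_in_page(uintptr_t addr, size_t len, size_t page_size)
{
        assert(len <= page_size);
        if (obj_crosses_page(addr, len, page_size))
                /* skip the tail of this page, start at the next one */
                return (addr + page_size - 1) & ~((uintptr_t)page_size - 1);
        return addr;
}
```

Laying objects out this way wastes the tail of each page, which is the price
of not relying on neighbouring pages being physically adjacent.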
BTW, as I remember, with VFIO in place you should be able to do IO with the --no-huge option, no?
That relies on VFIO's ability to set up the IOMMU tables for you.
Konstantin

> 
> Sergio

