[dpdk-dev] Running DPDK as an unprivileged user
jianfeng.tan at intel.com
Thu Jan 5 16:52:31 CET 2017
On 1/5/2017 5:34 AM, Walker, Benjamin wrote:
> On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
>> Hi Benjamin,
>> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
>>> DPDK today begins by allocating all of the required
>>> hugepages, then finds all of the physical addresses for
>>> those hugepages using /proc/self/pagemap, sorts the
>>> hugepages by physical address, then remaps the pages to
>>> contiguous virtual addresses. Later on and if vfio is
>>> enabled, it asks vfio to pin the hugepages and to set their
>>> DMA addresses in the IOMMU to be the physical addresses
>>> discovered earlier. Of course, running as an unprivileged
>>> user means all of the physical addresses in
>>> /proc/self/pagemap are just 0, so this doesn't end up
>>> working. Further, there is no real reason to choose the
>>> physical address as the DMA address in the IOMMU - it would
>>> be better to just count up starting at 0.
>> Why not just use the virtual address as the DMA address in this case, to
>> avoid maintaining another kind of address?
> That's a valid choice, although I'm just storing the DMA address in the
> physical address field that already exists. You either have a physical
> address or a DMA address and never both.
Yes, I understand; that is why you raise the second question below.
>>> Also, because the
>>> pages are pinned after the virtual to physical mapping is
>>> looked up, there is a window where a page could be moved.
>>> Hugepage mappings can be moved on more recent kernels (at
>>> least 4.x), and the reliability of hugepages having static
>>> mappings decreases with every kernel release.
>> Do you mean the kernel might take back a physical page after mapping it to a
>> virtual page (maybe copy the data to another physical page)? Could you
>> please show some links or kernel commits?
> Yes - the kernel can move a physical page to another physical page
> and change the virtual mapping at any time. For a concise example
> see 'man migrate_pages(2)', or for a more serious example the code
> that performs memory page compaction in the kernel which was
> recently extended to support hugepages.
> Before we go down the path of me proving that the mapping isn't static,
> let me turn that line of thinking around. Do you have any documentation
> demonstrating that the mapping is static? It's not static for 4k pages, so
> why are we assuming that it is static for 2MB pages? I understand that
> it happened to be static for some versions of the kernel, but my understanding
> is that this was purely by coincidence and never by intention.
Thank you for the information. Based on what you provided above, I
realize this behavior could have been happening for a long time.
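
For reference, the /proc/self/pagemap lookup under discussion can be
sketched roughly as below. This is only a minimal illustration, not
the actual EAL code; the helper name virt_to_pfn() is hypothetical.
It relies on the documented pagemap entry layout (bits 0-54 hold the
PFN, bit 63 is the "present" flag). Without CAP_SYS_ADMIN, recent
kernels report the PFN as 0, and nothing here prevents the page from
moving after the lookup returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */
#define PM_PRESENT  (1ULL << 63)        /* bit 63: page is present in RAM */

/*
 * Hypothetical helper: look up the page frame number backing vaddr.
 * Returns 0 on success and -1 on error or if the page is not present.
 * An unprivileged caller gets *pfn == 0 on recent kernels.
 */
int virt_to_pfn(const void *vaddr, uint64_t *pfn)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return -1;

    /* One 64-bit entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size) *
             (off_t)sizeof(entry);

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    n = pread(fd, &entry, sizeof(entry), offset);
    close(fd);

    if (n != (ssize_t)sizeof(entry))
        return -1;
    if (!(entry & PM_PRESENT))
        return -1;

    /* The result is only a snapshot; the kernel may migrate the page later. */
    *pfn = entry & PM_PFN_MASK;
    return 0;
}
```

The physical address would then be pfn * page_size plus the offset
within the page, valid only for as long as the page stays put.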
>>> Note that this
>>> probably means that using uio on recent kernels is subtly
>>> broken and cannot be supported going forward because there
>>> is no uio mechanism to pin the memory.
>>> The first open question I have is whether DPDK should allow
>>> uio at all on recent (4.x) kernels. My current understanding
>>> is that there is no way to pin memory and hugepages can now
>>> be moved around, so uio would be unsafe. What does the
>>> community think here?
Back to this question: removing uio support from DPDK seems like
overkill to me. Can we just document it instead? For example, first warn users
not to call migrate_pages() or move_pages() on a DPDK process; as for
the kcompactd daemon and other cases (compaction can also be
triggered by alloc_pages()), could we just recommend to disable
On the other hand, how does vfio pin that memory? Through memlock (judging
from the code in vfio_pin_pages())? If so, why not just mlock those hugepages?
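
To make the mlock idea concrete, here is a minimal sketch. The helper
names alloc_locked() and free_locked() are hypothetical, and the
sketch uses ordinary anonymous pages rather than hugepages so that it
runs without any hugepage setup. Whether mlock() alone is enough to
stop the kernel from moving the pages is exactly the open question.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory and lock it
 * into RAM with mlock(). Returns NULL if either step fails, for example
 * when RLIMIT_MEMLOCK is too small for the request.
 */
void *alloc_locked(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (mlock(p, len) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Unlock and unmap a region obtained from alloc_locked(). */
void free_locked(void *p, size_t len)
{
    munlock(p, len);
    munmap(p, len);
}
```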
>>> My second question is whether the user should be allowed to
>>> mix uio and vfio usage simultaneously. For vfio, the
>>> physical addresses are really DMA addresses and are best
>>> when arbitrarily chosen to appear sequential relative to
>>> their virtual addresses.
>> Why "sequential relative to their virtual addresses"? The IOMMU table is for
>> DMA addr -> physical addr mapping. So don't we need DMA addresses
>> "sequential relative to their physical addresses"? Based on your above
>> analysis of how hugepages are initialized, isn't the virtual address a good
>> candidate for the DMA address?
> The code already goes through a separate organizational step on all of
> the pages that remaps the virtual addresses such that they're sequential
> relative to the physical backing pages, so this mostly ends up as the same
> thing.
> Choosing to use the virtual address is a totally valid choice, but I worry it
> may lead to confusion during debugging or in a multi-process scenario.
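
As an illustration of the "count up starting at 0" alternative, a
rough sketch follows. The struct dma_seg type and the assign_iovas()
helper are hypothetical and not part of DPDK or vfio: segments are
sorted by virtual address and each one is given the next DMA address
in sequence, so the DMA addresses end up sequential relative to the
virtual addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor for one memory segment to be mapped for DMA. */
struct dma_seg {
    void    *vaddr;  /* virtual address of the segment */
    size_t   len;    /* length in bytes */
    uint64_t iova;   /* DMA address chosen for the segment */
};

/* Order segments by ascending virtual address. */
static int cmp_vaddr(const void *a, const void *b)
{
    uintptr_t va = (uintptr_t)((const struct dma_seg *)a)->vaddr;
    uintptr_t vb = (uintptr_t)((const struct dma_seg *)b)->vaddr;

    return (va > vb) - (va < vb);
}

/*
 * Sort segments by virtual address, then assign DMA addresses that
 * simply count up from 0. Physical addresses are never consulted.
 */
void assign_iovas(struct dma_seg *segs, size_t n)
{
    uint64_t next = 0;
    size_t i;

    qsort(segs, n, sizeof(*segs), cmp_vaddr);
    for (i = 0; i < n; i++) {
        segs[i].iova = next;
        next += segs[i].len;
    }
}
```

Each (vaddr, iova, len) triple would then be handed to vfio for
mapping; using the virtual address itself as the DMA address would
replace the counter with iova = (uintptr_t)vaddr.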