[dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK

Burakov, Anatoly anatoly.burakov at intel.com
Mon Feb 5 11:03:35 CET 2018


On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
> On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
>> On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
>>> On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
>>>> SPDK will need some way to register for a notification when pages are
>>>> allocated or freed. For storage, the number of requests per second is
>>>> (relative to networking) fairly small (hundreds of thousands per second
>>>> in a traditional block storage stack, or a few million per second with
>>>> SPDK). Given that, we can afford to do a dynamic lookup from va to
>>>> pa/iova on each request in order to greatly simplify our APIs (users
>>>> can just pass pointers around instead of mbufs). DPDK has a way to look
>>>> up the pa for a given va, but it does so by scanning /proc/self/pagemap
>>>> and is very slow. SPDK instead handles this by implementing a lookup
>>>> table of va to pa/iova, which we populate by scanning through the DPDK
>>>> memory segments at start up, so the lookup in our table is sufficiently
>>>> fast for storage use cases. If the list of memory segments changes, we
>>>> need to know about it in order to update our map.
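
To make the SPDK scheme above concrete, here is a minimal sketch of such
a startup-time table. This is not SPDK's actual code; it assumes the
DPDK 17.11 legacy API, where rte_eal_get_physmem_layout() returns a
fixed array of RTE_MAX_MEMSEG segments, each IOVA-contiguous:

#include <stdint.h>
#include <rte_memory.h>

struct va2iova { uintptr_t va; size_t len; rte_iova_t iova; };
static struct va2iova tbl[RTE_MAX_MEMSEG];
static unsigned int nb_entries;

/* Scan the static memseg array once at startup. */
static void build_va2iova_table(void)
{
        const struct rte_memseg *ms = rte_eal_get_physmem_layout();
        unsigned int i;

        for (i = 0; i < RTE_MAX_MEMSEG; i++) {
                if (ms[i].addr == NULL)
                        continue;
                tbl[nb_entries].va = (uintptr_t)ms[i].addr;
                tbl[nb_entries].len = ms[i].len;
                tbl[nb_entries].iova = ms[i].iova;
                nb_entries++;
        }
}

/* Per-request lookup; a linear scan for brevity, where a real
 * implementation would use a sorted array or radix tree. */
static rte_iova_t va2iova(const void *addr)
{
        uintptr_t va = (uintptr_t)addr;
        unsigned int i;

        for (i = 0; i < nb_entries; i++)
                if (va >= tbl[i].va && va < tbl[i].va + tbl[i].len)
                        return tbl[i].iova + (va - tbl[i].va);
        return RTE_BAD_IOVA;
}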
>>>
>>> Hi Benjamin,
>>>
>>> So, in other words, we need callbacks on alloc/free. What information
>>> would SPDK need when receiving this notification? Since we can't really
>>> know in advance how many pages we will allocate (it may be one, it may
>>> be a thousand) and they are no longer guaranteed to be contiguous,
>>> would a per-page callback be OK? Alternatively, we could have one
>>> callback per operation, but only provide the VA and size of the
>>> allocated memory, leaving everything else to the user. I am also adding
>>> a virt2memseg() function that would make it easier to look up segment
>>> physical addresses, so you won't have to manually scan memseg lists to
>>> get the IOVA for a given VA.
>>>
>>> Thanks for your feedback and suggestions!
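
As a concrete strawman, the per-operation variant discussed above could
look something like the following. This is a sketch only - none of these
names or signatures are final, and the eventual v1 API may differ:

#include <stddef.h>

enum rte_mem_event {
        RTE_MEM_EVENT_ALLOC,
        RTE_MEM_EVENT_FREE,
};

/* Called once per alloc/free operation with the VA and total size of
 * the affected region; IOVAs can then be resolved via virt2memseg(). */
typedef void (*rte_mem_event_callback_t)(enum rte_mem_event event,
                const void *addr, size_t len, void *arg);

int rte_mem_event_callback_register(const char *name,
                rte_mem_event_callback_t cb, void *arg);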
>>
>> Yes - callbacks on alloc/free would be perfect. Ideally we want one
>> callback per virtual memory region allocated, plus a function we can
>> call to find the physical addresses/page break points in that virtual
>> region. The function that finds the physical addresses does not have to
>> be efficient - we'll just call it once when the new region is allocated
>> and store the results in a fast lookup table. One call per virtual
>> region is better for us than one call per physical page because we're
>> actually keeping multiple different types of memory address translation
>> tables in SPDK. One translates from va to pa/iova; for that one we need
>> to break the region up into physical pages anyway, so it doesn't matter
>> whether you do one call per virtual region or one per physical page.
>> Another, however, translates from va to RDMA lkey, and for that it is
>> much more efficient if we can register large virtual regions in a
>> single call.
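
A sketch of the kind of "find the page break points" helper requested
above (hypothetical - though note that rte_mem_virt2phy() already
exists; it reads /proc/self/pagemap, which is slow but acceptable for a
once-per-allocation call):

#include <stdio.h>
#include <inttypes.h>
#include <rte_memory.h>

/* Walk a newly allocated virtual region page by page and report each
 * physically contiguous chunk. pg_sz is the region's page size. */
static void report_phys_chunks(const void *va, size_t len, size_t pg_sz)
{
        const char *p = va;
        const char *end = (const char *)va + len;
        const char *chunk_va = p;
        phys_addr_t chunk_pa = rte_mem_virt2phy(p);
        size_t chunk_len = pg_sz;

        for (p += pg_sz; p < end; p += pg_sz) {
                phys_addr_t pa = rte_mem_virt2phy(p);

                if (pa == chunk_pa + chunk_len) {
                        chunk_len += pg_sz; /* still contiguous */
                        continue;
                }
                printf("va %p -> pa 0x%" PRIx64 ", len %zu\n",
                       chunk_va, chunk_pa, chunk_len);
                chunk_va = p;
                chunk_pa = pa;
                chunk_len = pg_sz;
        }
        printf("va %p -> pa 0x%" PRIx64 ", len %zu\n",
               chunk_va, chunk_pa, chunk_len);
}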
> 
> Another yes to callbacks. Like Benjamin mentioned about RDMA, the MLX
> PMD has to look up the LKEY for each packet DMA. Let me briefly explain
> this for your understanding. For security reasons, we don't allow an
> application to initiate a DMA transaction with unknown, random physical
> addresses. Instead, the va-to-pa mapping (we call it a Memory Region)
> must be pre-registered, and the LKEY is the index of the translation
> entry registered in the device. With the current static memory model,
> this is easy to manage because the va-to-pa mapping is unchanged over
> time. But if it becomes dynamic, the MLX PMD must be notified of the
> event so it can register/unregister the Memory Region.
> 
> For the MLX PMD, it is also enough to get one notification per
> allocation/free of a virtual memory region. As Benjamin mentioned, it
> doesn't need to be a per-page call, because the PA of the region doesn't
> need to be contiguous for registration. The PMD also doesn't need to
> know the physical address of the region (I'm not saying that information
> is unnecessary - just FYI :-).
> 
> Thanks,
> Yongseok
> 
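
Tying the two together: a hedged sketch (not the mlx5 PMD's actual code)
of how a PMD could react to such events using the hypothetical callback
shape above, keeping one verbs Memory Region per virtual region and
caching its lkey for the data path:

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Tiny fixed-size cache of registered regions; a real PMD would use a
 * structure with fast per-packet va->lkey lookup. */
struct mr_entry { const void *addr; size_t len; struct ibv_mr *mr; };
static struct mr_entry mrs[64];

static void mlx_mem_event_cb(enum rte_mem_event event, const void *addr,
                size_t len, void *arg)
{
        struct ibv_pd *pd = arg; /* protection domain, given at register time */
        unsigned int i;

        if (event == RTE_MEM_EVENT_ALLOC) {
                for (i = 0; i < 64; i++) {
                        if (mrs[i].mr != NULL)
                                continue;
                        /* one registration per virtual region; the data
                         * path reads the lkey from mrs[i].mr->lkey */
                        mrs[i].mr = ibv_reg_mr(pd, (void *)(uintptr_t)addr,
                                        len, IBV_ACCESS_LOCAL_WRITE);
                        mrs[i].addr = addr;
                        mrs[i].len = len;
                        return;
                }
        } else { /* RTE_MEM_EVENT_FREE */
                for (i = 0; i < 64; i++) {
                        if (mrs[i].addr == addr && mrs[i].len == len) {
                                ibv_dereg_mr(mrs[i].mr);
                                memset(&mrs[i], 0, sizeof(mrs[i]));
                                return;
                        }
                }
        }
}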

Thanks for your feedback, good to hear we're on the right track. I 
already have a prototype implementation of this working, due for v1 
submission :)

-- 
Thanks,
Anatoly

