[dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK

Nélio Laranjeiro nelio.laranjeiro at 6wind.com
Mon Feb 5 11:18:52 CET 2018


On Mon, Feb 05, 2018 at 10:03:35AM +0000, Burakov, Anatoly wrote:
> On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
> > On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
> > > On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
> > > > On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
> > > > > SPDK will need some way to register for a notification when pages are
> > > > > allocated or freed. For storage, the number of requests per second is
> > > > > (relative to networking) fairly small (hundreds of thousands per second
> > > > > in a traditional block storage stack, or a few million per second with
> > > > > SPDK). Given that, we can afford to do a dynamic lookup from va to
> > > > > pa/iova on each request in order to greatly simplify our APIs (users can
> > > > > just pass pointers around instead of mbufs). DPDK has a way to look up
> > > > > the pa from a given va, but it does so by scanning /proc/self/pagemap
> > > > > and is very slow. SPDK instead handles this by implementing a lookup
> > > > > table of va to pa/iova which we populate by scanning through the DPDK
> > > > > memory segments at startup, so the lookup in our table is sufficiently
> > > > > fast for storage use cases. If the list of memory segments changes, we
> > > > > need to know about it in order to update our map.
> > > > 
> > > > Hi Benjamin,
> > > > 
> > > > So, in other words, we need callbacks on alloc/free. What information
> > > > would SPDK need when receiving this notification? Since we can't really
> > > > know in advance how many pages we allocate (it may be one, it may be a
> > > > thousand) and they no longer are guaranteed to be contiguous, would a
> > > > per-page callback be OK? Alternatively, we could have one callback per
> > > > operation, but only provide VA and size of allocated memory, while
> > > > leaving everything else to the user. I do add a virt2memseg() function
> > > > which would let you look up segment physical addresses more easily, so
> > > > you won't have to manually scan memseg lists to get IOVA for a given VA.
> > > > 
> > > > Thanks for your feedback and suggestions!
> > > 
> > > Yes - callbacks on alloc/free would be perfect. Ideally for us we want one
> > > callback per virtual memory region allocated, plus a function we can call to
> > > find the physical addresses/page break points on that virtual region. The
> > > function that finds the physical addresses does not have to be efficient - we'll
> > > just call that once when the new region is allocated and store the results in a
> > > fast lookup table. One call per virtual region is better for us than one call
> > > per physical page because we're actually keeping multiple different types of
> > > memory address translation tables in SPDK. One translates from va to pa/iova, so
> > > for this one we need to break this up into physical pages and it doesn't matter
> > > if you do one call per virtual region or one per physical page. However another
> > > one translates from va to RDMA lkey, so it is much more efficient if we can
> > > register large virtual regions in a single call.
> > 
> > Another yes to callbacks. As Benjamin mentioned about RDMA, the MLX PMD has to
> > look up an LKEY for each packet DMA. Let me briefly explain this for your
> > understanding. For security reasons, we don't allow an application to initiate
> > a DMA transaction with unknown, random physical addresses. Instead, the
> > va-to-pa mapping (we call it a Memory Region) must be pre-registered, and the
> > LKEY is the index of the translation entry registered in the device. With the
> > current static memory model, this is easy to manage because the v-p mapping is
> > unchanged over time. But if it becomes dynamic, the MLX PMD must be notified of
> > the event so it can register/un-register the Memory Region.
> > 
> > For the MLX PMD, it is also enough to get one notification per allocation/free
> > of a virtual memory region. It doesn't need to be a per-page call, as Benjamin
> > mentioned, because the PA of a region doesn't need to be contiguous for
> > registration. Also, the PMD doesn't need to know the physical address of the
> > region (I'm not saying it is unnecessary, just FYI :-).
> > 
> > Thanks,
> > Yongseok
> > 
> 
> Thanks for your feedback, good to hear we're on the right track. I already
> have a prototype implementation of this working, due for v1 submission :)

Hi Anatoly,

Good to know.
Do you see any performance impact with this series?

Thanks,

-- 
Nélio Laranjeiro
6WIND
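
For readers following the thread, here is a minimal sketch of the scheme
discussed above: one notification per allocated or freed virtual region, a
slow per-page resolver called only at notification time, and a flat table
that makes the per-request va-to-iova lookup cheap. It is not the DPDK or
SPDK implementation. Every name in it (on_region_alloc, on_region_free,
va_to_iova, resolve_fn, demo_resolve) is hypothetical, and the 2 MiB page
size, the fixed VA window and the table size are assumptions made only to
keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 21                        /* assumption: 2 MiB hugepages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define VA_BASE    ((uint64_t)0x100000000)   /* assumption: start of VA window */
#define MAX_PAGES  512                       /* assumption: 1 GiB window */
#define IOVA_NONE  UINT64_MAX

/* Hypothetical slow resolver; stands in for whatever the allocator offers. */
typedef uint64_t (*resolve_fn)(uint64_t va);

/* One entry per page in the window; IOVA_NONE means "not mapped". */
static uint64_t page_iova[MAX_PAGES];

static void map_init(void)
{
    for (size_t i = 0; i < MAX_PAGES; i++)
        page_iova[i] = IOVA_NONE;
}

/* Called once per allocated virtual region; resolves each page once. */
static void on_region_alloc(uint64_t va, uint64_t len, resolve_fn resolve)
{
    assert(va >= VA_BASE && va % PAGE_SIZE == 0 && len % PAGE_SIZE == 0);
    assert((va - VA_BASE + len) / PAGE_SIZE <= MAX_PAGES);
    for (uint64_t off = 0; off < len; off += PAGE_SIZE)
        page_iova[(va - VA_BASE + off) >> PAGE_SHIFT] = resolve(va + off);
}

/* Called once per freed virtual region; drops the cached translations. */
static void on_region_free(uint64_t va, uint64_t len)
{
    assert(va >= VA_BASE && va % PAGE_SIZE == 0 && len % PAGE_SIZE == 0);
    assert((va - VA_BASE + len) / PAGE_SIZE <= MAX_PAGES);
    for (uint64_t off = 0; off < len; off += PAGE_SIZE)
        page_iova[(va - VA_BASE + off) >> PAGE_SHIFT] = IOVA_NONE;
}

/* Fast path: one index computation and one load per request. */
static uint64_t va_to_iova(uint64_t va)
{
    if (va < VA_BASE)
        return IOVA_NONE;
    uint64_t idx = (va - VA_BASE) >> PAGE_SHIFT;
    if (idx >= MAX_PAGES || page_iova[idx] == IOVA_NONE)
        return IOVA_NONE;
    return page_iova[idx] + (va & (PAGE_SIZE - 1));
}

/* Demo resolver: swaps adjacent pages so the region is physically
 * non-contiguous, as the thread says it may be. */
static uint64_t demo_resolve(uint64_t va)
{
    uint64_t idx = (va - VA_BASE) >> PAGE_SHIFT;
    return 0x40000000 + ((idx ^ 1) << PAGE_SHIFT);
}
```

A consumer that only needs whole regions (such as the RDMA LKEY case
described in the thread) could register the (va, len) pair from the same
callback without walking the pages at all.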

