[dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option

Vamsi Krishna Attunuru vattunuru at marvell.com
Fri Oct 25 11:20:20 CEST 2019



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk at gmail.com>
> Sent: Friday, October 25, 2019 1:01 AM
> To: Olivier Matz <olivier.matz at 6wind.com>
> Cc: Vamsi Krishna Attunuru <vattunuru at marvell.com>; Andrew Rybchenko
> <arybchenko at solarflare.com>; Ferruh Yigit <ferruh.yigit at intel.com>;
> thomas at monjalon.net; Jerin Jacob Kollanukkaran <jerinj at marvell.com>; Kiran
> Kumar Kokkilagadda <kirankumark at marvell.com>; anatoly.burakov at intel.com;
> stephen at networkplumber.org; dev at dpdk.org
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
> 
> On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz <olivier.matz at 6wind.com>
> wrote:
> >
> > Hi,
> >
> > On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.matz at 6wind.com>
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > > > > <vattunuru at marvell.com> wrote:
> > > > > >
> > > > > > Hi Ferruh,
> > > > > >
> > > > > > Can you please explain the problems in using KNI-dedicated mbuf
> > > > > > alloc routines while enabling KNI iova=va mode? Please see the
> > > > > > discussion below with Andrew; he wanted to know the problems with
> > > > > > having newer APIs.
> > > > >
> > > > >
> > > > > While waiting for Ferruh's reply, I would like to summarise
> > > > > the current status:
> > > > >
> > > > > # In order to make KNI work with IOVA as VA, we need to make
> > > > > sure a mempool _object_ does not span across two huge pages
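The constraint above can be sketched as a small check (a hypothetical helper for illustration, not a DPDK API): an object placed at a given offset spans two pages exactly when its first and last bytes fall in different pages.

```c
#include <stdint.h>

/* Hypothetical helper, not a DPDK API: returns 1 when an object of
 * elt_size bytes starting at offset off would span two pages of
 * pg_size bytes -- the layout KNI with IOVA as VA must avoid. */
static int obj_spans_two_pages(uint64_t off, uint64_t elt_size,
                               uint64_t pg_size)
{
    return (off / pg_size) != ((off + elt_size - 1) / pg_size);
}
```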
> > > > >
> > > > > # This problem can be fixed by, either of:
> > > > >
> > > > > a) Introduce a flag in mempool to define this constraint, so
> > > > > that the constraint is enforced only when needed; this is in
> > > > > line with the existing semantics of addressing such problems in
> > > > > mempool
> > > > >
> > > > > b) Instead of creating a flag, make this the default behavior in
> > > > > mempool for the IOVA as VA case
> > > > >
> > > > > Upside:
> > > > > b1) There is no need for specific mempool_create for KNI.
> > > > >
> > > > > Downside:
> > > > > b2) Not align with existing mempool API semantics
> > > > > b3) There will be a trivial amount of memory wasted, as we can not
> > > > > allocate objects from the page edges. Considering that the normal
> > > > > huge page size is 1G or 512MB, this is not a real issue.
> > > > >
> > > > > c) Make IOVA as PA when KNI kernel module is loaded
> > > > >
> > > > > Upside:
> > > > > c1) Option (a) would call for a new KNI-specific mempool create
> > > > > API, i.e. existing KNI applications would need a one-line change
> > > > > to work with release 19.11 or later; option (c) avoids that.
> > > > >
> > > > > Downside:
> > > > > c2) Drivers which need RTE_PCI_DRV_NEED_IOVA_AS_VA can not work
> > > > > with KNI
> > > > > c3) Root privilege is needed to run KNI, as IOVA as PA requires
> > > > > root privilege
> > > > >
> > > > > For the next year, we expect applications to work with 19.11
> > > > > without any code change. My personal opinion is to go with option
> > > > > (a) and update the release notes to document the change, as it is
> > > > > a simple one-line change.
> > > > >
> > > > > The selection of (a) vs (b) is between KNI and Mempool maintainers.
> > > > > Could we please reach a consensus? Or can we discuss this TB meeting?
> > > > >
> > > > > We have been going back and forth on this feature for the last 3
> > > > > releases. Now that we have solved all the technical problems,
> > > > > please help us decide between (a) and (b) to make forward progress.
> > > >
> > > > Thank you for the summary.
> > > > What is not clear to me is whether (a) or (b) may break an existing
> > > > application, and if so, in which case.
> > >
> > > Thanks for the reply.
> > >
> > > To be clear, we are talking about out-of-tree KNI applications
> > > whose maintainers don't want to change
> > > rte_pktmbuf_pool_create() to rte_kni_pktmbuf_pool_create()
> > > and rebuild for v19.11.
> > >
> > > So in case (b) there is no issue, as it will be using rte_pktmbuf_pool_create().
> > > But in case (a) there will be an issue if an out-of-tree KNI
> > > application uses rte_pktmbuf_pool_create(), which does not use
> > > the NEW flag.
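For reference, the one-line change in question would look roughly as follows (a sketch only; rte_kni_pktmbuf_pool_create() is the KNI-specific create API proposed in this patchset, assumed here to mirror rte_pktmbuf_pool_create()'s signature):

```c
/* Before: generic mbuf pool creation (existing application code). */
mp = rte_pktmbuf_pool_create("mbuf_pool", nb_mbufs, cache_size,
                             0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

/* After, under option (a): KNI-specific creation that enforces the
 * no-page-crossing constraint (proposed API, signature assumed). */
mp = rte_kni_pktmbuf_pool_create("mbuf_pool", nb_mbufs, cache_size,
                                 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
```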
> >
> > Following yesterday's discussion at the techboard, I looked at the mempool
> > code and at my previous RFC patch. It took some time to remind myself what
> > my worries were.
> 
> Thanks for the review Olivier.
> 
> Just to make sure the correct one is reviewed.
> 
> 1) v7 had a similar issue mentioned:
> http://patches.dpdk.org/patch/56585/
> 
> 2) v11 addressed the review comments and you have given the Acked-by for
> the mempool change: http://patches.dpdk.org/patch/61559/
> 
> My thought process in the TB meeting was: since
> rte_mempool_populate_from_pg_sz_chunks() is already reviewed, replace
> rte_pktmbuf_pool_create()'s rte_mempool_populate_default() with
> rte_mempool_populate_from_pg_sz_chunks()
> in the IOVA == VA case, to avoid a new KNI mempool_create API.
> 
> >
> > Currently, in rte_mempool_populate_default(), when the mempool is
> > populated, we first try to allocate one iova-contiguous block of (n *
> > elt_size). On success, we use this memory to fully populate the
> > mempool without taking care of crossing page boundaries.
> >
> > If we change the behavior to prevent objects from crossing pages, the
> > assumption that allocating (n * elt_size) is always enough becomes
> > wrong.  By luck, there is no real impact, because if the mempool is
> > not fully populated after this first iteration, it will allocate a new
> > chunk.
> >
> > To be rigorous, we need to better calculate the amount of memory to
> > allocate, according to page size.
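A rigorous calculation has to round up to whole pages, since a page can only hold floor(page_size / elt_size) objects once objects may not cross page boundaries. A minimal sketch (hypothetical helper, not the actual rte_mempool code; it ignores object headers and alignment):

```c
#include <stdint.h>

/* Hypothetical sizing helper (not rte_mempool code): memory needed for
 * n objects of elt_size bytes when no object may cross a pg_size page
 * boundary. Ignores object headers/alignment for simplicity. */
static uint64_t mem_needed_no_page_cross(uint64_t n, uint64_t elt_size,
                                         uint64_t pg_size)
{
    uint64_t objs_per_page = pg_size / elt_size;              /* floor */
    uint64_t pages = (n + objs_per_page - 1) / objs_per_page; /* ceil  */

    return pages * pg_size;
}
```

With elt_size = 2500 and 4K pages, only one object fits per page, so 10 objects need 40960 bytes, well above the naive 10 * 2500 = 25000 bytes, which is exactly why allocating (n * elt_size) can be insufficient.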

Hi Olivier,

Thanks for the review. I think the below mentioned problems exist with the
current mempool_populate_default() API. Will there be a high chance of
hitting those problems when we precalculate the memory size (after preventing
objects from crossing page boundaries and fitting the complete mempool memory
in a single mem chunk) and the mempool size goes beyond the page size, as in
the example below?

Regards,
Vamsi

> >
> > Looking at the code, I found another problem in the same area: let's
> > say we populate a mempool that requires 1.1GB (and we use 1G huge pages):
> >
> > 1/ mempool code will first try to allocate an iova-contiguous zone
> >    of 1.1G -> fail
> > 2/ it then tries to allocate a page-aligned non iova-contiguous
> >    zone of 1.1G, which rounds up to 2G. On success, a lot of memory
> >    is wasted.
> > 3/ on error, we try to allocate the biggest zone; it can still return
> >    a zone between 1.1G and 2G, which can also waste memory.
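The waste in step 2/ is just page rounding; a toy illustration of the arithmetic (not mempool code):

```c
#include <stdint.h>

/* Rounding a request up to whole huge pages: a 1.1G request on 1G
 * pages allocates 2G, wasting roughly 0.9G. */
static uint64_t round_up_to_pages(uint64_t req, uint64_t pg_size)
{
    return ((req + pg_size - 1) / pg_size) * pg_size;
}
```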
> >
> > I will rework my mempool patchset to properly address these issues,
> > hopefully tomorrow.
> 
> OK.
> 
> 
> >
> > Also, I thought about another idea to solve your issue; I am not sure it
> > is better, but it would not imply changing the mempool behavior. If I
> > understood the problem, when a mbuf is across 2 pages, the copy of
> > the data can fail in kni because the mbuf is not virtually contiguous
> > in the
> 
> For the KNI use case, we would need _physically_ contiguous memory, to make
> sure that using get_user_pages_remote() we get a physically contiguous
> memory map, so that the KNI kernel thread, the KNI kernel context, and DPDK
> userspace can use the same memory in different contexts.
> 
> 
> 
> > kernel. So why not, in this case, split the memcpy() into several
> > copies, each staying within a single page (and calling phys2virt() for
> > each page)? The same would have to be done when accessing the fields of
> > the mbuf structure if it crosses a page boundary. Would that work? This
> 
> If the above is the requirement, does this logic need to be in the slow
> path or the fast path?
> 
> > could be a B plan.
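Olivier's plan B could look roughly like the following sketch (hypothetical helpers for illustration only, not KNI code; `translate` stands in for a per-page phys2virt()-style lookup, and `identity_translate` is just a stand-in usable when source addresses are already valid pointers):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-page address translation callback, standing in for phys2virt(). */
typedef const void *(*translate_fn)(uintptr_t pa);

/* Identity translation, usable when source addresses are already
 * valid pointers (e.g. for exercising the splitting logic). */
static const void *identity_translate(uintptr_t pa)
{
    return (const void *)pa;
}

/* Split a copy so each memcpy() stays within one pg_size page of the
 * source, re-translating the source address at every page boundary.
 * pg_size must be a power of two. */
static void copy_split_by_page(void *dst, uintptr_t src_pa, size_t len,
                               size_t pg_size, translate_fn translate)
{
    uint8_t *d = dst;

    while (len > 0) {
        size_t left_in_page = pg_size - (src_pa & (pg_size - 1));
        size_t chunk = len < left_in_page ? len : left_in_page;

        memcpy(d, translate(src_pa), chunk);
        d += chunk;
        src_pa += chunk;
        len -= chunk;
    }
}
```

Whether this belongs in the fast path (per-packet copy) or can stay in a slow path is exactly the open question raised above.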
> 
> OK.
> 
> >
> > Olivier

