[dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Apr 14 15:18:44 CEST 2020


> > > >
> > > > +#ifdef ALLOW_EXPERIMENTAL_API
> > > > +#include <rte_ring_rts.h>
> > > > +#endif
> > > > +
> > > >  /**
> > > >   * Enqueue several objects on a ring.
> > > >   *
> > > > @@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
> > > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >  		      unsigned int n, unsigned int *free_space)  {
> > > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > > RTE_RING_QUEUE_FIXED,
> > > > -			r->prod.sync_type, free_space);
> > > > +	switch (r->prod.sync_type) {
> > > > +	case RTE_RING_SYNC_MT:
> > > > +		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> > > > +	case RTE_RING_SYNC_ST:
> > > > +		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> > > Have you validated if these affect the performance for the existing APIs?
> >
> > I run ring_pmd_perf_autotest
> > (AFAIK, that's the only one of our perf tests that calls
> > rte_ring_enqueue/dequeue), and didn't see any real difference in perf
> > numbers.
> >
> > > I am also wondering why should we support these new modes in the legacy
> > APIs?
> >
> > Majority of DPDK users still do use legacy API, and I am not sure all of them
> > will be happy to switch to _elem_ one manually.
> > Plus I can't see how we can justify that after let say:
> > rte_ring_init(ring, ..., RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ); returns
> > with success valid call to rte_ring_enqueue(ring,...) should fail.
> Agree, I think the only way right now is through documentation.
> 
> >
> > > I think users should move to use rte_ring_xxx_elem APIs. If users want to
> > use RTS/HTS it will be a good time for them to move to new APIs.
> >
> > If they use rte_ring_enqueue/dequeue all they have to do - just change flags
> > in ring_create/ring_init call.
> > With what you suggest - they have to change every
> > rte_ring_enqueue/dequeue to rte_ring_elem_enqueue/dequeue.
> > That's much bigger code churn.
> But these are just 1 to 1 mapped.  I would think, there are not a whole lot of them in the application code, may be ~10 lines?

I suppose it depends a lot on particular user app.
My preference not to force users to do extra changes in their code.
If we can add new functionality while keeping existing API, why not to do it?
Less disturbance for users seems a good thing to me.

> I think the bigger factor for the user here is the algorithm changes in rte_ring library. Bigger effort for the users would be testing rather than
> code changes in the applications.
> >
> > > They anyway have to test their code for RTS/HTS, might as well make the
> > change to new APIs and test both.
> > > It will be less code to maintain for the community as well.
> >
> > That's true, right now there is a lot of duplication between _elem_ and legacy
> > code.
> >  Actually the only real diff between them - actual copying of the objects.
> >  But I thought we are going to deal with that, just by changing one day all
> > legacy API to wrappers around _elem_ calls, i.e something like:
> >
> > static __rte_always_inline unsigned int
> > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> >                       unsigned int n, unsigned int *free_space) {
> > 	return  rte_ring_enqueue_elem_bulk(r, obj_table, sizeof(uintptr_t), n,
> > free_space); }
> >
> > That way users will switch to new API automatically, without any extra effort
> > for them, and we will be able to remove legacy code.
> > Do you have some other thoughts here how to deal with this legacy/elem
> > conversion?
> Yes, that is what I was thinking, but had not considered any addition of new APIs.
> But, I am wondering if we should look at deprecation?

You mean to deprecate existing  legacy API?
rte_ring_enqueue/dequeue_bulk, etc?
I don't think we need to deprecate it at all.
As long as we'll have _elem_  functions called underneath there would be one implementation anyway,
and we can leave them forever, so users wouldn't need to change their existing code at all. 

> If we decide to deprecate, it would be good to avoid making the users of RTS/HTS do
> the work twice (once to make the switch to RTS/HTS and then another to _elem_ APIs).
> 
> One thing we can do is to implement the wrappers you mentioned above for RTS/HTS now.

That's a very good point.
 It will require some re-org to allow rte_ring.h to include rte_ring_elem.h,
but I think it is doable, will try to make these changes in v4. 

> I also it is worth considering to switch to these
> wrappers 20.05 so that come 20.11, we have a code base that has gone through couple of releases' testing.

You mean wrappers for existing legacy API (MP/MC, SP/SC modes)?
It is probably too late to make such changes in 20.05, probably early 20.08 is a good time for that.  

> 
> <snip>
> 
> > > > +
> > > > +#endif /* _RTE_RING_RTS_ELEM_H_ */
> > > > diff --git a/lib/librte_ring/rte_ring_rts_generic.h
> > > > b/lib/librte_ring/rte_ring_rts_generic.h
> > > > new file mode 100644
> > > > index 000000000..f88460d47
> > > > --- /dev/null
> > > > +++ b/lib/librte_ring/rte_ring_rts_generic.h
> > > I do not know the benefit to providing the generic version. Do you know
> > why this was done in the legacy APIs?
> >
> > I think at first we had generic API only, then later C11 was added.
> > As I remember, C11 one on IA was measured as a bit slower then generic,
> > so it was decided to keep both.
> >
> > > If there is no performance difference between generic and C11 versions,
> > should we just skip the generic version?
> > > The oldest compiler in CI are GCC 4.8.5 and Clang 3.4.2 and C11 built-ins
> > are supported earlier than these compiler versions.
> > > I feel the code is growing exponentially in rte_ring library and we should try
> > to cut non-value-ass code/APIs aggressively.
> >
> > I'll check is there perf difference for RTS and HTS between generic and C11
> > versions on IA.
> > Meanwhile please have a proper look at C11 implementation, I am not that
> > familiar with C11 atomics yet.
> ok
> 
> > If there would be no problems with it and no noticeable diff in performance -
> > I am ok to have for RTS/HTS modes C11 version only.

>From what I see on my box, there is no much difference
in terms of performance between *generic* and *c11_mem* for RTS/HTS.
ring_stress_autotest for majority of cases shows ~1% diff,
in some cases c11 numbers are even a bit better.
So will keep c11 version only in v4.



More information about the dev mailing list