[dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target

Neil Horman nhorman at tuxdriver.com
Wed Aug 27 20:56:54 CEST 2014


On Wed, Aug 27, 2014 at 11:25:04AM +0000, Ananyev, Konstantin wrote:
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Tuesday, August 26, 2014 6:45 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> > Subject: Re: [PATCHv3] librte_acl make it build/work for 'default' target
> > 
> > On Mon, Aug 25, 2014 at 04:30:05PM +0000, Ananyev, Konstantin wrote:
> > > Hi Neil,
> > >
> > > > -----Original Message-----
> > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > Sent: Thursday, August 21, 2014 9:15 PM
> > > > To: dev at dpdk.org
> > > > Cc: Ananyev, Konstantin; thomas.monjalon at 6wind.com; Neil Horman
> > > > Subject: [PATCHv3] librte_acl make it build/work for 'default' target
> > > >
> > > > Make ACL library to build/work on 'default' architecture:
> > > > - make rte_acl_classify_scalar really scalar
> > > >  (make sure it wouldn't use sse4 instrincts through resolve_priority()).
> > > > - Provide two versions of rte_acl_classify code path:
> > > >   rte_acl_classify_sse() - could be build and used only on systems with sse4.2
> > > >   and upper, return -ENOTSUP on lower arch.
> > > >   rte_acl_classify_scalar() - a slower version, but could be build and used
> > > >   on all systems.
> > > > - keep common code shared between these two codepaths.
> > > >
> > > > v2 chages:
> > > >  run-time selection of most appropriate code-path for given ISA.
> > > >  By default the highest supprted one is selected.
> > > >  User can still override that selection by manually assigning new value to
> > > >  the global function pointer rte_acl_default_classify.
> > > >  rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify
> > > >  points to.
> > > >
> > >
> > > I see you decided not to wait for me and fix everything by yourself :)
> > >
> > Yeah, sorry, I'm getting pinged about enabling these features in Fedora, and it
> > had been about 2 weeks, so I figured I'd just take care of it.
> 
> No worries. I admit that it was a long delay from my side.
> 
> > 
> > > > V3 Changes
> > > >  Updated classify pointer to be a function so as to better preserve ABI
> > >
> > > As I said in my previous mail it generates extra jump...
> > > Though from numbers I got the performance impact is negligible: < 1%.
> > > So I suppose, I don't have a good enough reason to object :)
> > >
> > Yeah, I just don't see a way around it.  I was hoping that the compiler would
> > have been smart enough to see that the rte_acl_classify function was small and
> > in-linable, but apparently it won't do that.  As you note however the
> > performance change is minor (I'm guessing within a standard deviation of your
> > results).
> > 
> > > Though I still think we better keep  rte_acl_classify_scalar() publically available (same as we do for rte acl_classify_sse()):
> > > First of all keep  rte_acl_classify_scalar() is already part of our public API.
> > > Also, as I remember, one of the customers explicitly asked for scalar version and they planned to call it directly.
> > > Plus using rte_acl_select_classify() to always switch between implementations is not always handy:
> > 
> > I'm not exactly opposed to this, though it seems odd to me that a user might
> > want to call a particular version of the classifier directly.  But I certainly
> > can't predict everything a consumer wants to do.  If we really need to keep it
> > public then, it begs the question, is providing a generic entry point even
> > worthwhile?  Is it just as easy to expose the scalar/sse and any future versions
> > directly so the application can just embody the intellegence to select the best
> > path?  That saves us having to maintain another API point.  I can go with
> > consensus on that.
> > 
> > > -  it is global, which means that we can't simultaneously use classify_scalar() and classify_sse() for 2 different ACL contexts.
> > > - to properly support such switching we then will need to support something like (see app/test/test_acl.c below):
> > >   old_alg = rte_acl_get_classify();
> > >   rte_acl_select_classify(new_alg);
> > >   ...
> > >   rte_acl_select_classify(old_alg);
> > >
> > We could attach the classification method to the acl context, so each
> > rte_acl_ctx can point to whatever classifier funtion it wants to.  That would
> > remove the global issues you point out above.
> 
> I thought about that approach too.
> But there is one implication with DPDK MP model: 
> Same ACL context can be shared by different DPDK processes,
> while acl_classify() could be loaded to the different addresses.
> Of course we can overcome it by creating a global table of function pointers indexed by calssify_alg and
> store inside ACL ctx alg instead of actual function pointer.
> But that means extra overhead of at least two loads per classify() call.
> 
Hmm, how is the context shared around between processes?  Is it just shared as a
common cow data page resulting from a fork?  If so, then we should be good
because the DSO text will be at the same address (i.e. the pointer will still be
valid).  If you do some sort of message passing, then, yes, thats a problem.


> >  Or alternatively we can just not
> > provide a generic entry point and let each user select a specific function.
> 
> I wonder can we have sort of mixed approach:
> 1. provide a generic entry point that would be set to the best (from our knowledge) available classify function.
> 2. Let each user use a specific function if he wants too.
> 
> i.e: 
> - keep classify_scalar/classify_sse/classify_... public.
> - keep your current implementation of rte_acl_classify()
> BTW in that way, we probably can make acl_select_classify() static. 
> 
Agreed, depending on your answer above, this might be the best solution.

> So most users would just use generic entry point and wouldn't need to write their own code wrappers around it.
> For users who need to use a particular classify()  version - they can call it directly.
> 
It does seem reasonable.  Let me know what the ctx sharing mechanism is from
above, and we can settle this.

> > 
> > 
> > > >  REmoved macro definitions for match check functions to make them static inline
> > >
> > > More comments inlined below.
> > >snip>
> > > >
> > > >  	/* make a quick check for scalar */
> > > > -	ret = rte_acl_classify_scalar(acx, data, results,
> > > > +	rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> > > > +	ret = rte_acl_classify(acx, data, results,
> > > >  			RTE_DIM(acl_test_data), RTE_ACL_MAX_CATEGORIES);
> > >
> > >
> > > As I said above, that doesn't seem correct: we set rte_acl_default_classify = rte_acl_classify_scalar and never restore it back to the
> > original value.
> > > To support it properly, we need to:
> > > old_alg = rte_acl_get_classify();
> > >  rte_acl_select_classify(new_alg);
> > >  ...
> > >  rte_acl_select_classify(old_alg);
> > >
> > So, for the purposes of this test application, I don't see that as being needed.
> > Every call to rte_acl_classify is preceded by a setting of the classifier
> > function, so you're safe.
> 
> Not every, that's a problem.
> As I can see, in test/test_acl.c you replaced
> rte_acl_classify_scalar();
> with
> rte_acl_select_classify(SCALAR);
> rte_acl_classify();
> 
> And never restore previous value of rte_acl_default_classify.
> Right now rte_acl_default_classify is global, so after first:
> rte_acl_select_classify(SCALAR);
> all subsequent rte_acl_classify() will actually use scalar version.
> 
Hmm, ok, I'll take a closer look at it.

> >  If you're concerned about other processes using the
> > dpdk library at the same time, you're still safe, as despite being a global
> > variable, data pages in a DSO are Copy on Write, so each process gets their own
> > copy of the global variable.
> 
> No, my concern here is only about  app/test here. 
> 
> > 
> > Multiple threads within the same process are problematic, I agree, and thats
> > solvable with the per-acl-context mechanism that I described above, though that
> > shouldn't be needed here as this seems to be a single threaded program.
> > 
> > > Make all this just to keep UT valid seems like a big hassle to me.
> > > So I said above - probably better just leave it to call rte_acl_classify_scalar() directly.
> > >
> > That works for me too, though the per-context mechanism seems kind of nice to
> > me.  Let me know what you prefer.
> > 
> > ><snip>
> > > >
> > > > diff --git a/lib/librte_acl/acl_match_check.h b/lib/librte_acl/acl_match_check.h
> > > > new file mode 100644
> > > > index 0000000..4dc1982
> > > > --- /dev/null
> > > > +++ b/lib/librte_acl/acl_match_check.h
> > >
> > > As a nit: we probably don't need a special header just for one function and can place it inside acl_run.h.
> > >
> > Agreed, I can move that to acl_run.h.
> > 
> > ><snip>
> > > > + */
> > > > +static inline uint64_t
> > > > +acl_match_check(uint64_t transition, int slot,
> > > > +	const struct rte_acl_ctx *ctx, struct parms *parms,
> > > > +	struct acl_flow_data *flows, void (*resolve_priority)(
> > > > +	uint64_t transition, int n, const struct rte_acl_ctx *ctx,
> > > > +	struct parms *parms, const struct rte_acl_match_results *p,
> > > > +	uint32_t categories))
> > >
> > > Ugh, that's really hard to read.
> > > Can we create a typedef for resolve_priority function type:
> > > typedef void (*resolve_priority_t)(uint64_t, int,
> > >         const struct rte_acl_ctx *ctx, struct parms *,
> > >         const struct rte_acl_match_results *, uint32_t);
> > > And use it here?
> > >
> > Sure, I'm fine with doing that.
> > 
> > ><snip>
> > > > +
> > > > +/* by default, use always avaialbe scalar code path. */
> > > > +rte_acl_classify_t rte_acl_default_classify = rte_acl_classify_scalar;
> > >
> > > Why not 'static'?
> > > I thought you'd like to hide it  from external world.
> > >
> > Doh!  I didn't do the one thing that I really meant to do.  I removed it from
> > the header file but I forgot to declare the variable static.  I'll fix that.
> > 
> > > > +
> > > > +void rte_acl_select_classify(enum acl_classify_alg alg)
> > > > +{
> > > > +
> > > > +	switch(alg)
> > > > +	{
> > > > +		case ACL_CLASSIFY_DEFAULT:
> > > > +		case ACL_CLASSIFY_SCALAR:
> > > > +			rte_acl_default_classify = rte_acl_classify_scalar;
> > > > +			break;
> > > > +		case ACL_CLASSIFY_SSE:
> > > > +			rte_acl_default_classify = rte_acl_classify_sse;
> > > > +			break;
> > > > +	}
> > > > +
> > > > +}
> > >
> > > As this is init phase function, I suppose we can add check that alg has a valid(supported) value, and return some error as return
> > value, if not.
> > >
> > Not sure I follow what you're saying above, are you suggesting that we add a
> > rte_cpu_get_flag_enabled check to rte_acl_select_classify above?
> > 
> > ><snip>
> > > >   *
> > > > @@ -286,9 +289,10 @@ rte_acl_reset(struct rte_acl_ctx *ctx);
> > > >   * @return
> > > >   *   zero on successful completion.
> > > >   *   -EINVAL for incorrect arguments.
> > > > + *   -ENOTSUP for unsupported platforms.
> > >
> > > Please remove the line above: current implementation doesn't return ENOTSUP
> > > (I think that was left from v1).
> > >
> > Yup, probably was.  I'll remove it.
> > 
> > > >   */
> > > >  int
> > > > -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > > +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > >  	uint32_t *results, uint32_t num, uint32_t categories);
> > > >
> > > >  /**
> > > > @@ -323,9 +327,23 @@ rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > >   *   zero on successful completion.
> > > >   *   -EINVAL for incorrect arguments.
> > > >   */
> > > > -int
> > > > -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > > -	uint32_t *results, uint32_t num, uint32_t categories);
> > >
> > >
> > > As I said above we'd better keep it.
> > >
> > Ok, can do.
> > 
> > > > +
> > > > +enum acl_classify_alg {
> > > > +	ACL_CLASSIFY_DEFAULT = 0,
> > > > +	ACL_CLASSIFY_SCALAR = 1,
> > > > +	ACL_CLASSIFY_SSE = 2,
> > > > +};
> > >
> > > As a nit: as this emum is part of public API, I think it is better to add rte_ prefix: enum rte_acl_classify_alg
> > >
> > Sure, done.
> > 
> > > > +
> > > > +extern inline int rte_acl_classify(const struct rte_acl_ctx *ctx,
> > > > +				   const uint8_t **data,
> > > > +				   uint32_t *results, uint32_t num,
> > > > +				   uint32_t categories);
> > >
> > > Again as a nit: here and everywhere can we keep same style through the whole DPDK - function name from the new line:
> > > extern nt
> > > rte_acl_classify(...);
> > >
> > Ok
> > 
> > I'll produce another version based on your feedback regarding the
> > per-context-calssifier method vs. just removing the generic classifier.
> > 
> > Regards
> > Neil
> 
> 


More information about the dev mailing list