[dpdk-dev] [PATCH 2/3] rte_sched: introduce reciprocal divide
Hannes Frederic Sowa
hannes at stressinduktion.org
Wed Dec  2 17:57:03 CET 2015

Hello,
On Wed, Dec 2, 2015, at 17:45, Dumitrescu, Cristian wrote:
> > diff --git a/lib/librte_sched/rte_reciprocal.h
> > b/lib/librte_sched/rte_reciprocal.h
> > new file mode 100644
> > index 0000000..abd1525
> > --- /dev/null
> > +++ b/lib/librte_sched/rte_reciprocal.h
> > @@ -0,0 +1,39 @@
> > +/*
> > + * Reciprocal divide
> > + *
> > + * Used with permission from original authors
> > + *  Hannes Frederic Sowa and Daniel Borkmann
> > + *
> > + * This algorithm is based on the paper "Division by Invariant
> > + * Integers Using Multiplication" by Torbjörn Granlund and Peter
> > + * L. Montgomery.
> 
> Stephen, can you please provide a link to this paper?
<https://gmplib.org/~tege/divcnst-pldi94.pdf>
> > + *
> > + * The assembler implementation from Agner Fog, which this code is
> > + * based on, can be found here:
> > + * http://www.agner.org/optimize/asmlib.zip
> > + *
> > + * This optimization for A/B is helpful if the divisor B is mostly
> > + * runtime invariant. The reciprocal of B is calculated in the
> > + * slow-path with reciprocal_value(). The fast-path can then just use
> > + * a much faster multiplication operation with a variable dividend A
> > + * to calculate the division A/B.
> > + */
> > +
> > +#ifndef _RTE_RECIPROCAL_H_
> > +#define _RTE_RECIPROCAL_H_
> > +
> > +struct rte_reciprocal {
> > +	uint32_t m;
> > +	uint8_t sh1, sh2;
> > +};
> 
> The size of this structure is not a multiple of 32 bits. You seem to
> transfer this structure by value rather than by reference (the function
> rte_reciprocal_value() below returns an instance of this structure). I
> don't feel comfortable with the last 16 bits of the structure being left
> uninitialized. Should we add an explicit pad field and initialize this
> structure explicitly to zero at init time?
Note that it is used by static inline functions in the fast path, which
the compiler will most probably inline at the call site, so no real
argument passing happens (at least this is what I saw in the Linux kernel
assembly). I don't think you need to worry about padding. It happens very
often without anyone noticing. ;)
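
For anyone who still prefers fully defined bytes, the explicit-pad idea
from the review comment could look roughly like the sketch below. The
names rte_reciprocal_padded, make_padded and padded_self_check are
hypothetical and not part of the patch; the sketch only assumes a common
ABI where uint32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of struct rte_reciprocal; not part of the patch. */
struct rte_reciprocal_padded {
	uint32_t m;
	uint8_t sh1, sh2;
	uint16_t pad;	/* names the two bytes that are implicit padding on common ABIs */
};

/* Build a value with every byte defined: zero the object, then set fields. */
static inline struct rte_reciprocal_padded
make_padded(uint32_t m, uint8_t sh1, uint8_t sh2)
{
	struct rte_reciprocal_padded R;

	memset(&R, 0, sizeof(R));
	R.m = m;
	R.sh1 = sh1;
	R.sh2 = sh2;
	return R;
}

/* Usage check: equal inputs give bytewise-equal objects. */
static inline void padded_self_check(void)
{
	struct rte_reciprocal_padded a = make_padded(0x55555556u, 1, 1);
	struct rte_reciprocal_padded b = make_padded(0x55555556u, 1, 1);

	assert(a.pad == 0);
	assert(memcmp(&a, &b, sizeof(a)) == 0);
}
```

With the pad field named, a bytewise comparison or hash of the structure
is well defined. Whether that is worth the extra field is a judgment
call, given that the fast path only ever reads m, sh1 and sh2.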
> > +
> > +static inline uint32_t rte_reciprocal_divide(uint32_t a, struct rte_reciprocal
> > R)
> > +{
> > +	uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);
> > +
> > +	return (t + ((a - t) >> R.sh1)) >> R.sh2;
> > +}
> > +
> > +struct rte_reciprocal rte_reciprocal_value(uint32_t d);
> 
> Why 32-bit arithmetic? We had a lot of bugs in librte_sched library due
> to 32-bit arithmetic that were particularly difficult to track. Can we
> have this function rte_reciprocal_divide() return a 64-bit integer and
> replace any 32-bit arithmetic/conversion with 64-bit operations?
There was no use case for it at the time, and I am actually not sure how
easy the move to 64 bit is, as it would require a multiplication in an
integer domain twice as large (a 64x64 -> 128-bit product).
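
To illustrate what the wider multiplication means in practice, here is a
rough sketch of a 64-bit variant. It is not part of the patch: the names
reciprocal_u64, reciprocal_value_u64 and reciprocal_divide_u64 are made
up for this example, and it assumes a compiler that provides the
unsigned __int128 extension (GCC and Clang on 64-bit targets). It follows
the same scheme from the Granlund/Montgomery paper, with N = 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit counterpart of struct rte_reciprocal; not in the patch. */
struct reciprocal_u64 {
	uint64_t m;
	uint8_t sh1, sh2;
};

/* Slow path: precompute m, sh1, sh2 for a divisor d. d must be non-zero. */
static inline struct reciprocal_u64 reciprocal_value_u64(uint64_t d)
{
	struct reciprocal_u64 R;
	unsigned __int128 p;
	int l = 0;

	/* l = ceil(log2(d)), in the range 0..64 */
	while (l < 64 && ((unsigned __int128)1 << l) < d)
		l++;

	/* m = floor(2^64 * (2^l - d) / d) + 1, computed in 128 bits */
	p = (((unsigned __int128)1 << l) - d) << 64;
	R.m = (uint64_t)(p / d) + 1;
	R.sh1 = (uint8_t)(l > 1 ? 1 : l);	/* min(l, 1) */
	R.sh2 = (uint8_t)(l > 1 ? l - 1 : 0);	/* max(l - 1, 0) */
	return R;
}

/* Fast path: high 64 bits of a 128-bit product, then two shifts. */
static inline uint64_t reciprocal_divide_u64(uint64_t a, struct reciprocal_u64 R)
{
	uint64_t t = (uint64_t)(((unsigned __int128)a * R.m) >> 64);

	return (t + ((a - t) >> R.sh1)) >> R.sh2;
}

/* Usage check against the built-in division operator. */
static inline void reciprocal_u64_self_check(void)
{
	assert(reciprocal_divide_u64(9, reciprocal_value_u64(3)) == 3);
	assert(reciprocal_divide_u64(UINT64_MAX, reciprocal_value_u64(7)) ==
	       UINT64_MAX / 7);
}
```

The fast path has the same shape as the 32-bit one. The cost moves into
the 128-bit product, which is a single instruction on some 64-bit CPUs
but needs several on targets without a widening multiply.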
> > +
> > +#endif /* _RTE_RECIPROCAL_H_ */
> > --
> > 2.1.4
> 
> As previously discussed, a simpler/faster alternative to floating point
> division is 64-bit multiplication followed by a right shift. Is there any
> particular reason why this approach was not considered?
This is exact division, so which approach fits depends on what you want.
I am not sure you want to do array accesses with the result of floating
point division or of a simple 64-bit multiplication and shift.
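
A small sketch of the difference, under stated assumptions: the struct
and rte_reciprocal_divide() are copied from the patch, while
reciprocal_value_sketch() is only my illustration of the slow path (the
patch declares rte_reciprocal_value() without showing its body), and
naive_mul_shift_divide() is a made-up example of a plain multiply and
shift with a truncated reciprocal, not necessarily what was proposed.

```c
#include <assert.h>
#include <stdint.h>

struct rte_reciprocal {
	uint32_t m;
	uint8_t sh1, sh2;
};

/* Fast path, as in the patch. */
static inline uint32_t rte_reciprocal_divide(uint32_t a, struct rte_reciprocal R)
{
	uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);

	return (t + ((a - t) >> R.sh1)) >> R.sh2;
}

/* Illustrative slow path (hypothetical name); d must be non-zero. */
static inline struct rte_reciprocal reciprocal_value_sketch(uint32_t d)
{
	struct rte_reciprocal R;
	uint64_t m;
	int l = 0;

	/* l = ceil(log2(d)), in the range 0..32 */
	while (l < 32 && (1ULL << l) < d)
		l++;

	/* m = floor(2^32 * (2^l - d) / d) + 1 */
	m = (((1ULL << l) - d) << 32) / d + 1;
	R.m = (uint32_t)m;
	R.sh1 = (uint8_t)(l > 1 ? 1 : l);	/* min(l, 1) */
	R.sh2 = (uint8_t)(l > 1 ? l - 1 : 0);	/* max(l - 1, 0) */
	return R;
}

/* Hypothetical naive variant: truncated 32.32 fixed-point reciprocal. */
static inline uint32_t naive_mul_shift_divide(uint32_t a, uint32_t d)
{
	uint64_t inv = (1ULL << 32) / d;

	return (uint32_t)((a * inv) >> 32);
}

/* Usage check: the reciprocal divide matches 9 / 3, the naive one does not. */
static inline void exactness_self_check(void)
{
	assert(rte_reciprocal_divide(9, reciprocal_value_sketch(3)) == 3);
	assert(naive_mul_shift_divide(9, 3) == 2);
}
```

For a = 9 and d = 3 the naive variant returns 2, because the truncated
reciprocal is slightly too small. The reciprocal divide returns 3, which
matters if the quotient is used as an array index.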
Bye,
Hannes
    
    