[dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock

Stephen Hemminger stephen at networkplumber.org
Fri Mar 13 00:09:26 CET 2015


On Thu, 12 Mar 2015 19:06:39 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu at intel.com> wrote:

> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Tuesday, March 10, 2015 4:14 PM
> > To: Dumitrescu, Cristian
> > Cc: dev at dpdk.org; Stephen Hemminger; Stephen Hemminger
> > Subject: [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte
> > clock
> > 
> > From: Stephen Hemminger <shemming at brocade.com>
> > 
> > The old code was doing a floating point divide for each rte_dequeue()
> > which is very expensive. Change to using fixed point scaled math instead.
> > This improved performance from 5Gbit/sec to 10 Gbit/sec
> > 
> > Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
> > ---
> > v2 -- no changes
> >       despite objections, the performance observation is real
> >       on Intel(R) Core(TM) i7-3770 CPU
> > 
> >  lib/librte_sched/rte_sched.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > index 74d0e0a..522a647 100644
> > --- a/lib/librte_sched/rte_sched.c
> > +++ b/lib/librte_sched/rte_sched.c
> > @@ -102,6 +102,9 @@
> > 
> >  #define RTE_SCHED_BMP_POS_INVALID             UINT32_MAX
> > 
> > +/* For cycles_per_byte calculation */
> > +#define RTE_SCHED_TIME_SHIFT		      20
> > +
> >  struct rte_sched_subport {
> >  	/* Token bucket (TB) */
> >  	uint64_t tb_time; /* time of last update */
> > @@ -239,7 +242,7 @@ struct rte_sched_port {
> >  	uint64_t time_cpu_cycles;     /* Current CPU time measured in CPU
> > cyles */
> >  	uint64_t time_cpu_bytes;      /* Current CPU time measured in bytes
> > */
> >  	uint64_t time;                /* Current NIC TX time measured in bytes */
> > -	double cycles_per_byte;       /* CPU cycles per byte */
> > +	uint32_t cycles_per_byte;       /* CPU cycles per byte (scaled) */
> > 
> >  	/* Scheduling loop detection */
> >  	uint32_t pipe_loop;
> > @@ -657,7 +660,9 @@ rte_sched_port_config(struct
> > rte_sched_port_params *params)
> >  	port->time_cpu_cycles = rte_get_tsc_cycles();
> >  	port->time_cpu_bytes = 0;
> >  	port->time = 0;
> > -	port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double)
> > params->rate);
> > +
> > +	port->cycles_per_byte = (rte_get_tsc_hz() <<
> > RTE_SCHED_TIME_SHIFT)
> > +		/ params->rate;
> > 
> >  	/* Scheduling loop detection */
> >  	port->pipe_loop = RTE_SCHED_PIPE_INVALID;
> > @@ -2126,11 +2131,12 @@ rte_sched_port_time_resync(struct
> > rte_sched_port *port)
> >  {
> >  	uint64_t cycles = rte_get_tsc_cycles();
> >  	uint64_t cycles_diff = cycles - port->time_cpu_cycles;
> > -	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> > +	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
> > +		/ port->cycles_per_byte;
> > 
> >  	/* Advance port time */
> >  	port->time_cpu_cycles = cycles;
> > -	port->time_cpu_bytes += (uint64_t) bytes_diff;
> > +	port->time_cpu_bytes += bytes_diff;
> >  	if (port->time < port->time_cpu_bytes) {
> >  		port->time = port->time_cpu_bytes;
> >  	}
> > --
> > 2.1.4
> 
> Stephen,
> 
> We agreed during the previous round to look at 64-bit multiplication option, but looks like this patch is identical to the previous one. Did you meet any issues in implementing this approach? As stated before, I do not think this is the best solution for the reasons listed previously, and this part of the code is too sensitive to take the risk.
> 
> Since Thomas indicated these patches will be considered for 2.1 rather than 2.0 release, it looks like we have some time to refine this work. I would reiterate the same proposal that Thomas made: re-submit the patches where we have consensus, and keep this one out for the moment; you and me can sync up offline and come back with an implementation proposal that would hopefully address the previous concerns for 2.1 release. Would this work for you?
> 
> Thank you for your work and for your understanding!
> 
> Regards,
> Cristian
> 

I haven't had time to do the analysis to figure out how to keep the maths
in range so that 64 bit multiplicative divide would work. When I first tried
it the issue was that the cycles_diff and bytes_diff would both have to
scaled so that they are 32 bit values.

Should be possible, might get back to it (or someone else might take up
the cause).


More information about the dev mailing list