[dpdk-dev] [PATCH 3/3] rte_sched: eliminate floating point in calculating byte clock
Dumitrescu, Cristian
cristian.dumitrescu at intel.com
Wed Dec 2 17:48:17 CET 2015
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Sunday, November 29, 2015 8:47 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu at intel.com>
> Cc: dev at dpdk.org; Stephen Hemminger <stephen at networkplumber.org>
> Subject: [PATCH 3/3] rte_sched: eliminate floating point in calculating byte
> clock
>
> The old code was doing a floating point divide for each rte_dequeue()
> which is very expensive. Change to using fixed point scaled inverse
> multiply. To maintain equivalent precision, scaled math is used.
> The application ABI is the same.
>
> This improved performance from 5Gbit/sec to 10 Gbit/sec when configured
> for 10 Gbit/sec rate.
>
> There was some feedback from Cristian that he wanted a better
> solution and was going to give one, but none was provided.
> For 2.2 this is a better solution than existing code, if someone
> has a better version I would love to see it.
>
> Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
> ---
> lib/librte_sched/rte_sched.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 16acd6b..cfae136 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -47,6 +47,7 @@
> #include "rte_bitmap.h"
> #include "rte_sched_common.h"
> #include "rte_approx.h"
> +#include "rte_reciprocal.h"
>
> #ifdef __INTEL_COMPILER
> #pragma warning(disable:2259) /* conversion may lose significant bits */
> @@ -62,6 +63,11 @@
> #define RTE_SCHED_PIPE_INVALID UINT32_MAX
> #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX
>
> +/* Scaling for cycles_per_byte calculation
> + * Chosen so that minimum rate is 480 bit/sec
> + */
> +#define RTE_SCHED_TIME_SHIFT 8
Stephen, can you please elaborate on why the dividend needs to be shifted at all, and why a shift of 8 was chosen? Is 8 a hard constraint? How does it affect scheduling precision and accuracy?
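
To make the question concrete, here is a minimal sketch (not taken from the patch) of the conversion with and without the shift. The helper names are hypothetical, plain integer division stands in for rte_reciprocal_divide(), and the 2 GHz TSC and 1.25e9 bytes/sec (10 Gbit/sec) rate used below are assumed values chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not part of rte_sched: convert elapsed cycles to
 * bytes using a scaled integer divisor. Plain division stands in for
 * rte_reciprocal_divide(). Assumes rate <= (tsc_hz << shift), so the
 * divisor is never zero. */
static uint64_t
cycles_to_bytes_scaled(uint64_t cycles_diff, uint64_t tsc_hz,
		       uint64_t rate, unsigned int shift)
{
	/* Scaled cycles per byte, truncated to an integer. */
	uint64_t cycles_per_byte = (tsc_hz << shift) / rate;

	return (cycles_diff << shift) / cycles_per_byte;
}

/* Reference: the floating point computation that the patch removes. */
static uint64_t
cycles_to_bytes_double(uint64_t cycles_diff, uint64_t tsc_hz, uint64_t rate)
{
	double cycles_per_byte = (double)tsc_hz / (double)rate;

	return (uint64_t)((double)cycles_diff / cycles_per_byte);
}
```

With the assumed values and cycles_diff = 1000000, shift 0 truncates 1.6 cycles per byte to 1 and returns 1000000 bytes, while shift 8 truncates 409.6 to 409 and returns 625916 bytes. The floating point reference gives about 625000, so under these assumptions the shift reduces the error from 60% to roughly 0.15%.
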
> +
> struct rte_sched_subport {
> /* Token bucket (TB) */
> uint64_t tb_time; /* time of last update */
> @@ -215,7 +221,7 @@ struct rte_sched_port {
> uint64_t time_cpu_cycles; /* Current CPU time measured in CPU
> cyles */
> uint64_t time_cpu_bytes; /* Current CPU time measured in bytes
> */
> uint64_t time; /* Current NIC TX time measured in bytes */
> - double cycles_per_byte; /* CPU cycles per byte */
> + struct rte_reciprocal inv_cycles_per_byte; /* CPU cycles per byte */
>
> /* Scheduling loop detection */
> uint32_t pipe_loop;
> @@ -610,7 +616,7 @@ struct rte_sched_port *
> rte_sched_port_config(struct rte_sched_port_params *params)
> {
> struct rte_sched_port *port = NULL;
> - uint32_t mem_size, bmp_mem_size, n_queues_per_port, i;
> + uint32_t mem_size, bmp_mem_size, n_queues_per_port, i,
> cycles_per_byte;
>
> /* Check user parameters. Determine the amount of memory to
> allocate */
> mem_size = rte_sched_port_get_memory_footprint(params);
> @@ -661,7 +667,10 @@ rte_sched_port_config(struct
> rte_sched_port_params *params)
> port->time_cpu_cycles = rte_get_tsc_cycles();
> port->time_cpu_bytes = 0;
> port->time = 0;
> - port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double)
> params->rate);
> +
> + cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT)
> + / params->rate;
> + port->inv_cycles_per_byte = rte_reciprocal_value(cycles_per_byte);
>
> /* Scheduling loop detection */
> port->pipe_loop = RTE_SCHED_PIPE_INVALID;
> @@ -2088,11 +2097,15 @@ rte_sched_port_time_resync(struct
> rte_sched_port *port)
> {
> uint64_t cycles = rte_get_tsc_cycles();
> uint64_t cycles_diff = cycles - port->time_cpu_cycles;
> - double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> + uint64_t bytes_diff;
> +
> + /* Compute elapsed time in bytes */
> + bytes_diff = rte_reciprocal_divide(cycles_diff <<
> RTE_SCHED_TIME_SHIFT,
> + port->inv_cycles_per_byte);
>
> /* Advance port time */
> port->time_cpu_cycles = cycles;
> - port->time_cpu_bytes += (uint64_t) bytes_diff;
> + port->time_cpu_bytes += bytes_diff;
> if (port->time < port->time_cpu_bytes)
> port->time = port->time_cpu_bytes;
>
> --
> 2.1.4
Can you provide some insight into how you tested this code and the test vectors you used?
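
As an example of the kind of test vector I have in mind, the sketch below measures the relative error of the scaled integer conversion against the exact integer ratio for a list of rates. It is only an illustration under stated assumptions: both function names are hypothetical, plain division again stands in for rte_reciprocal_divide(), and the rates and TSC frequency are supplied by the caller.

```c
#include <stdint.h>

/* Hypothetical test helper: error of the scaled conversion, in parts per
 * million, relative to the exact value cycles_diff * rate / tsc_hz.
 * Returns UINT64_MAX if the scaled divisor truncates to zero. */
static uint64_t
scaled_error_ppm(uint64_t cycles_diff, uint64_t tsc_hz, uint64_t rate,
		 unsigned int shift)
{
	uint64_t cycles_per_byte = (tsc_hz << shift) / rate;
	uint64_t approx, exact, delta;

	if (cycles_per_byte == 0)
		return UINT64_MAX;

	approx = (cycles_diff << shift) / cycles_per_byte;
	exact = cycles_diff * rate / tsc_hz;
	delta = approx > exact ? approx - exact : exact - approx;

	return exact ? delta * 1000000 / exact : 0;
}

/* Worst error over n candidate rates, each in bytes per second. */
static uint64_t
worst_error_ppm(const uint64_t *rates, unsigned int n, uint64_t cycles_diff,
		uint64_t tsc_hz, unsigned int shift)
{
	uint64_t worst = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		uint64_t e = scaled_error_ppm(cycles_diff, tsc_hz,
					      rates[i], shift);
		if (e > worst)
			worst = e;
	}
	return worst;
}
```

For an assumed 2 GHz TSC, cycles_diff = 1000000 and rates of 125000000 and 1250000000 bytes/sec, the worst error is 1465 ppm with shift 8 and 600000 ppm with shift 0. The lower rate divides evenly and shows no error at either shift, which is why a sweep over several rates is more informative than a single point.
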