[dpdk-dev] [PATCH 2/3] rte_sched: introduce reciprocal divide

Dumitrescu, Cristian cristian.dumitrescu at intel.com
Wed Dec 2 17:45:01 CET 2015



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Sunday, November 29, 2015 8:47 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu at intel.com>
> Cc: dev at dpdk.org; Stephen Hemminger <stephen at networkplumber.org>;
> Hannes Frederic Sowa <hannes at stressinduktion.org>
> Subject: [PATCH 2/3] rte_sched: introduce reciprocal divide
> 
> This adds (with permission of the original author)
> reciprocal divide based on algorithm in Linux.
> 
> Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
> Signed-off-by: Hannes Frederic Sowa <hannes at stressinduktion.org>
> ---
>  lib/librte_sched/Makefile         |  6 ++--
>  lib/librte_sched/rte_reciprocal.c | 72
> +++++++++++++++++++++++++++++++++++++++
>  lib/librte_sched/rte_reciprocal.h | 39 +++++++++++++++++++++
>  3 files changed, 115 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_sched/rte_reciprocal.c
>  create mode 100644 lib/librte_sched/rte_reciprocal.h
> 
> diff --git a/lib/librte_sched/Makefile b/lib/librte_sched/Makefile
> index b1cb285..e0a2c6d 100644
> --- a/lib/librte_sched/Makefile
> +++ b/lib/librte_sched/Makefile
> @@ -48,10 +48,12 @@ LIBABIVER := 1
>  #
>  # all source are stored in SRCS-y
>  #
> -SRCS-$(CONFIG_RTE_LIBRTE_SCHED) += rte_sched.c rte_red.c rte_approx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_SCHED) += rte_sched.c rte_red.c
> rte_approx.c \
> +	rte_reciprocal.c
> 
>  # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_SCHED)-include := rte_sched.h
> rte_bitmap.h rte_sched_common.h rte_red.h rte_approx.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_SCHED)-include := rte_sched.h
> rte_bitmap.h \
> +	rte_sched_common.h rte_red.h rte_approx.h rte_reciprocal.h
> 
>  # this lib depends upon:
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_SCHED) += lib/librte_mempool
> lib/librte_mbuf
> diff --git a/lib/librte_sched/rte_reciprocal.c
> b/lib/librte_sched/rte_reciprocal.c
> new file mode 100644
> index 0000000..652f023
> --- /dev/null
> +++ b/lib/librte_sched/rte_reciprocal.c
> @@ -0,0 +1,72 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) Hannes Frederic Sowa
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its

Why is Intel mentioned here, as according to this license header Intel is not the copyright holder?

> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +
> +#include <rte_common.h>
> +
> +#include "rte_reciprocal.h"
> +
> +/* find largest set bit.
> + * portable and slow but does not matter for this usage.
> + */
> +static inline int fls(uint32_t x)
> +{
> +	int b;
> +
> +	for (b = 31; b >= 0; --b) {
> +		if (x & (1u << b))
> +			return b + 1;
> +	}
> +
> +	return 0;
> +}
> +
> +struct rte_reciprocal rte_reciprocal_value(uint32_t d)
> +{
> +	struct rte_reciprocal R;
> +	uint64_t m;
> +	int l;
> +
> +	l = fls(d - 1);
> +	m = ((1ULL << 32) * ((1ULL << l) - d));
> +	m /= d;
> +
> +	++m;
> +	R.m = m;
> +	R.sh1 = RTE_MIN(l, 1);
> +	R.sh2 = RTE_MAX(l - 1, 0);
> +
> +	return R;
> +}
> diff --git a/lib/librte_sched/rte_reciprocal.h
> b/lib/librte_sched/rte_reciprocal.h
> new file mode 100644
> index 0000000..abd1525
> --- /dev/null
> +++ b/lib/librte_sched/rte_reciprocal.h
> @@ -0,0 +1,39 @@
> +/*
> + * Reciprocal divide
> + *
> + * Used with permission from original authors
> + *  Hannes Frederic Sowa and Daniel Borkmann
> + *
> + * This algorithm is based on the paper "Division by Invariant
> + * Integers Using Multiplication" by Torbjörn Granlund and Peter
> + * L. Montgomery.

Stephen, can you please provide a link to this paper?

> + *
> + * The assembler implementation from Agner Fog, which this code is
> + * based on, can be found here:
> + * http://www.agner.org/optimize/asmlib.zip
> + *
> + * This optimization for A/B is helpful if the divisor B is mostly
> + * runtime invariant. The reciprocal of B is calculated in the
> + * slow-path with reciprocal_value(). The fast-path can then just use
> + * a much faster multiplication operation with a variable dividend A
> + * to calculate the division A/B.
> + */
> +
> +#ifndef _RTE_RECIPROCAL_H_
> +#define _RTE_RECIPROCAL_H_
> +
> +struct rte_reciprocal {
> +	uint32_t m;
> +	uint8_t sh1, sh2;
> +};

The size of this structure is not a multiple of 32 bits. You seem to transfer this structure by value rather than by reference (the function rte_reciprocal_value() below returns an instance of this structure), I don't feel comfortable with the last 16 bits of the structure being left uninitialized, we should probably add some explicit pad field and initialize this structure explicitly to zero at init time?

> +
> +static inline uint32_t rte_reciprocal_divide(uint32_t a, struct rte_reciprocal
> R)
> +{
> +	uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);
> +
> +	return (t + ((a - t) >> R.sh1)) >> R.sh2;
> +}
> +
> +struct rte_reciprocal rte_reciprocal_value(uint32_t d);

Why 32-bit arithmetic? We had a lot of bugs in librte_sched library due to 32-bit arithmetic that were particularly difficult to track. Can we have this function rte_reciprocal_divide() return a 64-bit integer and replace any 32-bit arithmetic/conversion with 64-bit operations?

> +
> +#endif /* _RTE_RECIPROCAL_H_ */
> --
> 2.1.4

As previously discussed, a simpler/faster alternative to floating point division is 64-bit multiplication followed by right shift, any particular reason why this approach was not considered?



More information about the dev mailing list