[dpdk-dev] [EXT] [PATCH v3 4/5] spinlock: use wfe to reduce contention on aarch64
Jerin Jacob Kollanukkaran
jerinj at marvell.com
Wed Jul 24 14:17:05 CEST 2019
> -----Original Message-----
> From: Gavin Hu <gavin.hu at arm.com>
> Sent: Tuesday, July 23, 2019 9:14 PM
> To: dev at dpdk.org
> Cc: nd at arm.com; thomas at monjalon.net; stephen at networkplumber.org;
> Jerin Jacob Kollanukkaran <jerinj at marvell.com>; Pavan Nikhilesh
> Bhagavatula <pbhagavatula at marvell.com>;
> Honnappa.Nagarahalli at arm.com; gavin.hu at arm.com
> Subject: [EXT] [PATCH v3 4/5] spinlock: use wfe to reduce contention on
> aarch64
>
> In acquiring a spinlock, cores repeatedly poll the lock variable.
> This polling is replaced by the rte_wait_until_equal API.
>
> A 5~10% performance gain was measured by running spinlock_autotest on
> 14 isolated cores of ThunderX2.
>
> Signed-off-by: Gavin Hu <gavin.hu at arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> Reviewed-by: Phil Yang <phil.yang at arm.com>
> Reviewed-by: Steve Capper <steve.capper at arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl at arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> Tested-by: Pavan Nikhilesh <pbhagavatula at marvell.com>
> ---
> .../common/include/arch/arm/rte_spinlock.h | 25 ++++++++++++++++++++++
> .../common/include/generic/rte_spinlock.h  |  2 +-
> 2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> index 1a6916b..f25d17f 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> @@ -16,6 +16,31 @@ extern "C" {
> #include <rte_common.h>
> #include "generic/rte_spinlock.h"
>
> +/* armv7a does support WFE, but an explicit wake-up signal using SEV is
> + * required (must be preceded by DSB to drain the store buffer) and
> + * this is less performant, so keep armv7a implementation unchanged.
> + */
> +#if defined(RTE_USE_WFE) && defined(RTE_ARCH_ARM64)
See below. Please avoid complicated conditional compilation logic for scalability and readability.
> +static inline void
> +rte_spinlock_lock(rte_spinlock_t *sl)
> +{
> + unsigned int tmp;
> + /* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
> + * faqs/ka16809.html
> + */
> + asm volatile(
> + "sevl\n"
> + "1: wfe\n"
> + "2: ldaxr %w[tmp], %w[locked]\n"
> + "cbnz %w[tmp], 1b\n"
> + "stxr %w[tmp], %w[one], %w[locked]\n"
> + "cbnz %w[tmp], 2b\n"
> + : [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
> + : [one] "r" (1)
> + : "cc", "memory");
> +}
> +#endif
> +
> static inline int rte_tm_supported(void)
> {
> 	return 0;
> diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h
> b/lib/librte_eal/common/include/generic/rte_spinlock.h
> index 87ae7a4..cf4f15b 100644
> --- a/lib/librte_eal/common/include/generic/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
> @@ -57,7 +57,7 @@ rte_spinlock_init(rte_spinlock_t *sl)
> static inline void
> rte_spinlock_lock(rte_spinlock_t *sl);
>
> -#ifdef RTE_FORCE_INTRINSICS
> +#if defined(RTE_FORCE_INTRINSICS) && !defined(RTE_USE_WFE)
I would like to avoid hacking the generic code around to meet an architecture-specific requirement.
For example, if someone enables RTE_USE_WFE for armv7, this will break,
and it will be painful for new architectures to use RTE_FORCE_INTRINSICS.
It looks like the time has come to disable RTE_FORCE_INTRINSICS for arm64.
Since this patch is targeted for the next release, how about enabling native
implementations of the RTE_FORCE_INTRINSICS-based code for arm64 (spinlock, ticketlock, etc.), as x86 does?
If you don't have the bandwidth to convert all the blocks, let us know; we can collaborate,
and Marvell can take up some of the RTE_FORCE_INTRINSICS conversion for the next release.
> static inline void
> rte_spinlock_lock(rte_spinlock_t *sl)
> {
> --
> 2.7.4