[dpdk-dev] [PATCH] spinlock: fix atomic and out of order execution

François-Frédéric Ozog ff at ozog.com
Fri Jan 10 19:02:18 CET 2014


Hi Thomas,

I am afraid I introduced unnecessary complexity in the discussion as the
spinlock issues I mentioned are connected to a work in progress on my side
(implement a Chelsio cxgb5 PMD) but *not* to the general DPDK. 

I'll explain some aspects of the context and how critical sections has to be
handled in the case of the Chelsio cxgb5 PMD.

1) WC memory
Most memory is cached, some is not cached at all, and some is tagged (via
paging attributes) to be "Write Combining". This means that writes to WC
memory locations do *NOT* go through the cache but rather to a memory "write
buffer" internal to the processor. The processor delays write to memory as
it please, trying to minimize the actual number of writes to DRAM. This type
of memory is used by some devices as a fast interface, for instance Chelsio
cxgb5 write queue can filled using this method
(http://lwn.net/Articles/542643/). Public DPDK cannot declare such memory
type as it requires kernel support.

2) fencing and out of order execution
Out of order execution is related to the processor that may actually change
the order of the instructions of a program as it has been produced by the
compiler. In general, this relates only to performance but for critical
sections, it is absolutely necessary to ensure that the processor does NOT
postpone critical section instructions AFTER the unlock.
So there is a need of a "fence" to tell the processor that some instruction
blocks out of order execution: no previous instruction can be postponed
after the "fence".

3) Public DPDK critical sections

The DPDK implements lock/unlock with xchg instruction which is atomic and
serializing for "normal" cached memory. It includes an implicit lock prefix,
so no need to add such prefix (your patch proposal). This (implicit) lock
prefix is actually acting as an out of order execution fence, thus ensuring
that critical section is working as expected. Nothing has to be added or
changed for public DPDK to have correct critical sections.

4) "Extended" DPDK critical sections

Because of Chelsio cxgb5 PMD driver I am developing, I added kernel support
to DPDK to declare some memory regions as Write Combining, hence to be able
to leverage the output queue fast path.
Because of this, I also had to add proper fencing for critical sections.
The relation is the fact that writes to WC memory does NOT go through the
cache AND that the lock prefix is implemented as a cache protocol
transaction. In other words, a lock prefix is NOT an out of order
instruction fence for instructions that deal with WC memory. So, the
processor may decide to postpone the WC related instruction ATFER the
unlock.
To create a proper out of order execution fence, the processor has a set of
explicit memory fences that do the job.
So before a rte_unlock, I have to add a rte_mb().


So:
- for people willing to develop applications on top of DPDK, my comments can
be disregarded, they are not relevant, there is no need to understand them. 
- for people willing to develop PMD drivers, the fencing comments should be
clear to use the proper fencing.


I am sorry for any confusion I may have introduced in the community.


François-Frédéric


> -----Message d'origine-----
> De : dev [mailto:dev-bounces at dpdk.org] De la part de Thomas Monjalon
> Envoyé : samedi 21 décembre 2013 00:38
> À : dev at dpdk.org
> Objet : [dpdk-dev] [PATCH] spinlock: fix atomic and out of order execution
> 
> From: Damien Millescamps <damien.millescamps at 6wind.com>
> 
> Add lock prefix before xchg instructions in order to be atomic and flush
> speculative values to ensure effective execution order (as an acquire
> barrier).
> 
> MPLOCKED is a "lock" in multicore case.
> 
> Signed-off-by: Damien Millescamps <damien.millescamps at 6wind.com>
> Signed-off-by: Thomas Monjalon <thomas.monjalon at 6wind.com>
> ---
>  lib/librte_eal/common/include/rte_spinlock.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_spinlock.h
> b/lib/librte_eal/common/include/rte_spinlock.h
> index f7a245a..8edb971 100644
> --- a/lib/librte_eal/common/include/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/rte_spinlock.h
> @@ -51,6 +51,7 @@
>  extern "C" {
>  #endif
> 
> +#include <rte_atomic.h>
>  #include <rte_lcore.h>
>  #ifdef RTE_FORCE_INTRINSICS
>  #include <rte_common.h>
> @@ -93,7 +94,7 @@ rte_spinlock_lock(rte_spinlock_t *sl)
>  	int lock_val = 1;
>  	asm volatile (
>  			"1:\n"
> -			"xchg %[locked], %[lv]\n"
> +			MPLOCKED "xchg %[locked], %[lv]\n"
>  			"test %[lv], %[lv]\n"
>  			"jz 3f\n"
>  			"2:\n"
> @@ -124,7 +125,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl)  #ifndef
> RTE_FORCE_INTRINSICS
>  	int unlock_val = 0;
>  	asm volatile (
> -			"xchg %[locked], %[ulv]\n"
> +			MPLOCKED "xchg %[locked], %[ulv]\n"
>  			: [locked] "=m" (sl->locked), [ulv] "=q"
(unlock_val)
>  			: "[ulv]" (unlock_val)
>  			: "memory");
> @@ -148,7 +149,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
>  	int lockval = 1;
> 
>  	asm volatile (
> -			"xchg %[locked], %[lockval]"
> +			MPLOCKED "xchg %[locked], %[lockval]"
>  			: [locked] "=m" (sl->locked), [lockval] "=q"
(lockval)
>  			: "[lockval]" (lockval)
>  			: "memory");
> --
> 1.7.10.4



More information about the dev mailing list