[dpdk-dev] [PATCH v8 1/3] doc: add optimizations using C11 atomic built-ins

David Marchand david.marchand at redhat.com
Thu Jul 16 12:35:03 CEST 2020

Hello,

On Thu, Jul 16, 2020 at 6:58 AM Phil Yang <phil.yang at arm.com> wrote:
>
> Add information about possible optimizations using C11 atomic built-ins.

We are missing a review on this doc update.

Thanks.


-- 
David Marchand

>
> Signed-off-by: Phil Yang <phil.yang at arm.com>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> ---
>  doc/guides/prog_guide/writing_efficient_code.rst | 59 +++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst
> index 849f63e..53a1ca1 100644
> --- a/doc/guides/prog_guide/writing_efficient_code.rst
> +++ b/doc/guides/prog_guide/writing_efficient_code.rst
> @@ -167,7 +167,13 @@ but with the added cost of lower throughput.
>  Locks and Atomic Operations
>  ---------------------------
>
> -Atomic operations imply a lock prefix before the instruction,
> +This section describes some key considerations when using locks and atomic
> +operations in the DPDK environment.
> +
> +Locks
> +~~~~~
> +
> +On x86, atomic read-modify-write operations imply a lock prefix before the instruction,
>  causing the processor's LOCK# signal to be asserted during execution of the following instruction.
>  This has a big impact on performance in a multicore environment.
>
> @@ -176,6 +182,57 @@ It can often be replaced by other solutions like per-lcore variables.
>  Also, some locking techniques are more efficient than others.
>  For instance, the Read-Copy-Update (RCU) algorithm can frequently replace simple rwlocks.
>
> +Atomic Operations: Use C11 Atomic Built-ins
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +DPDK's generic ``rte_atomic`` operations are implemented with ``__sync``
> +built-ins, which result in full barriers on aarch64. Such barriers are
> +unnecessary in many use cases. They can be replaced by ``__atomic`` built-ins,
> +which conform to the C11 memory model and provide finer memory order control.
> +
> +Replacing the ``rte_atomic`` operations with ``__atomic`` built-ins may
> +therefore improve performance on aarch64 machines.
> +
> +Some typical optimization cases are listed below:
> +
> +Atomicity
> +^^^^^^^^^
> +
> +Some use cases require atomicity alone; the ordering of the memory operations
> +does not matter. For example, packet statistics counters need to be
> +incremented atomically but do not need any particular memory ordering.
> +In such cases, RELAXED memory ordering (``__ATOMIC_RELAXED``) is sufficient.
> +
> +One-way Barrier
> +^^^^^^^^^^^^^^^
> +
> +Some use cases allow memory reordering in one direction while requiring
> +memory ordering in the other direction.
> +
> +For example, memory operations that precede the spinlock lock may move into
> +the critical section, but memory operations in the critical section must not
> +move above the lock. In this case, the full memory barrier in the
> +compare-and-swap operation can be replaced with ACQUIRE memory order.
> +Conversely, memory operations that follow the spinlock unlock may move into
> +the critical section, but memory operations in the critical section must not
> +move below the unlock. So the full barrier in the unlock store operation can
> +be replaced with RELEASE memory order.
> +
> +Reader-Writer Concurrency
> +^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Lock-free reader-writer concurrency is one of the common use cases in DPDK.
> +
> +The payload, or the data that the writer wants to communicate to the reader,
> +can be written with RELAXED memory order. However, the guard variable should
> +be written with RELEASE memory order. This ensures that the store to the guard
> +variable becomes observable only after the store to the payload is observable.
> +
> +Correspondingly, on the reader side, the guard variable should be read
> +with ACQUIRE memory order. The payload, or the data the writer communicated,
> +can then be read with RELAXED memory order. This ensures that, if the store to
> +the guard variable is observable, the store to the payload is also observable.
> +
>  Coding Considerations
>  ---------------------
>
> --
> 2.7.4
>
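
The per-lcore variables mentioned in the Locks section could replace a shared atomic counter roughly as in the sketch below. It is a minimal illustration, not DPDK code: MAX_LCORES, CACHE_LINE, struct lcore_counter, count_rx() and total_rx() are hypothetical names, and the sketch assumes each slot is written only by the thread running on that lcore. With a single writer per slot, the owner can use a relaxed load followed by a relaxed store instead of an atomic read-modify-write, which on x86-64 should compile to plain moves with no LOCK-prefixed instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds, for illustration only. */
#define MAX_LCORES 8
#define CACHE_LINE 64

/* One counter per lcore, each on its own cache line to avoid false sharing. */
struct lcore_counter {
	uint64_t packets;
} __attribute__((aligned(CACHE_LINE)));

static struct lcore_counter rx_counters[MAX_LCORES];

/* Called only by the thread owning lcore_id: a single writer per slot,
 * so a relaxed load plus a relaxed store is enough (no atomic RMW). */
static inline void
count_rx(unsigned int lcore_id, uint64_t n)
{
	assert(lcore_id < MAX_LCORES);
	uint64_t v = __atomic_load_n(&rx_counters[lcore_id].packets,
			__ATOMIC_RELAXED);
	__atomic_store_n(&rx_counters[lcore_id].packets, v + n,
			__ATOMIC_RELAXED);
}

/* May be called from any thread. Relaxed loads avoid torn reads, but the
 * sum is only a snapshot since slots may change while it runs. */
static uint64_t
total_rx(void)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < MAX_LCORES; i++)
		sum += __atomic_load_n(&rx_counters[i].packets,
				__ATOMIC_RELAXED);
	return sum;
}
```

The trade-off is memory (one cache line per lcore) and a reader that has to walk every slot, in exchange for a fast path with no shared cache line and no locked instruction.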

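
The one-way barrier case can be sketched with a minimal test-and-test-and-set spinlock. The type and function names below are hypothetical and the code is an illustration, not rte_spinlock: taking the lock uses __atomic_compare_exchange_n() with __ATOMIC_ACQUIRE on success, and releasing it is an __atomic_store_n() with __ATOMIC_RELEASE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal spinlock, for illustration only. */
typedef struct {
	uint32_t locked; /* 0 = unlocked, 1 = locked */
} sketch_spinlock_t;

static inline void
sketch_spinlock_lock(sketch_spinlock_t *sl)
{
	uint32_t expected = 0;

	/* ACQUIRE on success: critical-section accesses cannot move above
	 * the lock, while earlier accesses may sink into it.
	 * RELAXED on failure: a failed attempt orders nothing. */
	while (!__atomic_compare_exchange_n(&sl->locked, &expected, 1,
			0 /* strong */, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
		/* Wait with relaxed loads until the lock looks free. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
		expected = 0; /* a failed CAS wrote the current value here */
	}
}

static inline void
sketch_spinlock_unlock(sketch_spinlock_t *sl)
{
	assert(__atomic_load_n(&sl->locked, __ATOMIC_RELAXED) == 1);
	/* RELEASE: critical-section accesses cannot move below the unlock,
	 * while later accesses may move up into it. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}
```

The inner relaxed-load loop avoids repeated compare-and-swap attempts on a contended cache line; it needs no ordering of its own because the ACQUIRE on the successful compare-and-swap is what protects the critical section.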

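
The reader-writer case can be sketched as a hypothetical one-shot mailbox shared by one writer and one reader. struct mailbox, mailbox_publish() and mailbox_try_consume() are illustrative names, not DPDK APIs. The writer stores the payload with __ATOMIC_RELAXED and then sets the guard with __ATOMIC_RELEASE; the reader loads the guard with __ATOMIC_ACQUIRE before reading the payload with __ATOMIC_RELAXED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-writer/single-reader, one-shot mailbox. */
struct mailbox {
	uint64_t payload[4]; /* data the writer communicates */
	uint32_t ready;      /* guard variable: 1 once payload is valid */
};

/* Writer: payload stores may be RELAXED; the RELEASE store to the guard
 * keeps them from becoming visible after the guard does. */
static void
mailbox_publish(struct mailbox *mb, const uint64_t data[4])
{
	/* One-shot: rewriting the payload could race with a reader. */
	assert(__atomic_load_n(&mb->ready, __ATOMIC_RELAXED) == 0);
	for (int i = 0; i < 4; i++)
		__atomic_store_n(&mb->payload[i], data[i], __ATOMIC_RELAXED);
	__atomic_store_n(&mb->ready, 1, __ATOMIC_RELEASE);
}

/* Reader: the ACQUIRE load pairs with the writer's RELEASE store, so if
 * ready == 1 is observed, the payload stores are observable too. */
static bool
mailbox_try_consume(struct mailbox *mb, uint64_t out[4])
{
	if (__atomic_load_n(&mb->ready, __ATOMIC_ACQUIRE) == 0)
		return false; /* nothing published yet */
	for (int i = 0; i < 4; i++)
		out[i] = __atomic_load_n(&mb->payload[i], __ATOMIC_RELAXED);
	return true;
}
```

Reusing the mailbox for a second message would need more than a single guard (for example a sequence counter), which is why mailbox_publish() asserts that it runs only once.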