[dpdk-dev] [PATCH v7 1/3] doc: add generic atomic deprecation section

Phil Yang Phil.Yang at arm.com
Wed Jul 15 04:58:54 CEST 2020
Previous message: [dpdk-dev] [PATCH v7 1/3] doc: add generic atomic deprecation section
Next message: [dpdk-dev] [PATCH v7 2/3] devtools: prevent use of rte atomic APIs in future patches
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Honnappa,

Thanks for your comments and suggestions.
I Will update it in the next version.


Hi John,

I would greatly appreciate it if you kindly give me some feedback on this proposal.

Thanks,
Phil

> Hi Phil,
> 	Few comments.
> 
> <snip>
> 
> > Subject: [PATCH v7 1/3] doc: add generic atomic deprecation section
> I think we should change this as follows:
> "doc: add optimizations using C11 atomic built-ins"
> 
> >
> > Add deprecating the generic rte_atomic_xx APIs to C11 atomic built-ins
> guide
> > and examples.
> I think same here, something like following would help:
> "Add information about possible optimizations using C11 atomic built-ins"
> 
> >
> > Signed-off-by: Phil Yang <phil.yang at arm.com>
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> > ---
> >  doc/guides/prog_guide/writing_efficient_code.rst | 64
> > +++++++++++++++++++++++-
> >  1 file changed, 63 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/writing_efficient_code.rst
> > b/doc/guides/prog_guide/writing_efficient_code.rst
> > index 849f63e..16d6188 100644
> > --- a/doc/guides/prog_guide/writing_efficient_code.rst
> > +++ b/doc/guides/prog_guide/writing_efficient_code.rst
> > @@ -167,7 +167,13 @@ but with the added cost of lower throughput.
> >  Locks and Atomic Operations
> >  ---------------------------
> >
> > -Atomic operations imply a lock prefix before the instruction,
> > +This section describes some key considerations when using locks and
> > +atomic operations in the DPDK environment.
> > +
> > +Locks
> > +~~~~~
> > +
> > +On x86, atomic operations imply a lock prefix before the instruction,
> >  causing the processor's LOCK# signal to be asserted during execution of
> the
> > following instruction.
> >  This has a big impact on performance in a multicore environment.
> >
> > @@ -176,6 +182,62 @@ It can often be replaced by other solutions like per-
> > lcore variables.
> >  Also, some locking techniques are more efficient than others.
> >  For instance, the Read-Copy-Update (RCU) algorithm can frequently
> replace
> > simple rwlocks.
> >
> > +Atomic Operations: Use C11 Atomic Built-ins
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +DPDK generic rte_atomic operations are implemented by `__sync built-ins
> > +<https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_.
> We can skip the hyper-link. Google search on "__sync built-in functions"
> leads to this page easily.
> 
> > +These __sync built-ins result in full barriers on aarch64, which are
> > +unnecessary in many use cases. They can be replaced by `__atomic
> > +built-ins
> 
> > +<https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html>`_
> Same here, we can skip the hyper-link.
> 
> > +that conform to the C11 memory model and provide finer memory order
> > control.
> > +
> > +So replacing the rte_atomic operations with __atomic built-ins might
> > +improve performance for aarch64 machines.
> > +
> > +Some typical optimization cases are listed below:
> > +
> > +Atomicity
> > +^^^^^^^^^
> > +
> > +Some use cases require atomicity alone, the ordering of the memory
> > +operations does not matter. For example the packets statistics in
> > +``virtio_xmit()`` function of ``vhost`` example application. It just
> > +updates the number of transmitted packets, no subsequent logic
> depends
> Instead of referring to the code here, it is better to keep it generic. May be
> something like below?
> "For example, the packet statistics counters need to be incremented
> atomically but do not need any particular memory ordering. So, RELAXED
> memory ordering is sufficient".
> 
> > +on these counters. So the RELAXED memory ordering is sufficient.
> > +
> > +One-way Barrier
> > +^^^^^^^^^^^^^^^
> > +
> > +Some use cases allow for memory reordering in one way while requiring
> > +memory ordering in the other direction.
> > +
> Spinlock is a good example, making is little more generic would be helpful.
> 
> > +For example, the memory operations before the ``rte_spinlock_lock()``
> Instead of referring to "rte_spinlock_lock" can you just say spinlock lock
> > +can move to the critical section, but the memory operations in the
>      ^^^ are allowed to
> > +critical section cannot move above the lock. In this case, the full
>                                 ^^^^^^ are not allowed to
> > +memory barrier in the compare-and-swap operation can be replaced to
>                                                                                                                                  ^^ with
> > +ACQUIRE. On the other hand, the memory operations after the
>      ^^^^^^^^ ACQUIRE memory order.
> > +``rte_spinlock_unlock()`` can move to the critical section, but the
>                                                   ^^^ are allowed to
> > +memory operations in the critical section cannot move below the unlock.
>                                                                                 ^^^^^^ are not allowed to
> > +So the full barrier in the STORE operation can be replaced with RELEASE.
> I think this above statement can be replaced with something like below:
> 
> "So the full barrier in the store operation can use RELEASE memory order."
> 
> > +
> > +Reader-Writer Concurrency
> > +^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Lock-free reader-writer concurrency is one of the common use cases in
> DPDK.
> > +
> > +The payload or the data that the writer wants to communicate to the
> > +reader, can be written with RELAXED memory order. However, the guard
> > +variable should be written with RELEASE memory order. This ensures that
> > +the store to guard variable is observable only after the store to payload is
> > observable.
> > +Refer to ``rte_hash_cuckoo_insert_mw()`` for an example.
> We can remove just this line above.
> 
> > +
> > +Correspondingly, on the reader side, the guard variable should be read
> > +with ACQUIRE memory order. The payload or the data the writer
> > +communicated, can be read with RELAXED memory order. This ensures
> that,
> > +if the store to guard variable is observable, the store to payload is also
> > observable.
> > +Refer to rte_hash ``search_one_bucket_lf()`` for an example.
> Same here, suggest removing just the above line.
> 
> > +
> >  Coding Considerations
> >  ---------------------
> >
> > --
> > 2.7.4
Previous message: [dpdk-dev] [PATCH v7 1/3] doc: add generic atomic deprecation section
Next message: [dpdk-dev] [PATCH v7 2/3] devtools: prevent use of rte atomic APIs in future patches
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list