[dpdk-dev] [PATCH v6 1/4] doc: add generic atomic deprecation	section
    Thomas Monjalon 
    thomas at monjalon.net
       
    Fri Jul 10 18:55:52 CEST 2020
    
    
  
Interestingly, John, our doc maintainer is not Cc'ed.
I add him.
Please use --cc-cmd devtools/get-maintainer.sh
I am expecting a review from an x86 maintainer as well.
If no maintainer replies, ping them.
07/07/2020 11:50, Phil Yang:
> Add deprecating the generic rte_atomic_xx APIs to c11 atomic built-ins
> guide and examples.
[...]
> +Atomic Operations: Use C11 Atomic Built-ins
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +DPDK `generic rte_atomic <https://github.com/DPDK/dpdk/blob/v20.02/lib/librte_eal/common/include/generic/rte_atomic.h>`_ operations are
Why this github link on 20.02?
Please try to keep lines small.
> +implemented by `__sync built-ins <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_.
Long links should be on their own line to avoid long lines.
> +These __sync built-ins result in full barriers on aarch64, which are unnecessary
> +in many use cases. They can be replaced by `__atomic built-ins <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html>`_ that
> +conform to the C11 memory model and provide finer memory order control.
> +
> +So replacing the rte_atomic operations with __atomic built-ins might improve
> +performance for aarch64 machines. `More details <https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/StateofC11Code.pdf>`_.
"More details."
Please make a sentence.
> +
> +Some typical optimization cases are listed below:
> +
> +Atomicity
> +^^^^^^^^^
> +
> +Some use cases require atomicity alone, the ordering of the memory operations
> +does not matter. For example the packets statistics in the `vhost <https://github.com/DPDK/dpdk/blob/v20.02/examples/vhost/main.c#L796>`_ example application.
Again github.
If you really want a web link, use code.dpdk.org or doc.dpdk.org/api
But why giving code example at all?
> +
> +It just updates the number of transmitted packets, no subsequent logic depends
> +on these counters. So the RELAXED memory ordering is sufficient:
> +
> +.. code-block:: c
> +
> +    static __rte_always_inline void
> +    virtio_xmit(struct vhost_dev *dst_vdev, struct vhost_dev *src_vdev,
> +            struct rte_mbuf *m)
> +    {
> +        ...
> +        ...
> +        if (enable_stats) {
> +            __atomic_add_fetch(&dst_vdev->stats.rx_total_atomic, 1, __ATOMIC_RELAXED);
> +            __atomic_add_fetch(&dst_vdev->stats.rx_atomic, ret, __ATOMIC_RELAXED);
> +            ...
> +        }
> +    }
I don't see how adding real code helps here.
Why not just mentioning __atomic_add_fetch and __ATOMIC_RELAXED?
> +
> +One-way Barrier
> +^^^^^^^^^^^^^^^
> +
> +Some use cases allow for memory reordering in one way while requiring memory
> +ordering in the other direction.
> +
> +For example, the memory operations before the `lock <https://github.com/DPDK/dpdk/blob/v20.02/lib/librte_eal/common/include/generic/rte_spinlock.h#L66>`_ can move to the
> +critical section, but the memory operations in the critical section cannot move
> +above the lock. In this case, the full memory barrier in the CAS operation can
> +be replaced to ACQUIRE. On the other hand, the memory operations after the
> +`unlock <https://github.com/DPDK/dpdk/blob/v20.02/lib/librte_eal/common/include/generic/rte_spinlock.h#L88>`_ can move to the critical section, but the memory operations in the
> +critical section cannot move below the unlock. So the full barrier in the STORE
> +operation can be replaced with RELEASE.
Again github links instead of our doxygen.
> +
> +Reader-Writer Concurrency
> +^^^^^^^^^^^^^^^^^^^^^^^^^
No blank line here?
> +Lock-free reader-writer concurrency is one of the common use cases in DPDK.
> +
> +The payload or the data that the writer wants to communicate to the reader,
> +can be written with RELAXED memory order. However, the guard variable should
> +be written with RELEASE memory order. This ensures that the store to guard
> +variable is observable only after the store to payload is observable.
> +Refer to `rte_hash insert <https://github.com/DPDK/dpdk/blob/v20.02/lib/librte_hash/rte_cuckoo_hash.c#L737>`_ for an example.
Hum...
> +
> +.. code-block:: c
> +
> +    static inline int32_t
> +    rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
> +        ...
> +        int32_t *ret_val)
> +    {
> +        ...
> +        ...
> +
> +        /* Insert new entry if there is room in the primary
> +         * bucket.
> +         */
> +        for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +                /* Check if slot is available */
> +                if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> +                        prim_bkt->sig_current[i] = sig;
> +                        /* Store to signature and key should not
> +                         * leak after the store to key_idx. i.e.
> +                         * key_idx is the guard variable for signature
> +                         * and key.
> +                         */
> +                        __atomic_store_n(&prim_bkt->key_idx[i],
> +                                         new_idx,
> +                                         __ATOMIC_RELEASE);
> +                        break;
> +                }
> +        }
> +
> +        ...
> +    }
> +
> +Correspondingly, on the reader side, the guard variable should be read
> +with ACQUIRE memory order. The payload or the data the writer communicated,
> +can be read with RELAXED memory order. This ensures that, if the store to
> +guard variable is observable, the store to payload is also observable. Refer to `rte_hash lookup <https://github.com/DPDK/dpdk/blob/v20.02/lib/librte_hash/rte_cuckoo_hash.c#L1215>`_ for an example.
> +
> +.. code-block:: c
> +
> +    static inline int32_t
> +    search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
> +        void **data, const struct rte_hash_bucket *bkt)
> +    {
> +        ...
> +
> +        for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +            ....
> +            if (bkt->sig_current[i] == sig) {
> +                key_idx = __atomic_load_n(&bkt->key_idx[i],
> +                                        __ATOMIC_ACQUIRE);
> +                if (key_idx != EMPTY_SLOT) {
> +                    k = (struct rte_hash_key *) ((char *)keys +
> +                        key_idx * h->key_entry_size);
> +
> +                if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> +                    if (data != NULL) {
> +                        *data = __atomic_load_n(&k->pdata,
> +                                        __ATOMIC_ACQUIRE);
> +                    }
> +
> +                    /*
> +                    * Return index where key is stored,
> +                    * subtracting the first dummy index
> +                    */
> +                    return key_idx - 1;
> +                }
> +            ...
> +    }
> +
NACK for the big chunks of real code.
Please use words and avoid code.
If you insist on keeping code in doc, I will make you responsible
of updating all the code we have already in the doc :)
    
    
More information about the dev
mailing list