[PATCH v2 09/19] rcu: use rte optional stdatomic API
Tyler Retzlaff
roretzla at linux.microsoft.com
Thu Oct 26 18:36:53 CEST 2023
On Thu, Oct 26, 2023 at 04:24:54AM +0000, Ruifeng Wang wrote:
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla at linux.microsoft.com>
> > Sent: Thursday, October 26, 2023 6:38 AM
> > To: Ruifeng Wang <Ruifeng.Wang at arm.com>
> > Cc: dev at dpdk.org; Akhil Goyal <gakhil at marvell.com>; Anatoly Burakov
> > <anatoly.burakov at intel.com>; Andrew Rybchenko <andrew.rybchenko at oktetlabs.ru>; Bruce
> > Richardson <bruce.richardson at intel.com>; Chenbo Xia <chenbo.xia at intel.com>; Ciara Power
> > <ciara.power at intel.com>; David Christensen <drc at linux.vnet.ibm.com>; David Hunt
> > <david.hunt at intel.com>; Dmitry Kozlyuk <dmitry.kozliuk at gmail.com>; Dmitry Malloy
> > <dmitrym at microsoft.com>; Elena Agostini <eagostini at nvidia.com>; Erik Gabriel Carrillo
> > <erik.g.carrillo at intel.com>; Fan Zhang <fanzhang.oss at gmail.com>; Ferruh Yigit
> > <ferruh.yigit at amd.com>; Harman Kalra <hkalra at marvell.com>; Harry van Haaren
> > <harry.van.haaren at intel.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>;
> > jerinj at marvell.com; Konstantin Ananyev <konstantin.v.ananyev at yandex.ru>; Matan Azrad
> > <matan at nvidia.com>; Maxime Coquelin <maxime.coquelin at redhat.com>; Narcisa Ana Maria Vasile
> > <navasile at linux.microsoft.com>; Nicolas Chautru <nicolas.chautru at intel.com>; Olivier Matz
> > <olivier.matz at 6wind.com>; Ori Kam <orika at nvidia.com>; Pallavi Kadam
> > <pallavi.kadam at intel.com>; Pavan Nikhilesh <pbhagavatula at marvell.com>; Reshma Pattan
> > <reshma.pattan at intel.com>; Sameh Gobriel <sameh.gobriel at intel.com>; Shijith Thotton
> > <sthotton at marvell.com>; Sivaprasad Tummala <sivaprasad.tummala at amd.com>; Stephen Hemminger
> > <stephen at networkplumber.org>; Suanming Mou <suanmingm at nvidia.com>; Sunil Kumar Kori
> > <skori at marvell.com>; thomas at monjalon.net; Viacheslav Ovsiienko <viacheslavo at nvidia.com>;
> > Vladimir Medvedkin <vladimir.medvedkin at intel.com>; Yipeng Wang <yipeng1.wang at intel.com>;
> > nd <nd at arm.com>
> > Subject: Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> >
> > On Wed, Oct 25, 2023 at 09:41:22AM +0000, Ruifeng Wang wrote:
> > > > -----Original Message-----
> > > > From: Tyler Retzlaff <roretzla at linux.microsoft.com>
> > > > Sent: Wednesday, October 18, 2023 4:31 AM
> > > > To: dev at dpdk.org
> > > > Cc: Akhil Goyal <gakhil at marvell.com>; Anatoly Burakov
> > > > <anatoly.burakov at intel.com>; Andrew Rybchenko
> > > > <andrew.rybchenko at oktetlabs.ru>; Bruce Richardson
> > > > <bruce.richardson at intel.com>; Chenbo Xia <chenbo.xia at intel.com>;
> > > > Ciara Power <ciara.power at intel.com>; David Christensen
> > > > <drc at linux.vnet.ibm.com>; David Hunt <david.hunt at intel.com>; Dmitry
> > > > Kozlyuk <dmitry.kozliuk at gmail.com>; Dmitry Malloy
> > > > <dmitrym at microsoft.com>; Elena Agostini <eagostini at nvidia.com>; Erik
> > > > Gabriel Carrillo <erik.g.carrillo at intel.com>; Fan Zhang
> > > > <fanzhang.oss at gmail.com>; Ferruh Yigit <ferruh.yigit at amd.com>;
> > > > Harman Kalra <hkalra at marvell.com>; Harry van Haaren
> > > > <harry.van.haaren at intel.com>; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli at arm.com>; jerinj at marvell.com; Konstantin
> > > > Ananyev <konstantin.v.ananyev at yandex.ru>; Matan Azrad
> > > > <matan at nvidia.com>; Maxime Coquelin <maxime.coquelin at redhat.com>;
> > > > Narcisa Ana Maria Vasile <navasile at linux.microsoft.com>; Nicolas
> > > > Chautru <nicolas.chautru at intel.com>; Olivier Matz
> > > > <olivier.matz at 6wind.com>; Ori Kam <orika at nvidia.com>; Pallavi Kadam
> > > > <pallavi.kadam at intel.com>; Pavan Nikhilesh
> > > > <pbhagavatula at marvell.com>; Reshma Pattan <reshma.pattan at intel.com>;
> > > > Sameh Gobriel <sameh.gobriel at intel.com>; Shijith Thotton
> > > > <sthotton at marvell.com>; Sivaprasad Tummala
> > > > <sivaprasad.tummala at amd.com>; Stephen Hemminger
> > > > <stephen at networkplumber.org>; Suanming Mou <suanmingm at nvidia.com>;
> > > > Sunil Kumar Kori <skori at marvell.com>; thomas at monjalon.net;
> > > > Viacheslav Ovsiienko <viacheslavo at nvidia.com>; Vladimir Medvedkin
> > > > <vladimir.medvedkin at intel.com>; Yipeng Wang
> > > > <yipeng1.wang at intel.com>; Tyler Retzlaff
> > > > <roretzla at linux.microsoft.com>
> > > > Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > > >
> > > > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > > corresponding rte_atomic_xxx optional stdatomic API
> > > >
> > > > Signed-off-by: Tyler Retzlaff <roretzla at linux.microsoft.com>
> > > > ---
> > > > lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
> > > > lib/rcu/rte_rcu_qsbr.h | 68
> > > > +++++++++++++++++++++++++-------------------------
> > > > 2 files changed, 58 insertions(+), 58 deletions(-)
> > > >
> > > > diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c index
> > > > 17be93e..4dc7714 100644
> > > > --- a/lib/rcu/rte_rcu_qsbr.c
> > > > +++ b/lib/rcu/rte_rcu_qsbr.c
> > > > @@ -102,21 +102,21 @@
> > > > * go out of sync. Hence, additional checks are required.
> > > > */
> > > > /* Check if the thread is already registered */
> > > > - old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > - __ATOMIC_RELAXED);
> > > > + old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > + rte_memory_order_relaxed);
> > > > if (old_bmap & 1UL << id)
> > > > return 0;
> > > >
> > > > do {
> > > > new_bmap = old_bmap | (1UL << id);
> > > > - success = __atomic_compare_exchange(
> > > > + success = rte_atomic_compare_exchange_strong_explicit(
> > > > __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > - &old_bmap, &new_bmap, 0,
> > > > - __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > > + &old_bmap, new_bmap,
> > > > + rte_memory_order_release, rte_memory_order_relaxed);
> > > >
> > > > if (success)
> > > > - __atomic_fetch_add(&v->num_threads,
> > > > - 1, __ATOMIC_RELAXED);
> > > > + rte_atomic_fetch_add_explicit(&v->num_threads,
> > > > + 1, rte_memory_order_relaxed);
> > > > else if (old_bmap & (1UL << id))
> > > > /* Someone else registered this thread.
> > > > * Counter should not be incremented.
> > > > @@ -154,8 +154,8 @@
> > > > * go out of sync. Hence, additional checks are required.
> > > > */
> > > > /* Check if the thread is already unregistered */
> > > > - old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > - __ATOMIC_RELAXED);
> > > > + old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > + rte_memory_order_relaxed);
> > > > if (!(old_bmap & (1UL << id)))
> > > > return 0;
> > > >
> > > > @@ -165,14 +165,14 @@
> > > > * completed before removal of the thread from the list of
> > > > * reporting threads.
> > > > */
> > > > - success = __atomic_compare_exchange(
> > > > + success = rte_atomic_compare_exchange_strong_explicit(
> > > > __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > - &old_bmap, &new_bmap, 0,
> > > > - __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > > + &old_bmap, new_bmap,
> > > > + rte_memory_order_release, rte_memory_order_relaxed);
> > > >
> > > > if (success)
> > > > - __atomic_fetch_sub(&v->num_threads,
> > > > - 1, __ATOMIC_RELAXED);
> > > > + rte_atomic_fetch_sub_explicit(&v->num_threads,
> > > > + 1, rte_memory_order_relaxed);
> > > > else if (!(old_bmap & (1UL << id)))
> > > > /* Someone else unregistered this thread.
> > > > * Counter should not be incremented.
> > > > @@ -227,8 +227,8 @@
> > > >
> > > > fprintf(f, " Registered thread IDs = ");
> > > > for (i = 0; i < v->num_elems; i++) {
> > > > - bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > - __ATOMIC_ACQUIRE);
> > > > + bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > + rte_memory_order_acquire);
> > > > id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > > > while (bmap) {
> > > > t = __builtin_ctzl(bmap);
> > > > @@ -241,26 +241,26 @@
> > > > fprintf(f, "\n");
> > > >
> > > > fprintf(f, " Token = %" PRIu64 "\n",
> > > > - __atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
> > > > + rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
> > > >
> > > > fprintf(f, " Least Acknowledged Token = %" PRIu64 "\n",
> > > > - __atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
> > > > + rte_atomic_load_explicit(&v->acked_token,
> > > > +rte_memory_order_acquire));
> > > >
> > > > fprintf(f, "Quiescent State Counts for readers:\n");
> > > > for (i = 0; i < v->num_elems; i++) {
> > > > - bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > - __ATOMIC_ACQUIRE);
> > > > + bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > + rte_memory_order_acquire);
> > > > id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > > > while (bmap) {
> > > > t = __builtin_ctzl(bmap);
> > > > fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
> > > > id + t,
> > > > - __atomic_load_n(
> > > > + rte_atomic_load_explicit(
> > > > &v->qsbr_cnt[id + t].cnt,
> > > > - __ATOMIC_RELAXED),
> > > > - __atomic_load_n(
> > > > + rte_memory_order_relaxed),
> > > > + rte_atomic_load_explicit(
> > > > &v->qsbr_cnt[id + t].lock_cnt,
> > > > - __ATOMIC_RELAXED));
> > > > + rte_memory_order_relaxed));
> > > > bmap &= ~(1UL << t);
> > > > }
> > > > }
> > > > diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h index
> > > > 87e1b55..9f4aed2 100644
> > > > --- a/lib/rcu/rte_rcu_qsbr.h
> > > > +++ b/lib/rcu/rte_rcu_qsbr.h
> > > > @@ -63,11 +63,11 @@
> > > > * Given thread id needs to be converted to index into the array and
> > > > * the id within the array element.
> > > > */
> > > > -#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> > > > +#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE
> > > > +(sizeof(RTE_ATOMIC(uint64_t)) *
> > > > +8)
> > > > #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
> > > > RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
> > > > __RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
> > > > -#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> > > > +#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *)
> > > > +\
> > >
> > > Is it equivalent to ((RTE_ATOMIC(uint64_t) *)?
> >
> > i'm not sure if you're asking about the resultant type of the expression or not?
>
> I see other places are using specifier hence the question.
>
> >
> > in this context we aren't specifying an atomic type but rather adding the atomic qualifier
> > to what should already be a variable that has an atomic specified type with a cast which
> > is why we use __rte_atomic.
>
> I read from document [1] that atomic qualified type may have a different size from the original type.
> If that is the case, the size difference could cause issue in bitmap array accessing.
> Did I misunderstand?
>
> [1] https://en.cppreference.com/w/c/language/atomic
>
you do not misunderstand, the standard allows atomic specified type
sizes to differ from their ordinary native type sizes. though i have
some issues with how cppreference is wording things here as compared
with the actual standard.
one of the reasons is it allows all standard atomic functions to be
'generic' which means they can be used on objects of arbitrary size
instead of just integer and pointer types. i.e. you can use it on
struct, union and array types.
it's implementation defined how the operations are made atomic and
is obviously target processor dependent, but in cases when the processor
has no intrinsic support to perform the operation atomically the toolchain
may generate the code that uses a hidden lock. you can test whether this
is the case for arbitrary objects using standard specified atomic_is_lock_free.
https://en.cppreference.com/w/c/atomic/atomic_is_lock_free
so that's the long answer form of why they *may* be different size,
alignment etc.. but the real question is in this instance will it be?
probably not.
mainly because it wouldn't make a lot of sense for clang/gcc to suddenly
decide that sizeof(uint64_t) != sizeof(_Atomic(uint64_t)) or that they
should need to use a lock on amd64 processor to load/store atomically
(assuming native alignment) etc..
a lot of the above is why we had a lot of discussion about how and when
we could enable the use of standard C11 atomics in dpdk. as you've
probably noticed for existing platforms, toolchains and targets it is
actually defaulted off, but it does allow binary packagers or users to
build with it on.
for compatibility only the strictest of guarantees can be made when dpdk
and the application are both built consistently to use or not use
standard atomics. it is strongly cautioned that applications should not
attempt to use an unmatched atomic operation on a dpdk atomic object.
i.e. if you enabled standard atomics, don't use __atomic_load_n directly
on a field from a public dpdk structure, instead use
rte_atomic_load_explicit and make sure your application defines
RTE_ENABLE_STDATOMIC.
hope this explanation helps.
> >
> > >
> > > > ((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
> > > > #define __RTE_QSBR_THRID_INDEX_SHIFT 6 #define
> > > > __RTE_QSBR_THRID_MASK 0x3f @@ -75,13 +75,13 @@
> > > >
> > >
> > > <snip>
More information about the dev
mailing list