Service core statistics MT safety
Van Haaren, Harry
harry.van.haaren at intel.com
Thu Jun 30 10:46:39 CEST 2022
Hey All,
(I've been away on Holidays a few days, just catching up!)
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Sent: Wednesday, June 29, 2022 9:07 PM
> To: Mattias Rönnblom <hofors at lysator.liu.se>; mattias.ronnblom
> <mattias.ronnblom at ericsson.com>; Morten Brørup
> <mb at smartsharesystems.com>; dev at dpdk.org
> Cc: Van Haaren, Harry <harry.van.haaren at intel.com>; nd <nd at arm.com>; nd
> <nd at arm.com>
> Subject: RE: Service core statistics MT safety
>
> <snip>
<big snip of previous discussions>
> > At the time of the read operation (in the global counter solution), there may well
> > be cycles consumed or calls having been made, but not yet posted. The window
> > between call having been made, and global counter having been incremented
> > (and thus made globally visible) is small, but non-zero.
> Agree. The read value is the atomic state of the system at a given instance (when the
> read was executed), though that instance happened few cycles back.
> (Just to be clear, I am fine with per-core counters)
Option 1: "Per core counters"
> Agree we need atomic operations. I am not sure if __atomic_fetch_add or
> __atomic_store_n would have a large difference. __atomic_fetch_add would result
> in less number of instructions. I am fine with either.
Option 2: "Use atomics for counter increments".
> > >> I was fortunate to get some data from a real-world application, and
> > >> enabling service core stats resulted in a 7% degradation of overall
> > >> system capacity. I'm guessing atomic instructions would not make things
> > better.
Agree, performance of atomics is likely to reduce performance.. but correctness
is worth more than performance.
<snip>
In my mind, any LTS/backports get the simplest/highest-confidence bugfix: using atomics.
The atomics are behind the "service stats" feature enable, so impact is only when those are
enabled.
If there is still a performance hit, and there are *no* MT services registered, we could check
a static-global flag, and if there are no MT services use the normal adds. Thoughts on such a
solution to reduce atomic perf impact only to apps with MT-services?
The code changes themselves are OK.. I can send a patch with fix if there's agreement on the approach?
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index ef31b1f63c..a07c8fc2d7 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -363,9 +363,9 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
uint64_t start = rte_rdtsc();
s->spec.callback(userdata);
uint64_t end = rte_rdtsc();
- s->cycles_spent += end - start;
+ __atomic_fetch_add(&s->cycles_spent, (end-start), __ATOMIC_RELAXED);
+ __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
cs->calls_per_service[service_idx]++;
- s->calls++;
} else
s->spec.callback(userdata);
}
More information about the dev
mailing list