Service core statistics MT safety
Mattias Rönnblom
mattias.ronnblom at ericsson.com
Tue Jun 28 08:52:46 CEST 2022
On 2022-06-28 02:14, Honnappa Nagarahalli wrote:
> <snip>
>>>>>>
>>>>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom at ericsson.com]
>>>>>>> Sent: Monday, 27 June 2022 13.06
>>>>>>>
>>>>>>> Hi.
>>>>>>>
>>>>>>> Is it safe to enable stats on MT safe services?
>>>>>>>
>>>>>>> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-
>> 4
>>>>>>> 54445555731-6096fdb16385f88f&q=1&e=27b94605-d1e2-40b6-af6d-
>> 9ebc54d
>>>>>>>
>> 5db18&u=https%3A%2F%2Fgithub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%
>> 2Flib
>>>>>>> %2Feal%2Fcommon%2Frte_service.c%23
>>>>>>> L3
>>>>>>> 6
>>>>>>> 6
>>>>>>>
>>>>>>> It seems to me this would have to be an __atomic_add for this code
>>>>>>> to produce deterministic results.
>>>>>>
>>>>>> I agree. The same goes for the 'calls' field.
>>>>> The calling function does the locking.
>>>>> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-
>> 454
>>>>> 445555731-5ce419f8bf9f9b76&q=1&e=27b94605-d1e2-40b6-af6d-
>> 9ebc54d5db1
>>>>>
>> 8&u=https%3A%2F%2Fgithub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%2Flib
>> %2Feal
>>>>> %2Fcommon%2Frte_service.c%23L3
>>>>> 98
>>>>>
>>>>> For more information you can look at:
>>>>> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-
>> 454
>>>>> 445555731-ba0d1416f08856f0&q=1&e=27b94605-d1e2-40b6-af6d-
>> 9ebc54d5db1
>>>>>
>> 8&u=https%3A%2F%2Fgithub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%2Flib
>> %2Feal
>>>>> %2Finclude%2Frte_service.h%23L
>>>>> 120
>>>>>
>>>>
>>>> What about the
>>>> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-
>> 4544
>>>> 45555731-b64334addc78c264&q=1&e=27b94605-d1e2-40b6-af6d-
>> 9ebc54d5db18&
>>>>
>> u=https%3A%2F%2Fgithub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%2Flib%2
>> Feal%2F
>>>> common%2Frte_service.c%23L404
>>>> call (for MT safe services)?
>>>>
>>>> There's no lock held there.
>>> Good point.
>>> This is the case where the service running in service cores is MT safe. However,
>> the stats are incremented outside of the MT Safety mechanism employed by the
>> service. So, yes, this and other updates in the function
>> 'service_runner_do_callback' need to be updated atomically.
>>
>> Maybe a better solution would be to move this to the core_state struct (and
>> eliminate the "calls" field since the same information is already in the core_state
>> struct). The contention on these cache lines will be pretty crazy for services with
>> short run time (say a thousand cycles or less per call), assuming they are
>> mapped to many cores.
> That's one option, the structures are internal as well. With this option stats need to be aggregated which will not give an accurate view. But, that is the nature of the statistics.
>
Per-core counters is a very common pattern. Used for Linux MIB counters,
for example. I'm not sure I think it's much less accurate. I mean, you
just load in quick succession what's globally visible for the different
per-lcore counters. If you do a relaxed store on an ARM, how long time
does it take until it's seen by someone doing a relaxed load on a
different core? Roughly.
> I am also wondering if these stats are of any use other than for debugging. Adding a capability to disable stats might help as well.
>
They could be used as a crude tool to determine service core
utilization. Comparing utilization between different services running on
the same core should be straight-forward, but lcore utilization is
harder in absolute terms. If you just look at "cycles", a completely
idle core would look like it's very busy (basically rdtsc latency added
for every loop). I assume you'd have to do some heuristic based on both
"calls" and "cycles" to get an estimate.
I think service core utilization would be very useful, although that
would require some changes in the service function signature, so the
service can report back if it did some useful work for a particular call.
That would make for a DPDK 'top'. Just like 'top', it can't impose any
serious performance degradation when used, to be really useful, I think.
Sure, it should be possible to turn it on and off. I thought that was
the case already?
>>
>> Idle service cores will basically do nothing else than stall waiting for these lines, I
>> suspect, hampering the progress of more busy cores.
More information about the dev
mailing list