[PATCH] net/af_packet: fix statistics
Mattias Rönnblom
hofors at lysator.liu.se
Thu May 2 19:57:55 CEST 2024
On 2024-05-02 18:16, Stephen Hemminger wrote:
> On Thu, 2 May 2024 15:12:43 +0100
> Ferruh Yigit <ferruh.yigit at amd.com> wrote:
>
>> I am not referring multiple core sharing a queue, this is wrong for DPDK.
>>
>> For single core case, if a variable doesn't have 'volatile' qualifier
>> and load/stores are not atomic, as compiler is not aware that there may
>> be other threads will access this value, what prevents compiler to hold
>> stats value in a register and store it memory at later stage, let say at
>> the end of the application forwarding loop?
>
> No other driver does this; it is not a problem since once rx/tx burst
> function returns, there is no state the compiler is aware of.
>
>
>> For this case keeping 'volatile' qualifier works. But Mattias has a
>> suggestion to use "normal load + normal add + atomic store" in datapath.
>>
>>
>> For getting stats, which will be in different thread, you are already
>> casting it to 'volatile' to enforce compiler read from memory each time.
>>
>>
>> Additionally when we take stats reset into account, which can happen in
>> different thread, 'volatile' is not enough.
>> For reliable stats reset, either we need to use full atomics, or offset
>> logic which I am trying to in other patch.
>
> If you want to attack the problem in the general case, there should be
> standard set of functions for handling per-core stats. If you look at
> Linux kernel you will see macros around this function:
>
> netdev_core_stats_inc((struct net_device *dev, u32 offset) {
> struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> unsigned long __percpu *field;
> ...
> field = (__force unsigned long __percpu *)((__force void *)p + offset);
> this_cpu_inc(*field);
> }
>
> The expansion of this_cpu_inc is nested but ends up being:
> *field += 1;
>
> No volatile is needed.
>
> The conclusion I reach from this is:
> - atomic is not needed for per-cpu data. See "volatile considered harmful"
> using a form of compiler barrier is useful but should be hidden in generic code.
> - what is the state of the better per-lcore patches?
> that should be used here.
> - need a more generic percpu stats infrastructure
> - ethedev needs to have a generic percpu infra and not reinvented in xdp, tap, packet, etc.
>
>
In this case, the counters are already effectively "per-lcore", since a
single RX/TX queue pair is not MT safe, and thus is only used by one
thread (or at least, by one thread at a time).
My impression is that both the Linux kernel and DPDK tend to assume that
no load or store tearing occurs for accesses to small, naturally
aligned, integer types. I'm guessing it's also commonly assumed there
are limits to how large section of the whole program the compiler may
reason about, to eliminate data and code/instructions that have no
effect on that thread. Thus, volatile are not needed to assure that for
example counter state is actually maintained, and updates actually occurs.
I will continue work on the lcore variables patch set as soon as time
permits. That will be after my attempts to complete the bitops and the
bitset patch sets have concluded.
More information about the dev
mailing list