[PATCH] net/af_packet: fix statistics
Stephen Hemminger
stephen at networkplumber.org
Thu May 2 18:16:58 CEST 2024
On Thu, 2 May 2024 15:12:43 +0100
Ferruh Yigit <ferruh.yigit at amd.com> wrote:
> I am not referring multiple core sharing a queue, this is wrong for DPDK.
>
> For single core case, if a variable doesn't have 'volatile' qualifier
> and load/stores are not atomic, as compiler is not aware that there may
> be other threads will access this value, what prevents compiler to hold
> stats value in a register and store it memory at later stage, let say at
> the end of the application forwarding loop?
No other driver does this; it is not a problem since once rx/tx burst
function returns, there is no state the compiler is aware of.
> For this case keeping 'volatile' qualifier works. But Mattias has a
> suggestion to use "normal load + normal add + atomic store" in datapath.
>
>
> For getting stats, which will be in different thread, you are already
> casting it to 'volatile' to enforce compiler read from memory each time.
>
>
> Additionally when we take stats reset into account, which can happen in
> different thread, 'volatile' is not enough.
> For reliable stats reset, either we need to use full atomics, or offset
> logic which I am trying to in other patch.
If you want to attack the problem in the general case, there should be
standard set of functions for handling per-core stats. If you look at
Linux kernel you will see macros around this function:
netdev_core_stats_inc((struct net_device *dev, u32 offset) {
struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
unsigned long __percpu *field;
...
field = (__force unsigned long __percpu *)((__force void *)p + offset);
this_cpu_inc(*field);
}
The expansion of this_cpu_inc is nested but ends up being:
*field += 1;
No volatile is needed.
The conclusion I reach from this is:
- atomic is not needed for per-cpu data. See "volatile considered harmful"
using a form of compiler barrier is useful but should be hidden in generic code.
- what is the state of the better per-lcore patches?
that should be used here.
- need a more generic percpu stats infrastructure
- ethedev needs to have a generic percpu infra and not reinvented in xdp, tap, packet, etc.
More information about the dev
mailing list