[RFC v3] net/af_packet: make stats reset reliable

Mattias Rönnblom hofors at lysator.liu.se
Sun May 26 09:03:47 CEST 2024


On 2024-05-09 13:37, Morten Brørup wrote:
>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
>> Sent: Thursday, 9 May 2024 11.30
>>
>> On Thu, May 09, 2024 at 09:43:16AM +0200, Morten Brørup wrote:
>>>> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
>>>> Sent: Wednesday, 8 May 2024 22.54
>>>>
>>>> On Wed, 8 May 2024 20:48:06 +0100
>>>> Ferruh Yigit <ferruh.yigit at amd.com> wrote:
>>>>
>>>>>>
>>>>>> The idea of load tearing is crazy talk of integral types. It
>> would
>>>> break so many things.
>>>>>> It is the kind of stupid compiler thing that would send Linus on
>> a
>>>> rant and get
>>>>>> the GCC compiler writers in trouble.
>>>>>>
>>>>>> The DPDK has always favored performance over strict safety guard
>>>> rails everywhere.
>>>>>> Switching to making every statistic an atomic operation is not
>> in
>>>> the spirit of
>>>>>> what is required. There is no strict guarantee necessary here.
>>>>>>
>>>>>
>>>>> I kind of agree with Stephen.
>>>>>
>>>>> Thanks Mattias, Morten & Stephen, it was informative discussion.
>> But
>>>> for
>>>>> *SW drivers* stats update and reset is not core functionality and
>> I
>>>>> think we can be OK to get hit on corner cases, instead of
>>>>> over-engineering or making code more complex.
>>>>
>>>>
>>>> I forgot the case of 64 bit values on 32 bit platforms!
>>>> Mostly because haven't cared about 32 bit for years...
>>>>
>>>> The Linux kernel uses some wrappers to handle this.
>>>> On 64 bit platforms they become noop.
>>>> On 32 bit platform, they are protected by a seqlock and updates are
>>>> wrapped by the sequence count.
>>>>
>>>> If we go this way, then doing similar Noop on 64 bit and atomic or
>>>> seqlock
>>>> on 32 bit should be done, but in common helper.
>>>>
>>>> Looking inside FreeBSD, it looks like that has changed over the
>> years as
>>>> well.
>>>>
>>>> 	if_inc_counter
>>>> 		counter_u64_add
>>>> 			atomic_add_64
>>>> But the counters are always per-cpu in this case. So although it
>> does
>>>> use
>>>> locked operation, will always be uncontended.
>>>>
>>>>
>>>> PS: Does DPDK still actually support 32 bit on x86? Can it be
>> dropped
>>>> this cycle?
>>>
>>> We cannot drop 32 bit architecture support altogether.
>>>
>>> But, unlike the Linux kernel, DPDK doesn't need to support ancient 32
>> bit architectures.
>>> If the few 32 bit architectures supported by DPDK provide non-tearing
>> 64 bit loads/stores, we don't need locks (in the fast path) for 64 bit
>> counters.
>>>
>>> In addition to 32 bit x86, DPDK supports ARMv7-A (a 32 bit
>> architecture) and 32 bit ARMv8.
>>> I don't think DPDK support any other 32 bit architectures.
>>>
>>>
>>> As Mattias mentioned, 32 bit x86 can use xmm registers to provide 64
>> bit non-tearing load/store.
>>>
>>
>> Testing this a little in godbolt, I see gcc using xmm registers on 32-
>> bit
>> when updating 64-bit counters, but clang doesn't seem to do so, but
>> instead
>> does 2 stores when writing back the 64 value. (I tried with both
>> volatile
>> and non-volatile 64-bit values, just to see if volatile would encourage
>> clang to do a single store).
>>
>> GCC: https://godbolt.org/z/9eqKfT3hz
>> Clang: https://godbolt.org/z/PT5EqKn4c
> 
> Interesting.
> I guess this can be fixed by manually implementing what GCC does.
> 

If you want an atomic store, use __atomic_store_n() or the equivalent.

On GCC, the resulting code will be compact and use an XMM register for 
the store.

On clang, you get many more instructions, including a conditional jump to 
a compiler library function. You also get a compiler warning that a 
misaligned atomic operation may incur a significant cost (no surprise). All 
this on 32-bit x86. Interestingly, you get all those instructions even when 
you explicitly tell the compiler the address should be assumed to be 
aligned. Seems like a bug, or at least a shortcoming.

So GCC assumes atomic stores are aligned, which seems perfectly 
reasonable to me.
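
To make the above concrete, here is a minimal sketch of a single-writer 
64-bit counter built on __atomic_store_n() and __atomic_load_n(). The 
struct and function names (pkt_stats, stats_add, stats_read) are 
hypothetical and not taken from the af_packet driver. The explicit 8-byte 
alignment is my assumption about what is needed on 32-bit x86, where a 
uint64_t struct member is otherwise only 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter struct; the names are illustrative only. */
struct pkt_stats {
	/* Force natural alignment so a single 64-bit store is possible. */
	uint64_t rx_pkts __attribute__((aligned(8)));
};

/* Single writer assumed: the read-modify-write as a whole is not atomic,
 * but the load and the store are each non-tearing. */
static inline void
stats_add(struct pkt_stats *s, uint64_t n)
{
	uint64_t v;

	/* Check the alignment assumption stated above. */
	assert(((uintptr_t)&s->rx_pkts & 7) == 0);

	v = __atomic_load_n(&s->rx_pkts, __ATOMIC_RELAXED);
	__atomic_store_n(&s->rx_pkts, v + n, __ATOMIC_RELAXED);
}

/* May be called from another thread; returns a non-torn value. */
static inline uint64_t
stats_read(struct pkt_stats *s)
{
	return __atomic_load_n(&s->rx_pkts, __ATOMIC_RELAXED);
}
```

Relaxed ordering is used since a statistics counter presumably does not 
need to order other memory accesses. Whether this compiles down to a 
single store on a given 32-bit target depends on the compiler, as 
described above.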

> I'm more concerned about finding a high-performance (in the fast path) 64 bit counter solution for 32 bit ARM.
> 

