[RFC v3] net/af_packet: make stats reset reliable
Mattias Rönnblom
hofors at lysator.liu.se
Sun May 26 09:21:55 CEST 2024
On 2024-05-08 17:23, Stephen Hemminger wrote:
> On Wed, 8 May 2024 09:19:02 +0200
> Mattias Rönnblom <hofors at lysator.liu.se> wrote:
>
>> On 2024-05-04 00:00, Stephen Hemminger wrote:
>>> On Fri, 3 May 2024 16:45:47 +0100
>>> Ferruh Yigit <ferruh.yigit at amd.com> wrote:
>>>
>>>> For stats reset, use an offset instead of zeroing out actual stats values,
>>>> get_stats() displays diff between stats and offset.
>>>> This way stats only updated in datapath and offset only updated in stats
>>>> reset function. This makes stats reset function more reliable.
>>>>
>>>> As stats only written by single thread, we can remove 'volatile' qualifier
>>>> which should improve the performance in datapath.
>>>>
>>>> While updating around, 'igb_stats' parameter renamed as 'stats'.
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit at amd.com>
>>>> ---
>>>> Cc: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
>>>> Cc: Stephen Hemminger <stephen at networkplumber.org>
>>>> Cc: Morten Brørup <mb at smartsharesystems.com>
>>>>
>>>> This update triggered by mail list discussion [1].
>>>>
>>>> [1]
>>>> https://inbox.dpdk.org/dev/3b2cf48e-2293-4226-b6cd-5f4dd3969f99@lysator.liu.se/
>>>
>>>
>>> NAK
>>>
>>> I did not hear a good argument why atomic or volatile was necessary in the first place.
>>> Why?
>>>
>>
>> On the reader side, loads should be atomic.
>> On the writer side, stores should be atomic.
>>
>> Updates (stores) should actually occur in a timely manner. The complete
>> read-modify-write cycle need not be atomic, since we only have a single
>> writer. All this for the per-lcore counter case.
>>
>> If load or store tearing occurs, the counter values may occasionally
>> take totally bogus values. I think that should be avoided. Especially
>> since it will likely come at a very reasonable cost.
>>
>> From what it seems to me, load or store tearing may well occur. GCC may
>> generate two 32-bit stores for a program-level 64-bit store on 32-bit
>> x86. If you have constant and immediate-data store instructions,
>> constant writes may also be end up teared. The kernel documentation has
>> some example of this. Add LTO, it's not necessarily going to be all that
>> clear what is storing-a-constant and what is not.
>>
>> Maybe you care a little less if statistics are occasionally broken, or
>> some transient, inconsistent state, but generally they should work, and
>> they should never have some totally bogus values. So, statistics aren't
>> snow flakes, mostly just business as usual.
>>
>> We can't both have a culture that promotes C11-style parallel
>> programming, or, at the extreme, push the C11 APIs as-is, and the say
>> "and btw you don't have to care about the standard when it comes to
>> statistics".
>>
>> We could adopt the Linux kernel's rules, programming model, and APIs
>> (ignoring legal issues). That would be very old school, maybe somewhat
>> over-engineered for our purpose, include a fair amount of inline
>> assembler, and also and may well depend on GCC or GCC-like compilers,
>> just like what I believe the kernel does.
>>
>> We could use something in-between, heavily inspired by C11 but still
>> with an opportunity to work around compiler issues, library issues, and
>> extend the API for our use case.
>>
>> I agree we shouldn't have to mark statistics _Atomic, or RTE_ATOMIC(),
>> rte_atomic64_t, or rte_sometimes_atomic_and_sometimes_not64_t. Just
>> keeping the usual C integer types seems like a better option to me.
>>
>>> Why is this driver special (a snowflake) compared to all the other drivers doing software
>>> statistics (tap, virtio, xdp, ring, memif, netvsc, vmware)?
>>
>> If a broken piece of code has been copied around, one place is going to
>> be the first to be fixed.
>
>
> I dislike when any driver does something completely different than valid precedent.
> No other driver in DPDK, Vpp, FreeBSD, Linux (and probably Windows) uses atomic for
> updating statistics. We even got performance benefit at MS from removing atomic
> increment of staistics in internal layers.
>
All of those are using atomic stores when updating the statistics, I'm
sure. Assuring a store being atomic is one thing, and assuring the whole
read-modify-write cycle is atomic is a completely different (and very
much more expensive) thing.
> The idea of load tearing is crazy talk of integral types. It would break so many things.
> It is the kind of stupid compiler thing that would send Linus on a rant and get
> the GCC compiler writers in trouble.
>
On 32-bit x86, store tearing for 64-bit integral types is the order of
the day.
For <64 bit types, I agree. The only cases (like the one listed in the
kernel documentation), are going to be rare indeed.
> The DPDK has always favored performance over strict safety guard rails everywhere.
> Switching to making every statistic an atomic operation is not in the spirit of
> what is required. There is no strict guarantee necessary here.
>
I think we agree but just use different terminology.
More information about the dev
mailing list