[dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load

Ola Liljedahl Ola.Liljedahl at arm.com
Mon Oct 8 12:01:30 CEST 2018

Or maybe performance gets worse, not because of that one additional instruction/cycle in ring buffer enqueue and dequeue, but because the function or loop alignment changed for one or more functions.

When the benchmarking noise (possibly several % due to changes in code alignment) is bigger than the effect you are trying to measure (1 cycle per ring buffer enqueue/dequeue), benchmarking is not the right approach.
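
To make the comparison concrete, here is a minimal sketch in C of the signal-versus-noise check described above. The function name and the figures used in the note below (100 cycles per operation, 3% noise) are assumptions for illustration only, not measurements from this thread.

```c
/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the expected change (delta_cycles) is larger than the
 * run-to-run noise band, 0 otherwise.
 *
 * base_cycles:  assumed cost of one ring enqueue/dequeue, in cycles
 * delta_cycles: expected change being measured, in cycles
 * noise_pct:    assumed benchmarking noise, in percent of base_cycles
 */
int effect_is_measurable(double base_cycles, double delta_cycles,
                         double noise_pct)
{
    /* Convert the relative noise into an absolute number of cycles. */
    double noise_cycles = base_cycles * noise_pct / 100.0;

    /* The effect is only visible if it exceeds the noise band. */
    return delta_cycles > noise_cycles;
}
```

With an assumed 100 cycles per operation and 3% noise, the noise band is 3 cycles, so a 1-cycle change falls inside it and the function returns 0.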

-- Ola

On 08/10/2018, 07:27, "Honnappa Nagarahalli" <Honnappa.Nagarahalli at arm.com> wrote:

    >     >
    >     > I doubt it is possible to benchmark with such a precision so to see the
    >     > potential difference of one ADD instruction.
    >     > Just changes in function alignment can affect performance by percents. And
    >     > the natural variation when not using a 100% deterministic system is going to
    >     > be a lot larger than one cycle per ring buffer operation.
    >     >
    >     > Some of the other patches are also for correctness (e.g. load-acquire of tail)
    >     The discussion is about this patch alone. Other patches are already Acked.
    > So the benchmarking then makes zero sense.
    The whole point is to prove the effect of 1 instruction either way. IMO, it is simple enough: follow the memory model to the full extent. We have to keep other architectures in mind as well. Maybe that additional instruction is not required on other architectures.
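
For readers unfamiliar with the "load-acquire of tail" mentioned in the quoted text, the following is a minimal sketch using standard C11 atomics. It is a simplified single-producer/single-consumer ring, not DPDK's rte_ring; the struct layout, `RING_SIZE`, and the function names are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical capacity, must be a power of two */

/* Simplified SPSC ring; names are illustrative, not DPDK's. */
struct ring {
    _Atomic uint32_t head; /* next slot to write, updated by the producer */
    _Atomic uint32_t tail; /* next slot to read, updated by the consumer */
    void *slots[RING_SIZE];
};

/* Producer side. Returns 0 on success, -1 if the ring is full. */
int ring_enqueue(struct ring *r, void *obj)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /*
     * Load-acquire of the consumer's tail. It pairs with the
     * store-release in ring_dequeue(), so the consumer's read of a slot
     * is complete before the producer reuses that slot.
     */
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return -1; /* full */

    r->slots[head & (RING_SIZE - 1)] = obj;
    /* Store-release publishes the slot contents to the consumer. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer side. Returns 0 on success, -1 if the ring is empty. */
int ring_dequeue(struct ring *r, void **obj)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Load-acquire of the producer's head, pairing with its store-release. */
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return -1; /* empty */

    *obj = r->slots[tail & (RING_SIZE - 1)];
    /* Store-release hands the slot back to the producer. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}
```

Whether the acquire load costs an extra instruction compared with a plain load depends on the target architecture and compiler, which is the point under discussion in this thread.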
