[dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation

Jerin Jacob Kollanukkaran jerinj at marvell.com
Mon Jan 28 15:04:25 CET 2019

Previous message: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation
Next message: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 2019-01-28 at 13:43 +0000, Ola Liljedahl wrote:
> On Mon, 2019-01-28 at 13:34 +0000, Jerin Jacob Kollanukkaran wrote:
> > On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> > > > -----Original Message-----
> > > > From: Ola Liljedahl [mailto:Ola.Liljedahl at arm.com]
> > > > Sent: Wednesday, January 23, 2019 4:16 AM
> > > > To: Eads, Gage <gage.eads at intel.com>; dev at dpdk.org
> > > > Cc: olivier.matz at 6wind.com; stephen at networkplumber.org; nd
> > > > <nd at arm.com>; Richardson, Bruce <bruce.richardson at intel.com>;
> > > > arybchenko at solarflare.com; Ananyev, Konstantin
> > > > <konstantin.ananyev at intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > > > implementation
> > > > 
> > > > s.
> > > > > > 
> > > > > You can tell this code was written when I thought x86-64 was
> > > > > the
> > > > > only
> > > > > viable target :). Yes, you are correct.
> > > > > 
> > > > > With regards to using __atomic intrinsics, I'm planning on
> > > > > taking
> > > > > a
> > > > > similar approach to the functions duplicated in
> > > > > rte_ring_generic.h and
> > > > > rte_ring_c11_mem.h: one version that uses rte_atomic
> > > > > functions
> > > > > (and
> > > > > thus stricter memory ordering) and one that uses __atomic
> > > > > intrinsics
> > > > > (and thus can benefit from more relaxed memory ordering).
> > > > What's the advantage of having two different implementations?
> > > > What
> > > > is the
> > > > disadvantage?
> > > > 
> > > > The existing ring buffer code originally had only the "legacy"
> > > > implementation
> > > > which was kept when the __atomic implementation was added. The
> > > > reason
> > > > claimed was that some older compilers for x86 do not support
> > > > GCC
> > > > __atomic
> > > > builtins. But I thought there was consensus that new
> > > > functionality
> > > > could have
> > > > only __atomic implementations.
> > > > 
> > > When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was
> > > left
> > > disabled for thunderx[1] for performance reasons. Assuming that
> > > hasn't changed, the advantage to having two versions is to best
> > > support all of DPDK's platforms. The disadvantage is of course
> > > duplicated code and the additional maintenance burden.
> > > 
> > > That said, if the thunderx maintainers are ok with it, I'm
> > > certainly 
> > The ring code was so fundamental building block for DPDK, there
> > was 
> > difference in performance and there was already legacy code so
> > introducing C11_MEM_MODEL was justified IMO. 
> > 
> > For the nonblocking implementation, I am happy to test with
> > three ARM64 microarchitectures and share the result with
> > C11_MEM_MODEL
> > vs non C11_MEM_MODLE performance.
> We should ensure the C11 memory model version enforces minimal
> ordering
> requirements:

I agree.

I think, We should have enough test case for performance measurement in
order to choose algorithms and quantify the other variables like C11 vs
non C11, LDXP/STXP vs CASP etc.


> 1) when computing number of available slots, allow for underflow
> (head and tail
> observed in unexpected order) instead of imposing read order with an
> additional
> read barrier.
> 2) We could cheat a little and use an explicit LoadStore barrier
> instead of
>  store-release/cas-release in dequeue (which only reads the ring). At
> least see
> if this improves performance. See such a patch here:
> https://github.com/ARM-software/progress64/commit/84c48e9c84100eb5b2d15e54f0dbf7
> 8dfa468805
> 
> Ideally, C/C++ would have an __ATOMIC_RELEASE_READSONLY memory model
> to use in
> situations where the shared data was only read before being released.
> 
> >  We may need to consider PPC also
> > here. So IMO, based on the overall performance result may be can
> > decide
> > the new code direction.
> Does PPC (64-bit POWER?) have support for double-word (128-bit) CAS?

I dont know, I was telling wrt in general C11 mem model for PPC.

> 
> -- 
> Ola Liljedahl, Networking System Architect, Arm
> Phone +46706866373, Skype ola.liljedahl
>

Previous message: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation
Next message: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the dev mailing list