[dpdk-dev] [PATCH v3 09/12] app/eventdev: add pipeline queue worker functions

Van Haaren, Harry harry.van.haaren at intel.com
Thu Jan 11 16:47:24 CET 2018


> From: Pavan Nikhilesh [mailto:pbhagavatula at caviumnetworks.com]
> Sent: Thursday, January 11, 2018 1:52 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>;
> jerin.jacob at caviumnetworks.com; santosh.shukla at caviumnetworks.com; Eads,
> Gage <gage.eads at intel.com>; hemant.agrawal at nxp.com; nipun.gupta at nxp.com; Ma,
> Liang J <liang.j.ma at intel.com>
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 09/12] app/eventdev: add pipeline queue
> worker functions
> 
> On Thu, Jan 11, 2018 at 12:17:38PM +0000, Van Haaren, Harry wrote:
>  > >
>  <snip>
> > > > Thinking a little more about this, also in light of patch 11/12 of this series.
> > > >
> > > > The code here has a "safe" and "unsafe" version of TX. This involves adding a spinlock inside the code, which is being locked/unlocked before doing the actual TX action.
> > > >
> > > > I don't understand why this is necessary? DPDK's general stance on locking for data-path is DPDK functions do not provide locks, and that application level must implement thread-synchronization if it is required.
> > > >
> > > > In this case, the app/eventdev can be considered an App, but I don't like the idea of providing a sample application and code that duplicates core functionality with safe/unsafe versions..
> > > >
> > >
> > > Some PMD's (net/octeontx) have capability to do multi-thread safe Tx where no thread-synchronization is required. This is exposed via the offload flag 'DEV_TX_OFFLOAD_MT_LOCKFREE'.
> >
> > Yes understood.
> >
> >
> > > So, the _safe Tx functions are selected based on the above offload capability and when the capability is absent _unsafe Tx functions are selected i.e. synchronized Tx via spin locks based on the Egress port id.
> >
> >
> > This part changes the current behavior of the sample app.
> >
> > Currently there is a (SINGLE_LINK | ATOMIC) stage at the end of the pipeline, which performs this "many-to-one" action, allowing a single core to dequeue all TX traffic, and perform the TX operation in a lock-free manner.
> >
> > Changing this to a locking mechanism is going to hurt performance on platforms that do not support TX_OFFLOAD_MT_LOCKFREE.
> >
> > In my opinion, the correct fix is to alter the overall pipeline, and always use lockless TX. Examples below;
> >
> > NO TX_OFFLOAD_MT_LOCKFREE:
> >
> >    Eth RX adapter -> stage 1 -> stage 2...(N-1) -> stage N -> stage TX (Atomic | SINGLE_LINK) -> eth TX
> 
> Agreed, when we detect that tx is not lockfree the workers would just forward the events to (Atomic | SINGLE_LINK) event queue which would be dequeued by a service(mt_unsafe) and Tx them lockfree.
> 
> >
> >
> > WITH TX_OFFLOAD_MT_LOCKFREE:
> >
> >    Eth RX adapter -> stage 1 -> stage 2...(N-1) -> stage N -> eth TX MT Capable
> 
> The current lockfree pipeline would remain the same.
> >
> >
> > By configuring the pipeline based on the TX_OFFLOAD_MT_LOCKFREE capability flag, and adding the SINGLE_LINK at the end if required, we can support both models without resorting to locked TX functions.
> >
> > I think this will lead to a cleaner and more performant solution.
> >
> 
> Thoughts?

A quick summary of the issue here, and then an overview of my understanding of the proposed solution.


=== Issue ===
Ethdev hardware can advertise the Tx offload capability flag DEV_TX_OFFLOAD_MT_LOCKFREE. When it is set, multiple CPU threads can safely TX on a single ethdev queue concurrently (i.e. without locking). Not all hardware supports this, so applications must gracefully handle hardware that does not provide the capability.
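
To make the check concrete, here is a rough sketch of the decision the app has to take at init time. It is self-contained on purpose: SKETCH_TX_OFFLOAD_MT_LOCKFREE, struct sketch_dev_info and sketch_all_ports_tx_lockfree() are stand-ins invented for this mail, and the bit value is illustrative only. In a real application the capability mask would come from the tx_offload_capa field filled in by rte_eth_dev_info_get(), tested against the flag from the ethdev header. The sketch also assumes that lock-free TX is only usable when every port offers it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the ethdev MT_LOCKFREE Tx offload bit (value is illustrative). */
#define SKETCH_TX_OFFLOAD_MT_LOCKFREE (UINT64_C(1) << 14)

/* Stand-in for the per-port device info an application would query. */
struct sketch_dev_info {
	uint64_t tx_offload_capa; /* bitmask of Tx offload capabilities */
};

/* Return 1 only if every port advertises lock-free multi-thread Tx. */
static int
sketch_all_ports_tx_lockfree(const struct sketch_dev_info *ports,
			     size_t nb_ports)
{
	size_t i;

	assert(ports != NULL || nb_ports == 0);
	for (i = 0; i < nb_ports; i++) {
		/* A single port without the capability forces the fallback. */
		if (!(ports[i].tx_offload_capa & SKETCH_TX_OFFLOAD_MT_LOCKFREE))
			return 0;
	}
	return 1;
}
```

The result of this one check is what selects between the two pipeline shapes, so nothing in the datapath needs to branch on it per packet.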


=== Solution ===
In eventdev pipelines with MT_LOCKFREE capability, the CPU running the last "worker" stage can also perform the ethdev-TX operation.

In eventdev pipelines without the MT_LOCKFREE capability, we use a (SINGLE_LINK | ATOMIC) stage to "fan in" the traffic to a single point, and use a TX service to abstract away the difference in CPU core requirements.



The above solution avoids placing locks in the datapath by modifying the pipeline design, and the difference in CPU requirements is abstracted by only registering the TX service if required.
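
As a sketch of what "modifying the pipeline design" could look like at configuration time: the struct and function below are hypothetical (nothing here is from the patch), and they assume one event queue per worker stage, with one extra (Atomic | SINGLE_LINK) queue appended only when lock-free TX is unavailable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the pipeline an application would configure. */
struct sketch_pipeline {
	uint8_t nb_worker_stages; /* stage 1 .. stage N */
	uint8_t nb_event_queues;  /* worker queues, plus the TX queue if needed */
	bool has_tx_stage;        /* extra (Atomic | SINGLE_LINK) fan-in stage */
	bool worker_does_tx;      /* last worker stage calls ethdev TX directly */
};

/* Choose the pipeline shape from the MT_LOCKFREE capability alone. */
static struct sketch_pipeline
sketch_plan_pipeline(uint8_t nb_worker_stages, bool tx_mt_lockfree)
{
	struct sketch_pipeline p;

	/* Leave room for the optional extra queue in the uint8_t count. */
	assert(nb_worker_stages > 0 && nb_worker_stages < UINT8_MAX);
	p.nb_worker_stages = nb_worker_stages;
	p.has_tx_stage = !tx_mt_lockfree;
	p.worker_does_tx = tx_mt_lockfree;
	p.nb_event_queues =
		(uint8_t)(nb_worker_stages + (p.has_tx_stage ? 1 : 0));
	return p;
}
```

With 3 worker stages this gives 3 queues and worker-side TX when the capability is present, and 4 queues plus a TX service when it is not. has_tx_stage is also the condition under which the TX service would be registered.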

Note that the TX service doesn't need infrastructure like that of the RX adapter, as it is much simpler (dequeue from eventdev port, TX on ethdev port).
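
To show how small that service body is, here is a minimal sketch of one iteration. Everything in it is hypothetical: the two function pointers stand in for the eventdev dequeue and ethdev TX burst calls, and the mock helpers exist only so the snippet can run on its own. The callback is shaped like a service function (takes a void pointer, returns int32_t), which is an assumption about how it would be registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BURST 16

/* Hypothetical hooks standing in for the eventdev dequeue and ethdev TX calls. */
typedef uint16_t (*sketch_dequeue_fn)(void *ctx, void **pkts, uint16_t max);
typedef uint16_t (*sketch_tx_fn)(void *ctx, void **pkts, uint16_t nb);

struct sketch_tx_service {
	sketch_dequeue_fn dequeue; /* pulls from the SINGLE_LINK event port */
	sketch_tx_fn tx;           /* sends on the ethdev TX queue */
	void *ctx;
	uint64_t sent;
};

/* One iteration: dequeue a burst, then transmit it without any lock.
 * Only one core runs this, so the TX queue has a single writer.
 * Note: this loops until TX accepts the whole burst. */
static int32_t
sketch_tx_service_run(void *arg)
{
	struct sketch_tx_service *s = arg;
	void *pkts[SKETCH_BURST];
	uint16_t nb, done = 0;

	nb = s->dequeue(s->ctx, pkts, SKETCH_BURST);
	while (done < nb)
		done = (uint16_t)(done + s->tx(s->ctx, pkts + done,
					       (uint16_t)(nb - done)));
	s->sent += nb;
	return 0;
}

/* Mock backend used only to exercise the sketch. */
struct sketch_mock {
	uint16_t pending; /* packets waiting on the event port */
	uint16_t txed;    /* packets accepted by the TX hook */
};

static uint16_t
sketch_mock_dequeue(void *ctx, void **pkts, uint16_t max)
{
	struct sketch_mock *m = ctx;
	uint16_t n = m->pending < max ? m->pending : max;
	uint16_t i;

	for (i = 0; i < n; i++)
		pkts[i] = NULL; /* placeholder packets */
	m->pending = (uint16_t)(m->pending - n);
	return n;
}

static uint16_t
sketch_mock_tx(void *ctx, void **pkts, uint16_t nb)
{
	struct sketch_mock *m = ctx;
	uint16_t n = nb < 4 ? nb : 4; /* accept at most 4 to exercise the retry */

	(void)pkts;
	m->txed = (uint16_t)(m->txed + n);
	return n;
}
```

A real service would probably want to bound the retry loop or drop on persistent failure; that policy is left out here.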


@Pavan, I believe this is the same solution as yours - just making sure we're aligned!


Cheers, -Harry









