[RFC PATCH v1 0/4] Direct re-arming of buffers on receive side
    Morten Brørup 
    mb at smartsharesystems.com
       
    Thu Jan 27 18:13:38 CET 2022
    
    
  
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli at arm.com]
> Sent: Thursday, 27 January 2022 05.07
> 
> Thanks Morten, appreciate your comments. Few responses inline.
> 
> > -----Original Message-----
> > From: Morten Brørup <mb at smartsharesystems.com>
> > Sent: Sunday, December 26, 2021 4:25 AM
> >
> > > From: Feifei Wang [mailto:feifei.wang2 at arm.com]
> > > Sent: Friday, 24 December 2021 17.46
> > >
> <snip>
> 
> > >
> > > However, this solution poses several constraints:
> > >
> > > 1) The receive queue needs to know which transmit queue it should
> > > take the buffers from. The application logic decides which transmit
> > > port to use to send out the packets. In many use cases the NIC might
> > > have a single port ([1], [2], [3]), in which case a given transmit
> > > queue is always mapped to a single receive queue (1:1 Rx queue: Tx
> > > queue). This is easy to configure.
> > >
> > > If the NIC has 2 ports (there are several references), then we will
> > > have 1:2 (RX queue: TX queue) mapping which is still easy to
> > > configure. However, if this is generalized to 'N' ports, the
> > > configuration can be long. Moreover, the PMD would have to scan a
> > > list of transmit queues to pull the buffers from.
> >
> > I disagree with the description of this constraint.
> >
> > As I understand it, it doesn't matter how many ports or queues are in
> > a NIC or system.
> >
> > The constraint is more narrow:
> >
> > This patch requires that all packets ingressing on some port/queue
> > must egress on the specific port/queue that it has been configured to
> > rearm its buffers from. I.e. an application cannot route packets
> > between multiple ports with this patch.
> Agree, this patch as is has this constraint. It is not a constraint
> that would apply for NICs with single port. The above text is
> describing some of the issues associated with generalizing the solution
> for N number of ports. If N is small, the configuration is small and
> scanning should not be bad.
> 
Perhaps we can live with the 1:1 limitation, if that is the primary use case.
Alternatively, the feature could fall back to using the mempool if unable to get/put buffers directly from/to a participating NIC. In this case, I envision a library serving as a shim layer between the NICs and the mempool.

In other words: Let's take a step back from the implementation and discuss the high-level requirements and architecture of the proposed feature.
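
To make the fallback idea a bit more concrete, here is a rough sketch of what such a shim could look like. It is not DPDK code and not what the patch does: struct pool, struct direct_stash, shim_put() and shim_get() are hypothetical names, the pool is a plain LIFO standing in for a mempool, and the sizes are arbitrary. The Tx side hands freed buffers to a small stash, the Rx side rearms from the stash first, and either side falls back to the pool when the stash is full or empty.

```c
#include <assert.h>
#include <stddef.h>

#define STASH_SIZE 32   /* hypothetical stash capacity per Rx/Tx pairing */
#define POOL_SIZE  256  /* hypothetical capacity of the stand-in pool */

/* Stand-in for a mempool: a simple LIFO of buffer pointers. */
struct pool {
    void *bufs[POOL_SIZE];
    size_t count;
};

/* Buffers freed by Tx and not yet picked up by Rx. */
struct direct_stash {
    void *bufs[STASH_SIZE];
    size_t count;
};

/* Tx side: put freed buffers in the stash; overflow goes to the pool. */
static void shim_put(struct direct_stash *s, struct pool *p,
                     void **bufs, size_t n)
{
    size_t i = 0;

    while (i < n && s->count < STASH_SIZE)
        s->bufs[s->count++] = bufs[i++];
    while (i < n) {
        assert(p->count < POOL_SIZE); /* the pool must have room left */
        p->bufs[p->count++] = bufs[i++];
    }
}

/*
 * Rx side: take buffers from the stash first, then fall back to the pool.
 * Returns the number of buffers actually obtained (may be less than n).
 */
static size_t shim_get(struct direct_stash *s, struct pool *p,
                       void **out, size_t n)
{
    size_t got = 0;

    while (got < n && s->count > 0)
        out[got++] = s->bufs[--s->count];
    while (got < n && p->count > 0)
        out[got++] = p->bufs[--p->count];
    return got;
}
```

With something along these lines, the 1:1 case would stay on the stash path, and an application that routes between ports would simply see more traffic through the pool.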
> >
> > >
> 
> <snip>
> 
> > >
> >
> > You are missing the fourth constraint:
> >
> > 4) The application must transmit all received packets immediately,
> > i.e. QoS queueing and similar is prohibited.
> I do not understand this, can you please elaborate? Even if there is
> QoS queuing, there would be a steady stream of packets being
> transmitted. These transmitted packets will fill the buffers on the RX
> side.
E.g. an appliance may receive packets on a 10 Gbit/s backbone port, and queue some of the packets up for a customer with a 20 Mbit/s subscription. When there is a large burst of packets towards that subscriber, they will queue up in the QoS queue dedicated to that subscriber. During that traffic burst, there is much more RX than TX. And after the traffic burst, there will be more TX than RX. So RX and TX are not in lockstep, and the TX side cannot always supply the buffers the RX side needs.
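
A back-of-the-envelope sketch of the imbalance, with made-up numbers: only the 20 Mbit/s subscription rate comes from the example above; the 1 Gbit/s burst rate, the 10 ms burst duration and the 1500-byte packet size are assumptions, and qos_backlog_pkts() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Rough number of packets (and thus buffers) sitting in a QoS queue at
 * the end of a burst: bits arriving faster than the subscriber rate can
 * drain them, divided by the packet size. Integer math, rounds down.
 */
static uint64_t qos_backlog_pkts(uint64_t rx_bps, uint64_t tx_bps,
                                 uint64_t burst_ms, uint64_t pkt_bytes)
{
    if (rx_bps <= tx_bps || pkt_bytes == 0)
        return 0; /* the queue does not grow */

    uint64_t backlog_bits = (rx_bps - tx_bps) / 1000 * burst_ms;

    return backlog_bits / 8 / pkt_bytes;
}
```

With those assumed numbers, qos_backlog_pkts(1000000000, 20000000, 10, 1500) gives 816, i.e. roughly 800 buffers that were received but have not been transmitted yet, so they cannot be handed back to the RX side during the burst.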
> 
> >
> <snip>
> 
> > >
> >
> > The patch provides a significant performance improvement, but I am
> > wondering if any real world applications exist that would use this.
> > Only a "router on a stick" (i.e. a single-port router) comes to my
> > mind, and that is probably sufficient to call it useful in the real
> > world. Do you have any other examples to support the usefulness of
> > this patch?
> SmartNIC is a clear and dominant use case, typically they have a single
> port for data plane traffic (dual ports are mostly for redundancy).
> This patch avoids a good amount of store operations. The smaller CPUs
> found in SmartNICs have smaller store buffers which can become
> bottlenecks. Avoiding the lcore cache saves valuable HW cache space.
OK. This is an important use case!
> 
> >
> > Anyway, the patch doesn't do any harm if unused, and the only
> > performance cost is the "if (rxq->direct_rxrearm_enable)" branch in
> > the Ethdev driver. So I don't oppose it.
> >
> 
    
    