[dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats

Ananyev, Konstantin konstantin.ananyev at intel.com
Fri Feb 15 19:15:04 CET 2019



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Marchand
> Sent: Friday, February 15, 2019 5:32 PM
> To: Thomas Monjalon <thomas at monjalon.net>
> Cc: Richardson, Bruce <bruce.richardson at intel.com>; dev at dpdk.org; Lu, Wenzhuo <wenzhuo.lu at intel.com>; Wu, Jingjing
> <jingjing.wu at intel.com>; Iremonger, Bernard <bernard.iremonger at intel.com>; Maxime Coquelin <mcoqueli at redhat.com>; Yigit, Ferruh
> <ferruh.yigit at intel.com>; Andrew Rybchenko <arybchenko at solarflare.com>; Wiles, Keith <keith.wiles at intel.com>
> Subject: Re: [dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats
> 
> On Fri, Feb 15, 2019 at 5:19 PM Thomas Monjalon <thomas at monjalon.net> wrote:
> 
> > 15/02/2019 16:04, David Marchand:
> > > On Fri, Feb 15, 2019 at 3:05 PM Bruce Richardson <
> > bruce.richardson at intel.com>
> > > wrote:
> > >
> > > > On Fri, Feb 15, 2019 at 10:33:47AM +0100, David Marchand wrote:
> > > > >    On Fri, Feb 15, 2019 at 9:58 AM Thomas Monjalon
> > > > >    <[1]thomas at monjalon.net> wrote:
> > > > >
> > > > >      14/02/2019 19:51, David Marchand:
> > > > >      > What is the purpose of oerrors ?
> > > > >      >
> > > > >      > Since the drivers (via rte_eth_tx_burst return value) report
> > the
> > > > >      numbers of
> > > > >      > packets successfully transmitted, the application can try to
> > > > >      retransmit the
> > > > >      > packets that did not make it and counts this.
> > > > >      > If the driver counts such "missed" packets, then it does the
> > job
> > > > >      the
> > > > >      > application will do anyway (wasting some cycles).
> > > > >      > But what is more a problem is that the application does not
> > know
> > > > >      if the
> > > > >      > packets in oerrors are its own retries or problems that the
> > driver
> > > > >      can not
> > > > >      > detect (hw problems) but the hw can.
> > > > >      >
> > > > >      > So the best option is that oerrors does not report the
> > packets the
> > > > >      driver
> > > > >      > refuses (and I can see some drivers that do not comply to
> > this)
> > > > >      but only
> > > > >      > "external" errors from the driver pov.
> > > > >      I can see the benefit of having driver errors in the stats,
> > > > >      so it is generically stored for later analysis or print.
> > > > >      It could be managed at ethdev level instead of the application
> > > > >      doing the computation.
> > > > >      What about splitting the Tx errors in 2 fields? oerrors / ofull
> > ?
> > > > >      Who said it's awful? sorry Bruce for anticipating ;)
> > > > >
> > > > >    Summary, correct me if we are not aligned :-)
> > > > >    - ofull (maybe ofifoerrors?) is actually a count of SW failed
> > > > transmits
> > > > >    - it would be handled in rte_eth_tx_burst() itself in a generic
> > way
> > > > >    - the drivers do not need to track such SW failed transmits
> > > > >    - oerrors only counts packets HW failed transmits, dropped out of
> > the
> > > > >    driver tx_pkt_burst() knowledge
> > > > >    - the application does not have to track SW failed transmits
> > since the
> > > > >    stats is in ethdev
> > > > >    It sounds good to me, this means an ethdev abi breakage.
> > > >
> > > > Hang on, why do we need ethdev to track this at all, given that it's
> > > > trivial for apps to track this themselves. Would we not be better just
> > to
> > > > add this tracking into testpmd and leave ethdev and drivers alone?
> > Perhaps
> > > > I'm missing something?
> > > >
> > >
> > > This was my first intention but Thomas hopped in ;-)
> >
> > I was just opening the discussion :)
> >
> > > testpmd does it already via the fs->fwd_dropped stats and ovs has its
> > > tx_dropped stat.
> > >
> > > The problem is that all drivers have different approach about this.
> > > Some drivers only count real hw errors in oerrors.
> > > But others count the packets it can't send in oerrors (+ there are some
> > > cases that seem buggy to me where the driver will always refuse the mbufs
> > > for reason X and the application can retry indefinitely to send...).
> >
> > We have 3 options:
> > 1/ status quo = oerrors is inconsistent across drivers
> > 2/ API break = oerrors stop being incremented for temporary
> >         unavailability (i.e. queue full, kind of ERETRY),
> >         report only packets which will be never sent,
> >         may be a small performance gain for some drivers
> > 3/ API + ABI break = same as 2/ +
> >         report ERETRY errors in ofull (same as tx_burst() delta)
> >
> > Note that the option 2 is a light API break which does not require
> > any deprecation notice because the original definition of oerrors
> > is really vague: "failed transmitted packets"
> > By changing the definition of errors to "packets lost", we can count
> > HW errors + packets not matching requirements.
> > As David suggests, the packets not matching requirements can be freed
> > as it is done for packets successfully transmitted to the HW.
> > We need also to update the definition of the return value of
> > rte_eth_tx_burst(): "packets actually stored in transmit descriptors".
> > We should also count the bad packets rejected by the driver.
> > Then the number of bad packets would be the difference between
> > the return value of rte_eth_tx_burst() and opackets counter.
> > This solution is fixing some bugs and enforce a consistent behaviour.
> >
> 
> I am also for option 2 especially because of this.
> A driver that refuses a packet for reason X (which is a limitation, or an
> incorrect config or whatever that is not a transient condition) but gives
> it back to the application is a bad driver.

Why? What.s wrong to leave it to the upper layer to decide what to
do with the packets that can't be sent (by one reason or another)?

> 
> 
> > The option 3 is breaking the ABI and may degrade the performances.
> > The only benefit is convenience or semantic: ofull would be the
> > equivalent of imissed.
> > The application can count the same by making the difference between
> > the burst size and the return of rte_eth_tx_burst.
> >
> > My vote is for the option 2.
> >
> 
> 
> 
> --
> David Marchand


More information about the dev mailing list