[dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats

David Marchand david.marchand at redhat.com
Fri Feb 15 18:32:28 CET 2019


On Fri, Feb 15, 2019 at 5:19 PM Thomas Monjalon <thomas at monjalon.net> wrote:

> 15/02/2019 16:04, David Marchand:
> > On Fri, Feb 15, 2019 at 3:05 PM Bruce Richardson <
> bruce.richardson at intel.com>
> > wrote:
> >
> > > On Fri, Feb 15, 2019 at 10:33:47AM +0100, David Marchand wrote:
> > > >    On Fri, Feb 15, 2019 at 9:58 AM Thomas Monjalon
> > > >    <[1]thomas at monjalon.net> wrote:
> > > >
> > > >      14/02/2019 19:51, David Marchand:
> > > >      > What is the purpose of oerrors ?
> > > >      >
> > > >      > Since the drivers (via rte_eth_tx_burst return value) report
> the
> > > >      numbers of
> > > >      > packets successfully transmitted, the application can try to
> > > >      retransmit the
> > > >      > packets that did not make it and counts this.
> > > >      > If the driver counts such "missed" packets, then it does the
> job
> > > >      the
> > > >      > application will do anyway (wasting some cycles).
> > > >      > But what is more a problem is that the application does not
> know
> > > >      if the
> > > >      > packets in oerrors are its own retries or problems that the
> driver
> > > >      can not
> > > >      > detect (hw problems) but the hw can.
> > > >      >
> > > >      > So the best option is that oerrors does not report the
> packets the
> > > >      driver
> > > >      > refuses (and I can see some drivers that do not comply to
> this)
> > > >      but only
> > > >      > "external" errors from the driver pov.
> > > >      I can see the benefit of having driver errors in the stats,
> > > >      so it is generically stored for later analysis or print.
> > > >      It could be managed at ethdev level instead of the application
> > > >      doing the computation.
> > > >      What about splitting the Tx errors in 2 fields? oerrors / ofull
> ?
> > > >      Who said it's awful? sorry Bruce for anticipating ;)
> > > >
> > > >    Summary, correct me if we are not aligned :-)
> > > >    - ofull (maybe ofifoerrors?) is actually a count of SW failed
> > > transmits
> > > >    - it would be handled in rte_eth_tx_burst() itself in a generic
> way
> > > >    - the drivers do not need to track such SW failed transmits
> > > >    - oerrors only counts packets HW failed transmits, dropped out of
> the
> > > >    driver tx_pkt_burst() knowledge
> > > >    - the application does not have to track SW failed transmits
> since the
> > > >    stats is in ethdev
> > > >    It sounds good to me, this means an ethdev abi breakage.
> > >
> > > Hang on, why do we need ethdev to track this at all, given that it's
> > > trivial for apps to track this themselves. Would we not be better just
> to
> > > add this tracking into testpmd and leave ethdev and drivers alone?
> Perhaps
> > > I'm missing something?
> > >
> >
> > This was my first intention but Thomas hopped in ;-)
>
> I was just opening the discussion :)
>
> > testpmd does it already via the fs->fwd_dropped stats and ovs has its
> > tx_dropped stat.
> >
> > The problem is that all drivers have different approach about this.
> > Some drivers only count real hw errors in oerrors.
> > But others count the packets it can't send in oerrors (+ there are some
> > cases that seem buggy to me where the driver will always refuse the mbufs
> > for reason X and the application can retry indefinitely to send...).
>
> We have 3 options:
> 1/ status quo = oerrors is inconsistent across drivers
> 2/ API break = oerrors stop being incremented for temporary
>         unavailability (i.e. queue full, kind of ERETRY),
>         report only packets which will be never sent,
>         may be a small performance gain for some drivers
> 3/ API + ABI break = same as 2/ +
>         report ERETRY errors in ofull (same as tx_burst() delta)
>
> Note that the option 2 is a light API break which does not require
> any deprecation notice because the original definition of oerrors
> is really vague: "failed transmitted packets"
> By changing the definition of errors to "packets lost", we can count
> HW errors + packets not matching requirements.
> As David suggests, the packets not matching requirements can be freed
> as it is done for packets successfully transmitted to the HW.
> We need also to update the definition of the return value of
> rte_eth_tx_burst(): "packets actually stored in transmit descriptors".
> We should also count the bad packets rejected by the driver.
> Then the number of bad packets would be the difference between
> the return value of rte_eth_tx_burst() and opackets counter.
> This solution is fixing some bugs and enforce a consistent behaviour.
>

I am also for option 2 especially because of this.
A driver that refuses a packet for reason X (which is a limitation, or an
incorrect config or whatever that is not a transient condition) but gives
it back to the application is a bad driver.


> The option 3 is breaking the ABI and may degrade the performances.
> The only benefit is convenience or semantic: ofull would be the
> equivalent of imissed.
> The application can count the same by making the difference between
> the burst size and the return of rte_eth_tx_burst.
>
> My vote is for the option 2.
>



-- 
David Marchand


More information about the dev mailing list