[dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big

Sanford, Robert rsanford at akamai.com
Fri Apr 1 18:22:17 CEST 2016


Hi Cristian,

Please see my comments inline.

>
>
>> -----Original Message-----
>> From: Robert Sanford [mailto:rsanford2 at gmail.com]
>> Sent: Monday, March 28, 2016 9:52 PM
>> To: dev at dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu at intel.com>
>> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>> 
>> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
>> send bursts larger than tx_burst_sz to the underlying ethdev.
>> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
>> burst size, resulting in unnecessary enqueuing failures or ethdev
>> writer retries.
>
>Sending bursts larger than tx_burst_sz is actually intentional. The
>assumption is that NIC performance benefits from larger burst size. So
>the tx_burst_sz is used as a minimal burst size requirement, not as a
>maximal or fixed burst size requirement.
>
>I agree with you that a while ago the vector version of IXGBE driver used
>to work the way you describe it, but I don't think this is the case
>anymore. As an example, if TX burst size is set to 32 and 48 packets are
>transmitted, than the PMD will TX all the 48 packets (internally it can
>work in batches of 4, 8, 32, etc, should not matter) rather than TXing
>just 32 packets out of 48 and user having to either discard or retry with
>the remaining 16 packets. I am CC-ing Steve Liang for confirming this.
>
>Is there any PMD that people can name that currently behaves the
>opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and
>discard the other 16?
>
>> 

Yes, I believe that IXGBE *still* truncates. What am I missing? :) My
interpretation of the latest vector TX burst function is that it truncates
bursts longer than txq->tx_rs_thresh. Here are relevant code snippets that
show it lowering the number of packets (nb_pkts) to enqueue (apologies in
advance for the email client mangling the indentation):

---

#define IXGBE_DEFAULT_TX_RSBIT_THRESH 32

static void
ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
*dev_info)
{
  ...
  dev_info->default_txconf = (struct rte_eth_txconf) {
    ...
    .tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
    ...
  };
  ...
}


uint16_t
ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
  uint16_t nb_pkts)
{
  ...
  /* cross rx_thresh boundary is not allowed */
  nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);

  if (txq->nb_tx_free < txq->tx_free_thresh)
    ixgbe_tx_free_bufs(txq);
    

  nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
  if (unlikely(nb_pkts == 0))
    return 0;
  ...
  return nb_pkts;
}

---



>> We propose to fix this by moving the tx buffer flushing logic from
>> *after* the loop that puts all packets into the tx buffer, to *inside*
>> the loop, testing for a full burst when adding each packet.
>> 
>
>The issue I have with this approach is the introduction of a branch that
>has to be tested for each iteration of the loop rather than once for the
>entire loop.
>
>The code branch where you add this is actually the slow(er) code path
>(where local variable expr != 0), which is used for non-contiguous or
>bursts smaller than tx_burst_sz. Is there a particular reason you are
>only interested of enabling this strategy (of using tx_burst_sz as a
>fixed burst size requirement) only on this code path? The reason I am
>asking is the other fast(er) code path (where expr == 0) also uses
>tx_burst_sz as a minimal requirement and therefore it can send burst
>sizes bigger than tx_burst_sz.

The reason we limit the burst size only in the "else" path is that we also
proposed to limit the ethdev tx burst in the "if (expr==0)" path, in patch
3/4.

>
>
>> Signed-off-by: Robert Sanford <rsanford at akamai.com>
>> ---
>>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>>  1 files changed, 10 insertions(+), 10 deletions(-)
>> 
>> diff --git a/lib/librte_port/rte_port_ethdev.c
>> b/lib/librte_port/rte_port_ethdev.c
>> index 3fb4947..1283338 100644
>> --- a/lib/librte_port/rte_port_ethdev.c
>> +++ b/lib/librte_port/rte_port_ethdev.c
>> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>> 
>> 	RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>> 
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst(p);
>> +		}
>>  	}
>
>One observation here: if we enable this proposal (which I have an issue
>with due to the executing the branch per loop iteration rather than once
>per entire loop), it also eliminates the buffer overflow issue flagged by
>you in the other email :), so no need to e.g. doble the size of the port
>internal buffer (tx_buf).
>
>> 


Not exactly correct: We suggested doubling tx_buf[] for *ring* writers.
Here (the hunks above) we suggest the opposite: *reduce* the size of the
*ethdev* tx_buf[], because we never expect to buffer more than a full
burst.

You are correct about the additional branch per loop iteration. On the
other hand, the proposed change is simpler than something like this:
compute how many more packets we need to complete a full burst, copy them
to tx_buf[], send_burst(), and then copy the rest to tx_buf[]. Either way
is acceptable to me.


>>  	return 0;
>> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer_nodrop {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
>> *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>> 
>> 	RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>> 
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst_nodrop(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst_nodrop(p);
>> +		}
>>  	}
>> 
>>  	return 0;
>> --
>> 1.7.1
>

--
Regards,
Robert



More information about the dev mailing list