[dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Jun 9 17:44:04 CEST 2015



> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Tuesday, June 09, 2015 4:08 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> 
> 
> 
> On 09/06/15 12:18, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Wednesday, June 03, 2015 6:47 PM
> >> To: Ananyev, Konstantin; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>
> >>
> >>
> >> On 02/06/15 18:35, Ananyev, Konstantin wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >>>> Sent: Tuesday, June 02, 2015 4:08 PM
> >>>> To: Ananyev, Konstantin; dev at dpdk.org
> >>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>
> >>>>
> >>>>
> >>>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
> >>>>> Hi Zoltan,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zoltan Kiss
> >>>>>> Sent: Monday, June 01, 2015 5:16 PM
> >>>>>> To: dev at dpdk.org
> >>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Would anyone like to review this patch? Venky sent a NAK, but I've
> >>>>>> explained to him why it is a bug.
> >>>>>
> >>>>>
> >>>>> Well, I think Venky is right here.
> >>>> I think the comments above the rte_eth_tx_burst() definition are quite clear
> >>>> about what tx_free_thresh means; e1000 and i40e use it that way, but
> >>>> ixgbe doesn't.
> >>>>
> >>>>> Indeed, that fix will cause more frequent unsuccessful checks for DD bits and might cause a
> >>>>> slowdown for the TX fast-path.
> >>>> Not if the applications set tx_free_thresh according to the definition
> >>>> of this value. But we can change the default value from 32 to something
> >>>> higher, e.g. I'm using nb_desc/2, and it works out well.
> >>>
> >>> Sure we can, as I said below; we can unify it one way or another.
> >>> One way would be to make the fast-path TX free TXDs when the number of occupied TXDs rises above tx_free_thresh
> >>> (what the rte_ethdev.h comments say and what full-featured TX does).
> >>> Though in that case we have to change the default value for tx_free_thresh, and all existing apps
> >>> using tx_free_thresh==32 with fast-path TX will probably experience a slowdown.
> >>
> >> They are in trouble already, because i40e and e1000 use it as defined.
> >
> > In fact, i40e has exactly the same problem as ixgbe:
> > the fast-path and full-featured TX code treat tx_free_thresh in different ways.
> > igb just ignores the input tx_free_thresh, while em has only the full-featured path.
> >
> > What I am saying is that an existing app that uses the TX fast-path and sets tx_free_thresh=32
> > (as we did in our examples in previous versions) will experience a slowdown
> > if we make all TX functions behave like the full-featured ones
> > (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).
> >
> > From the other side, if an app uses full-featured TX and sets tx_free_thresh=32,
> > then it already has a possible slowdown, because of too-frequent TXD checking.
> > So if we change the tx_free_thresh semantics to what the fast-path uses,
> > it shouldn't see any slowdown; in fact it might see some improvement.
> >
> >> But I guess most apps go with 0, which selects the driver's default.
> >> Others have to change the value to nb_txd - curr_value to get the same
> >> behaviour.
> >>
> >>> Another way would be to make all TX functions treat tx_conf->tx_free_thresh as the fast-path TX functions do
> >>> (free TXDs when the number of free TXDs drops below tx_free_thresh) and update the rte_ethdev.h comments.
> >> And the i40e and e1000 code as well. I don't see what difference it makes
> >> which definition you use; what I care about is that it is used
> >> consistently.
> >
> > Yes, both ways are possible; the concern is how to minimise the impact on existing apps.
> > That's why I am leaning towards the fast-path way.
> 
> It makes sense to favour the fast-path way; I'll look into that and try to
> come up with a patch.
> 
> >
> >>>
> >>> Though I am not sure that it is really worth all these changes.
> >>> From one side, whatever tx_free_thresh is,
> >>> the app should still assume that the worst case might happen,
> >>> and up to nb_tx_desc mbufs can be consumed by the queue.
> >>> From the other side, I think the default value should work well for most cases.
> >>> So I am still for graceful deprecation of that config parameter, see below.
> >>>
> >>>>
> >>>>> Anyway, with the current PMD implementation, you can't guarantee that at any moment
> >>>>> the TX queue wouldn't use more than tx_free_thresh mbufs.
> >>>>
> >>>>
> >>>>> There could be situations (low speed, or the link is down for a short period, etc.) when
> >>>>> many more than tx_free_thresh TXDs are in use and none of them can be freed by HW right now.
> >>>>> So your app had better be prepared for up to (nb_tx_desc * num_of_TX_queues) mbufs being in use
> >>>>> by the TX path at any given moment.
> >>>>>
> >>>>> Though yes, there is an inconsistency in how the different ixgbe TX functions treat the tx_conf->tx_free_thresh parameter.
> >>>>> That probably creates wrong expectations and confusion.
> >>>> Yes, ixgbe_xmit_pkts() uses it the way it's defined; these two functions
> >>>> don't.
> >>>>
> >>>>> We might try to unify its usage one way or another, but I personally don't see much point in it.
> >>>>> After all, tx_free_thresh seems like a driver-internal choice (based on nb_tx_desc and other parameters).
> >>>>> So I think a better way would be:
> >>>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases), and make
> >>>>> each driver use what it thinks would be the best value.
> >>>> But how does the driver know what's best for the application's
> >>>> traffic pattern? I think it's better to leave the app the possibility
> >>>> to fine-tune it.
> >>>
> >>> My understanding is that for most cases the default value should do pretty well.
> >>> That default value shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
> >>> and probably shouldn't be too big, to prevent unnecessary mbuf consumption
> >>> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4, probably).
> >> I agree
> >>
> >>>
> >>> But maybe you have a good example of when such tuning is needed?
> >>> For what traffic patterns would you set tx_free_thresh to different values,
> >>> and how would it impact performance?
> >> I don't have an actual example, but I think it's worth keeping this
> >> tuning option since we already have it. Most people probably wouldn't use
> >> it, but I can imagine the very enthusiastic wanting to try out
> >> different settings to find the best one.
> >> E.g. I was testing odp_l2fwd when I came across the problem, and I found
> >> it useful to have this option. With its traffic pattern (receive a batch
> >> of packets, then send them out on another interface) it can happen
> >> that with different clock speeds you find different optimums.
> >
> > Actually, after thinking about it a bit more,
> > I think it would depend more on how many RX/TX queues a particular lcore has to manage.
> > So, as you said, yes, we'd probably better leave it, at least for now.
> >
> >>
> >>>
> >>> Again, if there were a tx_free_pkts(), why would someone also need tx_conf->tx_free_thresh?
> >> I think of tx_free_pkts as a rainy-day option, for when you want ALL
> >> TX-completed packets to be released because you are out of buffers.
> >
> > What do you mean by 'rainy day' here?
> > The app getting short of mbufs?
> > As I said before, it can be a perfectly valid situation for
> > up to nb_tx_desc mbufs per TX queue to be in use.
> > So the app had better be prepared for that situation anyway:
> > either make sure there are enough mbufs in the pool,
> > or somehow gracefully degrade the service.
> > Also, if nb_tx_desc is big (say, for ixgbe it can be up to 4K),
> > freeing all TXDs at once could take a long time.
> > So I think it should be possible to specify a maximum number of mbufs to free per call.
> >
> > My thought is that people will use it to release mbufs when there is an idle period.
> > Something like:
> >
> > n = rx_burst(...);
> > if (n == 0)  { ...; tx_free_mbufs(...); ... }
> > else {...}
> 
> Yes, this is similar to what I'm doing in odp-dpdk:
> 
> https://git.linaro.org/lng/odp-dpdk.git/commitdiff/1a8df254e18bb50dbd835729bc3d01fcb87ebc6b
> 
> The only problem is that when tx_free_thresh is not reached, the
> interface doesn't even start releasing the mbufs. And when you can't
> receive anything, it's likely you won't send anything out either, which
> would normally trigger the release of the completed buffers.

Yep, that's why I think it is good to add this function:
if you are going to TX something, then you'll probably hit tx_free_thresh
and the PMD will release unused mbufs 'automatically'.
If you have nothing to TX (an idle period), you can try to release unused mbufs manually
by calling tx_free_mbufs().

Konstantin

> 
> >
> > Konstantin
> >
> >> While
> >> tx_free_thresh is the fast path way of TX completion, when you have the
> >> room to wait for more packets to be gathered.
> >>
> >>>
> >>> Konstantin
> >>>
> >>>> In the meantime, we can improve the default selection as well, as I
> >>>> suggested above.
> >>>>
> >>>>> 2. As you suggested in another mail, introduce a new function:
> >>>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
> >>>>> That would give the upper layer better control of memory usage, and it could be called by the upper layer at idle time,
> >>>>> so that further tx_burst calls don't need to spend time freeing TXDs/packets.
> >>>> I agree.
> >>>>
> >>>>>
> >>>>> Konstantin
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Zoltan
> >>>>>>
> >>>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
> >>>>>>> This check doesn't do what's required by rte_eth_tx_burst:
> >>>>>>> "When the number of previously sent packets reached the "minimum transmit
> >>>>>>> packets to free" threshold"
> >>>>>>>
> >>>>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
> >>>>>>> pool] < txq->nb_tx_desc.
> >>>>>>>
> >>>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss at linaro.org>
> >>>>>>> ---
> >>>>>>>      drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
> >>>>>>>      drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> >>>>>>>      2 files changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> index 4f9ab22..b70ed8c 100644
> >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>>>
> >>>>>>>      	/*
> >>>>>>>      	 * Begin scanning the H/W ring for done descriptors when the
> >>>>>>> -	 * number of available descriptors drops below tx_free_thresh.  For
> >>>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
> >>>>>>>      	 * each done descriptor, free the associated buffer.
> >>>>>>>      	 */
> >>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>>>      		ixgbe_tx_free_bufs(txq);
> >>>>>>>
> >>>>>>>      	/* Only use descriptors that are available */
> >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> index abd10f6..f91c698 100644
> >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>>>      	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
> >>>>>>>      		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> >>>>>>>
> >>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>>>      		ixgbe_tx_free_bufs(txq);
> >>>>>>>
> >>>>>>>      	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> >>>>>>>
