[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

Vlad Zolotarov vladz at cloudius-systems.com
Tue Aug 25 22:07:18 CEST 2015



On 08/25/15 22:30, Vladislav Zolotarov wrote:
>
>
> On Aug 25, 2015 22:16, "Zhang, Helin" <helin.zhang at intel.com 
> <mailto:helin.zhang at intel.com>> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com 
> <mailto:vladz at cloudius-systems.com>]
> > > Sent: Tuesday, August 25, 2015 11:53 AM
> > > To: Zhang, Helin
> > > Cc: Lu, Wenzhuo; dev at dpdk.org <mailto:dev at dpdk.org>
> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
> above 1 for
> > > all NICs but 82598
> > >
> > >
> > >
> > > On 08/25/15 21:43, Zhang, Helin wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > > I think this could possibly be the root cause of your TX hang issue.
> > > > Please try to limit the number to 8 or less, and then see if the 
> issue
> > > > will still be there or not?
> > > >
> > >
> > > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > > 7.2.1.1 of x540 devices spec:
> > >
> > > A packet (or multiple packets in transmit segmentation) can span 
> any number of
> > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH 
> minus 2 (see
> > > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 
> for WTHRESH
> > > details). For best performance it is recommended to minimize the 
> number of
> > > buffers as possible.
> > >
> > > Could u, pls., clarify why do u think that the maximum number of 
> data buffers is
> > > limited by 8?
> > OK, i40e hardware is 8
>
> For i40 it's a bit more complicated than just "not more than 8" - it's 
> not more than 8 for a non-TSO packet and not more than 8 for each MSS 
> including headers buffers for TSO. But this thread is not about i40e 
> so this doesn't seem to be relevant anyway.
>
> , so I'd assume x540 could have a similar one.
>
> x540 spec assumes otherwise... 😉
>
> Yes, in your case,
> > the limit could be around 38, right?
>
> If by "around 38" u mean "exactly 38" then u are absolutely right... 😉
>
> > Could you help to make sure there is no packet to be transmitted 
> uses more than
> > 38 descriptors?
>
> Just like i've already mentioned, we limit the cluster by at most 33 
> data segments. Therefore we are good here...
>
> > I heard that there is a similar hang issue on X710 if using more 
> than 8 descriptors for
> > a single packet. I am wondering if the issue is similar on x540.
>
> What's x710? If that's xl710 40G nics (i40e driver),
>

I've found what x710 NICs are - they are another NICs family managed by 
i40e PMD. Therefore the rest of what I said stands the same... ;)

> then it has its own specs with its own HW limitations i've mentioned 
> above. It has nothing to do with this thread that is all about 10G 
> nics managed by ixgbe driver.
>
> There is a different thread, where i've raised the 40G NICs xmit 
> issues. See "i40e xmit path HW limitation" thread.
>
> >
> > Regards,
> > Helin
> >
> > >
> > > thanks,
> > > vlad
> > >
> > > > It does not have any check for the number of descriptors to be used
> > > > for a single packet, and it relies on the users to give correct mbuf
> > > > chains.
> > > >
> > > > We may need a check of this somewhere. Of cause the point you
> > > > indicated we also need to carefully investigate or fix.
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > > *From:*Vladislav Zolotarov [mailto:vladz at cloudius-systems.com 
> <mailto:vladz at cloudius-systems.com>]
> > > > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > > > *To:* Zhang, Helin
> > > > *Cc:* Lu, Wenzhuo; dev at dpdk.org <mailto:dev at dpdk.org>
> > > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > > above 1 for all NICs but 82598
> > > >
> > > >
> > > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang at intel.com 
> <mailto:helin.zhang at intel.com>
> > > > <mailto:helin.zhang at intel.com <mailto:helin.zhang at intel.com>>> 
> wrote:
> > > > >
> > > > > Hi Vlad
> > > > >
> > > > >
> > > > >
> > > > > In addition, I’d double check with you what’s the maximum 
> number of
> > > > descriptors would be used for a single packet transmitting?
> > > > >
> > > > > Datasheet said that it supports up to 8. I am wondering if 
> more than
> > > > 8 were used in your case?
> > > >
> > > > If memory serves me well the maximum number of data descriptors per
> > > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in 
> DPDK
> > > > is always zero it gives us 38 segments. We limit them by 33.
> > > >
> > > > >
> > > > > Thank you very much!
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Helin
> > > > >
> > > > >
> > > > >
> > > > > From: Zhang, Helin
> > > > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > > > To: Vladislav Zolotarov
> > > > > Cc: Lu, Wenzhuo; dev at dpdk.org <mailto:dev at dpdk.org> 
> <mailto:dev at dpdk.org <mailto:dev at dpdk.org>>
> > > > >
> > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > > above 1 for all NICs but 82598
> > > > >
> > > > >
> > > > >
> > > > > Hi Vlad
> > > > >
> > > > >
> > > > >
> > > > > Thank you very much for the patches! Give me a few more time to
> > > > double check with more guys, and possibly hardware experts.
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Helin
> > > > >
> > > > >
> > > > >
> > > > > From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com 
> <mailto:vladz at cloudius-systems.com>
> > > > <mailto:vladz at cloudius-systems.com 
> <mailto:vladz at cloudius-systems.com>>]
> > > > > Sent: Tuesday, August 18, 2015 9:56 PM
> > > > > To: Lu, Wenzhuo
> > > > > Cc: dev at dpdk.org <mailto:dev at dpdk.org> <mailto:dev at dpdk.org 
> <mailto:dev at dpdk.org>>; Zhang, Helin
> > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > > above 1 for all NICs but 82598
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu at intel.com 
> <mailto:wenzhuo.lu at intel.com>
> > > > <mailto:wenzhuo.lu at intel.com <mailto:wenzhuo.lu at intel.com>>> wrote:
> > > > > >
> > > > > > Hi Helin,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: dev [mailto:dev-bounces at dpdk.org 
> <mailto:dev-bounces at dpdk.org>
> > > > <mailto:dev-bounces at dpdk.org <mailto:dev-bounces at dpdk.org>>] On 
> Behalf Of Vlad Zolotarov
> > > > > > > Sent: Friday, August 14, 2015 1:38 PM
> > > > > > > To: Zhang, Helin; dev at dpdk.org <mailto:dev at dpdk.org> 
> <mailto:dev at dpdk.org <mailto:dev at dpdk.org>>
> > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid
> > > > tx_rs_thresh above 1 for
> > > > > > > all NICs but 82598
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > > > > > Hi Vlad
> > > > > > > >
> > > > > > > > I don't think the changes are needed. It says in datasheet
> > > > that the RS
> > > > > > > > bit should be set on the last descriptor of every 
> packet, ONLY
> > > > WHEN
> > > > > > > TXDCTL.WTHRESH equals to ZERO.
> > > > > > >
> > > > > > > Of course it's needed! See below.
> > > > > > > Exactly the same spec a few lines above the place u've just
> > > > quoted states:
> > > > > > >
> > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is
> > > > greater than
> > > > > > > zero."
> > > > > > >
> > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS
> > > > bit notation ixgbe PMD
> > > > > > > is actually not supporting any value of WTHRESH different 
> from zero.
> > > > > > I think Vlad is right. We need to fix this issue. Any 
> suggestion?
> > > > If not, I'd like to ack this patch.
> > > > >
> > > > > Pls., note that there is a v2 of this patch on the list. I 
> forgot to
> > > > patch ixgbevf_dev_info_get() in v1.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Helin
> > > > > > > >
> > > > > > > >> -----Original Message-----
> > > > > > > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com 
> <mailto:vladz at cloudius-systems.com>
> > > > <mailto:vladz at cloudius-systems.com 
> <mailto:vladz at cloudius-systems.com>>]
> > > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM
> > > > > > > >> To: dev at dpdk.org <mailto:dev at dpdk.org> 
> <mailto:dev at dpdk.org <mailto:dev at dpdk.org>>
> > > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin;
> > > > avi at cloudius-systems.com <mailto:avi at cloudius-systems.com> 
> <mailto:avi at cloudius-systems.com <mailto:avi at cloudius-systems.com>>; Vlad
> > > > > > > >> Zolotarov
> > > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid 
> tx_rs_thresh
> > > > above 1 for
> > > > > > > all
> > > > > > > >> NICs but 82598
> > > > > > > >>
> > > > > > > >> According to 82599 and x540 HW specifications RS bit *must*
> > > > be set in the
> > > > > > > last
> > > > > > > >> descriptor of *every* packet.
> > > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> > > > > > >
> > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in
> > > > order to
> > > > > > > function. See above.
> > > > > > >
> > > > > > > >
> > > > > > > >> This patch fixes the Tx hang we were constantly hitting 
> with
> > > > a seastar-based
> > > > > > > >> application on x540 NIC.
> > > > > > > > Could you help to share with us how to reproduce the tx hang
> > > > issue, with using
> > > > > > > > typical DPDK examples?
> > > > > > >
> > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to
> > > > help u
> > > > > > > here. However this is quite irrelevant since without this this
> > > > > > > patch ixgbe PMD obviously abuses the HW spec as has been 
> explained
> > > above.
> > > > > > >
> > > > > > > We saw the issue when u stressed the xmit path with a lot of
> > > > > > > highly fragmented TCP frames (packets with up to 33 fragments
> > > > > > > with
> > > > non-headers
> > > > > > > fragments as small as 4 bytes) with all offload features 
> enabled.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > vlad
> > > > > > > >
> > > > > > > >> Signed-off-by: Vlad Zolotarov 
> <vladz at cloudius-systems.com <mailto:vladz at cloudius-systems.com>
> > > > <mailto:vladz at cloudius-systems.com 
> <mailto:vladz at cloudius-systems.com>>>
> > > > > > > >> ---
> > > > > > > >>  drivers/net/ixgbe/ixgbe_ethdev.c |  9 +++++++++
> > > > > > > >>  drivers/net/ixgbe/ixgbe_rxtx.c  | 23 
> ++++++++++++++++++++++-
> > > > > > > >>   2 files changed, 31 insertions(+), 1 deletion(-)
> > > > > > > >>
> > > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > > >> index b8ee1e9..6714fd9 100644
> > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct 
> rte_eth_dev
> > > > *dev,
> > > > > > > struct
> > > > > > > >> rte_eth_dev_info *dev_info)
> > > > > > > >>            .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
> > > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS,
> > > > > > > >>    };
> > > > > > > >> +
> > > > > > > >> +  /*
> > > > > > > >> +   * According to 82599 and x540 specifications RS bit
> > > > *must* be set on
> > > > > > > the
> > > > > > > >> +   * last descriptor of *every* packet. Therefore we will
> > > > not allow the
> > > > > > > >> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
> > > > > > > >> +   */
> > > > > > > >> +  if (hw->mac.type > ixgbe_mac_82598EB)
> > > > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1;
> > > > > > > >> +
> > > > > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX *
> > > > sizeof(uint32_t);
> > > > > > > >>    dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
> > > > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
> > > > > > > >> diff --
> > > > > > > git
> > > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > > b/drivers/net/ixgbe/ixgbe_rxtx.c index
> > > > > > > >> 91023b9..8dbdffc 100644
> > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct
> > > > rte_eth_dev
> > > > > > > >> *dev,
> > > > > > > >>    struct ixgbe_tx_queue *txq;
> > > > > > > >>    struct ixgbe_hw  *hw;
> > > > > > > >>    uint16_t tx_rs_thresh, tx_free_thresh;
> > > > > > > >> +  bool rs_deferring_allowed;
> > > > > > > >>
> > > > > > > >>    PMD_INIT_FUNC_TRACE();
> > > > > > > >>    hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > > > > > >>
> > > > > > > >>    /*
> > > > > > > >> +   * According to 82599 and x540 specifications RS bit
> > > > *must* be set on
> > > > > > > the
> > > > > > > >> +   * last descriptor of *every* packet. Therefore we will
> > > > not allow the
> > > > > > > >> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
> > > > > > > >> +   */
> > > > > > > >> +  rs_deferring_allowed = (hw->mac.type <=
> > > > > > > >> + ixgbe_mac_82598EB);
> > > > > > > >> +
> > > > > > > >> +  /*
> > > > > > > >>     * Validate number of transmit descriptors.
> > > > > > > >>     * It must not exceed hardware maximum, and must be 
> multiple
> > > > > > > >>     * of IXGBE_ALIGN.
> > > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct
> > > > > > > >> rte_eth_dev
> > > > > > > *dev,
> > > > > > > >>     * to transmit a packet is greater than the number 
> of free TX
> > > > > > > >>     * descriptors.
> > > > > > > >>     * The following constraints must be satisfied:
> > > > > > > >> +   *  tx_rs_thresh must be less than 2 for NICs for 
> which RS
> > > > deferring is
> > > > > > > >> +   *  forbidden (all but 82598).
> > > > > > > >>     *  tx_rs_thresh must be greater than 0.
> > > > > > > >>     *  tx_rs_thresh must be less than the size of the ring
> > > > minus 2.
> > > > > > > >>     *  tx_rs_thresh must be less than or equal to 
> tx_free_thresh.
> > > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct
> > > > rte_eth_dev
> > > > > > > *dev,
> > > > > > > >>     * When set to zero use default values.
> > > > > > > >>     */
> > > > > > > >>    tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ?
> > > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
> > > > > > > >> + tx_conf->tx_rs_thresh :
> > > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1));
> > > > > > > >>    tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
> > > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> > > > > > > >> +
> > > > > > > >> +  if (!rs_deferring_allowed && tx_rs_thresh > 1) {
> > > > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than
> > > > 2 since RS
> > > > > > > "
> > > > > > > >> +     "must be set for every packet
> > > > for this HW. "
> > > > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)",
> > > > > > > >> +  (unsigned int)tx_rs_thresh,
> > > > > > > >> + (int)dev->data->port_id, (int)queue_idx);
> > > > > > > >> +          return -(EINVAL);
> > > > > > > >> +  }
> > > > > > > >> +
> > > > > > > >>    if (tx_rs_thresh >= (nb_desc - 2)) {
> > > > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than
> > > > > > > >> the
> > > > > > > number "
> > > > > > > >>  "of TX descriptors minus 2.
> > > > (tx_rs_thresh=%u "
> > > > > > > >> --
> > > > > > > >> 2.1.0
> > > > > >
> > > >
> >
>



More information about the dev mailing list