[dpdk-dev] [PATCH] net/ixgbe: fix busy polling while fiber link update

Ilya Maximets i.maximets at samsung.com
Mon Oct 15 10:40:18 CEST 2018


On 15.10.2018 06:03, Zhao1, Wei wrote:
> Hi, Ilya Maximets
> 
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>> Sent: Friday, October 12, 2018 6:15 PM
>> To: Zhao1, Wei <wei.zhao1 at intel.com>; Zhang, Qi Z <qi.z.zhang at intel.com>;
>> Laurent Hardy <laurent.hardy at 6wind.com>
>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>> <konstantin.ananyev at intel.com>; stable at dpdk.org; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while fiber link
>> update
>>
>> On 12.10.2018 12:19, Zhao1, Wei wrote:
>>> Hi,
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>>>> Sent: Thursday, October 11, 2018 6:27 PM
>>>> To: Zhao1, Wei <wei.zhao1 at intel.com>; Zhang, Qi Z
>>>> <qi.z.zhang at intel.com>; Laurent Hardy <laurent.hardy at 6wind.com>
>>>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>>>> <konstantin.ananyev at intel.com>; stable at dpdk.org; dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
>>>> fiber link update
>>>>
>>>> On 11.10.2018 12:56, Zhao1, Wei wrote:
>>>>> Hi,  Ilya Maximets AND laurent.hardy
>>>>
>>>> Hi, thanks for sharing your thoughts.
>>>> Comments inline.
>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ilya Maximets
>>>>>> Sent: Wednesday, September 12, 2018 4:05 PM
>>>>>> To: Zhang, Qi Z <qi.z.zhang at intel.com>; dev at dpdk.org
>>>>>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>>>>>> <konstantin.ananyev at intel.com>; Laurent Hardy
>>>>>> <laurent.hardy at 6wind.com>; Dai, Wei <wei.dai at intel.com>;
>>>>>> stable at dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
>>>>>> fiber link update
>>>>>>
>>>>>> On 12.09.2018 09:49, Zhang, Qi Z wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>>>>>>>> Sent: Monday, September 10, 2018 11:09 PM
>>>>>>>> To: Zhang, Qi Z <qi.z.zhang at intel.com>; dev at dpdk.org
>>>>>>>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>>>>>>>> <konstantin.ananyev at intel.com>; Laurent Hardy
>>>>>>>> <laurent.hardy at 6wind.com>; Dai, Wei <wei.dai at intel.com>;
>>>>>>>> stable at dpdk.org
>>>>>>>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
>>>>>>>> fiber link update
>>>>>>>>
>>>>>>>> On 04.09.2018 09:08, Zhang, Qi Z wrote:
>>>>>>>>> Hi Ilya:
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ilya
>>>>>>>>>> Maximets
>>>>>>>>>> Sent: Friday, August 31, 2018 8:40 PM
>>>>>>>>>> To: dev at dpdk.org
>>>>>>>>>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>>>>>>>>>> <konstantin.ananyev at intel.com>; Laurent Hardy
>>>>>>>>>> <laurent.hardy at 6wind.com>; Dai, Wei <wei.dai at intel.com>; Ilya
>>>>>>>>>> Maximets <i.maximets at samsung.com>; stable at dpdk.org
>>>>>>>>>> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
>>>>>>>>>> fiber link update
>>>>>>>>>>
>>>>>>>>>> If the multispeed fiber link is in DOWN state, ixgbe_setup_link
>>>>>>>>>> could take around a second of busy polling. This is highly
>>>>>>>>>> inconvenient for the case where a single thread periodically
>>>>>>>>>> checks the link statuses.
>>>>>>>>>> For example, the OVS main thread periodically updates the link
>>>>>>>>>> statuses and hangs for a really long time busy waiting on
>>>>>>>>>> ixgbe_setup_link() for DOWN fiber ports. With 3 down ports
>>>>>>>>>> it hangs for 3 seconds and is unable to do anything,
>>>>>>>>>> including packet processing.
>>>>>>>>>> Fix that by moving the workaround to a separate thread via an
>>>>>>>>>> alarm handler that will try to set up the link if it is DOWN.
>>>>>>>>>
>>>>>>>>> Does that mean we will block the interrupt thread for 3 seconds?
>>>>>>>>
>>>>>>>> Three times for one second. Other work could be scheduled in
>>>>>>>> between.
>>>>>>>> IMHO, it's much better than blocking the usual caller for 3 seconds.
>>>>>>>>
>>>>>>>>> Also, can we guarantee there will not be any race condition if
>>>>>>>>> we call ixgbe_setup_link from another thread? The base code API
>>>>>>>>> is not assumed to be thread-safe, as far as I know.
>>>>>>>>
>>>>>>>> The only user of 'ixgbe_setup_link' is 'ixgbe_dev_start', but it
>>>>>>>> could be called only if the device is stopped. 'ixgbe_dev_stop'
>>>>>>>> cancels the alarm.
>>>>>>>> The race with 'link_update' is avoided by the
>>>>>>>> 'IXGBE_FLAG_NEED_LINK_CONFIG' flag.
>>>>>>>
>>>>>>> I guess it's not only about when ixgbe_setup_link races with
>>>>>>> itself, but also when it races with other APIs.
>>>>>>> Also, the concern is: even if we can prove there is no issue in
>>>>>>> the current version, how can we guarantee we are safe for future
>>>>>>> base code updates? It's not designed to be thread-safe.
>>>>>>> In my opinion, the change is risky.
>>>>>>
>>>>>> In the current implementation the interrupt handler already calls
>>>>>> 'ixgbe_dev_link_update', which subsequently calls 'ixgbe_setup_link'
>>>>>> in our case if LSC interrupts are enabled. So, my change makes the
>>>>>> driver even safer by moving 'ixgbe_setup_link' to the same interrupt
>>>>>> thread.
>>>>>> Otherwise two threads (the interrupt handler and the link status
>>>>>> checking thread) could call 'ixgbe_setup_link' simultaneously.
>>>>>>
>>>>>>>
>>>>>>> Btw, since ixgbe supports LSC, it is not necessary for "single
>>>>>>> thread periodically checks the link statuses", right?
>>>>>>
>>>>>> In current implementation it will take at least 5 seconds (4 + 1)
>>>>>> for the interrupt handler to detect DOWN link state for ixgbe
>>>>>> multispeed fiber. This is too much for many real world cases.
>>>>>
>>>>> I have reviewed this patch, and now I agree with your point that
>>>>> when a port is down, "main thread periodically updates the link
>>>>> statuses and hangs for a really long time busy waiting on
>>>>> ixgbe_setup_link() for a DOWN fiber ports".
>>>>> This was introduced by the following patch:
>>>>> SHA-1: c12d22f65b132c56db7b4fdbfd5ddce27d1e9572
>>>>> * net/ixgbe: ensure link status is updated
>>>>>
>>>>> Because in this patch ixgbe_setup_link() is called with the input
>>>>> parameter autoneg_wait_to_complete=1, this will cause loop check and
>>>>> sleep delay.
>>>>> At least 82599 seems to have this delay. (BTW, which type of NIC do
>>>>> you use? X550 or 82599)
>>>>
>>>> I have 82599.
>>>>
>>>>> Your solution is to add an eal_alarm_set for ixgbe_setup_link in the
>>>>> thread of the PMD driver, and do the setup work in that thread, is
>>>>> that right?
>>>>> And the main thread avoids hanging thanks to the
>>>>> IXGBE_FLAG_NEED_LINK_CONFIG flag.
>>>>> I think this is a good idea for this problem, but it may cause
>>>>> problems for other legacy users of the ixgbe PMD, because their
>>>>> legacy code uses the main thread to check link state and setup_link
>>>>> when the port is down, and they are not aware that it is done by
>>>>> another thread if your patch is added.
>>>>
>>>> What are these applications? I see no public API for setup_link function.
>>>> It's internal to driver and should not be used externally.
>>>> Am I missing something?
>>>
>>> rte_eth_link_get(); it will also call the function ixgbe_setup_link().
>>>
>>
>> rte_eth_link_get() does not call ixgbe_setup_link().
>> It only calls dev_ops->link_update().
> 
> No, dev_ops->link_update calls ixgbe_dev_link_update()
> -> ixgbe_dev_link_update_share() -> ixgbe_setup_link()

But with my patch, ixgbe_setup_link() is called from a separate
(interrupt) thread. There is no direct call from ixgbe_dev_link_update_share().
All calls made from the interrupt thread are serialized by the implementation.
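
To make that flow concrete, here is a minimal, self-contained C sketch of
the pattern. It is a simulation under stated assumptions, not driver code:
'struct port', 'port_link_update()', 'slow_setup_link()' and
'interrupt_thread_run_once()' are hypothetical stand-ins for the ixgbe
private data, ixgbe_dev_link_update_share(), the alarm handler and the
interrupt thread, and a one-slot callback pointer stands in for the EAL
alarm queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IXGBE_FLAG_NEED_LINK_CONFIG. */
#define NEED_LINK_CONFIG 0x1u

typedef void (*deferred_cb)(void *arg);

/* Hypothetical, simplified port state; not the real ixgbe structures. */
struct port {
	unsigned int flags;   /* NEED_LINK_CONFIG while a setup is pending */
	bool link_up;         /* last link state seen by the caller */
	bool cable_present;   /* simulated hardware link state */
	int slow_setup_calls; /* how many times the slow path has run */
	deferred_cb pending;  /* one-slot stand-in for the alarm queue */
};

/* Stand-in for the alarm handler: runs the slow setup, clears the flag. */
static void slow_setup_link(void *arg)
{
	struct port *p = arg;

	assert(p != NULL);
	p->slow_setup_calls++;          /* the busy polling would be here */
	p->flags &= ~NEED_LINK_CONFIG;
}

/*
 * Stand-in for the link update callback. It never runs the slow path
 * itself: while a setup is pending it reports DOWN at once, and on a
 * newly detected DOWN link it only arms the deferred callback.
 */
static bool port_link_update(struct port *p)
{
	if (p->flags & NEED_LINK_CONFIG)
		return false;

	p->link_up = p->cable_present;
	if (!p->link_up) {
		p->flags |= NEED_LINK_CONFIG;
		p->pending = slow_setup_link;
	}
	return p->link_up;
}

/* Stand-in for the interrupt thread draining its (one-slot) alarm queue. */
static void interrupt_thread_run_once(struct port *p)
{
	deferred_cb cb = p->pending;

	if (cb != NULL) {
		p->pending = NULL;
		cb(p);
	}
}
```

In this sketch the periodic caller only ever pays for a flag check or a
link read; the slow path runs solely from interrupt_thread_run_once(),
so all slow-path invocations come from one place.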

> 
> 
>>
>>>
>>>>
>>>>>
>>>>> And is that ok if we change code in ixgbe_dev_link_update_share() to
>>>>>
>>>>> ixgbe_dev_link_update_share()
>>>>> {
>>>>>
>>>>> 	/* check if it needs to wait to complete, if lsc interrupt is enabled */
>>>>> 	if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0)
>>>>> 		wait = 0;
>>>>>
>>>>> 	if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) &&
>>>>> 		ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
>>>>> 		speed = hw->phy.autoneg_advertised;
>>>>> 		if (!speed)
>>>>> 			ixgbe_get_link_capabilities(hw, &speed, &autoneg);
>>>>> 		ixgbe_setup_link(hw, speed, wait);
>>>>> 	}
>>>>> }
>>>>>
>>>>> Then, your application can call rte_eth_link_get_nowait() to make
>>>>> wait_to_complete=0 when doing the periodic link state check, which
>>>>> will not cause loop check and sleep delay. Legacy code of other
>>>>> users calling rte_eth_link_get() will not be affected either.
>>>>> But I am NOT confident whether this will introduce a new problem
>>>>> when setting up the link without waiting!
>>>>> So, this is just a discussion topic.
>>>>
>>>> Unfortunately this will not help. Take a look at the function
>>>> 'ixgbe_setup_mac_link_multispeed_fiber()', which is the main
>>>> problematic function here. 'wait_to_complete' is used there only as
>>>> an argument for ixgbe_setup_mac_link(), and the busy waiting loops
>>>> are outside of it.
>>>> Regardless of the 'wait_to_complete' value, this function will busy
>>>> poll the link for 1040 ms trying to set up 10GB speed and 140ms more
>>>> trying to set up 1GB speed. After that, it will call itself
>>>> recursively and wait again... Looks like I miscalculated last time.
>>>> Right now it'll be more than 2 seconds for each down port since the
>>>> following patch was merged:
>>>> 8fc1f32fa615 ("net/ixgbe: wait longer for link after fiber MAC setup").
>>>
>>> Yes, you are right, the link state check loops in function
>>> ixgbe_setup_mac_link_multispeed_fiber() are not controlled by bool
>>> autoneg_wait_to_complete. This causes about a 1s wait when the port
>>> is down.
>>> Also, can we update the code in function
>>> ixgbe_setup_mac_link_multispeed_fiber() to gate the link state check
>>> loop with autoneg_wait_to_complete?
>>> I am not sure, because there is a comment for this loop check:
>>> 		/*
>>> 		 * Wait for the controller to acquire link.  Per IEEE 802.3ap,
>>> 		 * Section 73.10.2, we may have to wait up to 500ms if KR is
>>> 		 * attempted.  82599 uses the same timing for 10g SFI.
>>> 		 */
>>> It seems we have to wait at least 500ms, per the spec requirement,
>>> before we check the link after configuration.
>>> If that is true, we cannot make any change to these loop checks.
>>> But why doesn't the main thread take some action to stop the periodic
>>> link state check when it finds that it has hung or the link is down?
>>
>> To find that device is DOWN, thread will have to call this function at least
>> once for each port and wait a few seconds.
>> And how in that case we'll know that device is UP again?
>> As I already wrote in discussion for this patch, LSC is not an option, because it
>> takes at least 5 seconds to detect link state change, which is way too much
>> for many real world apps.
>>
>>>
>>>
>>>
>>>>
>>>>>
>>>>> Hi, laurent.hardy
>>>>> You are the author of the patch (* net/ixgbe: ensure link status
>>>>> is updated); why did you implement the code that way?
>>>>> Is it necessary to set up the link with wait?
>>>>>
>>>>> ixgbe_setup_link(hw, speed, true);
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Qi
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Fixes: c12d22f65b13 ("net/ixgbe: ensure link status is updated")
>>>>>>>>>> CC: stable at dpdk.org
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/net/ixgbe/ixgbe_ethdev.c | 43 ++++++++++++++++++++++++--------
>>>>>>>>>>  1 file changed, 32 insertions(+), 11 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>>>>>> index 26b192737..a33b9a6e8 100644
>>>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>>>>>> @@ -221,6 +221,8 @@ static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev,
>>>>>>>>>>  				      struct rte_intr_handle *handle);
>>>>>>>>>>  static void ixgbe_dev_interrupt_handler(void *param);
>>>>>>>>>>  static void ixgbe_dev_interrupt_delayed_handler(void *param);
>>>>>>>>>> +static void ixgbe_dev_setup_link_alarm_handler(void *param);
>>>>>>>>>> +
>>>>>>>>>>  static int ixgbe_add_rar(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
>>>>>>>>>>  			 uint32_t index, uint32_t pool);
>>>>>>>>>>  static void ixgbe_remove_rar(struct rte_eth_dev *dev, uint32_t index);
>>>>>>>>>> @@ -2791,6 +2793,8 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>>>>>>>>>>
>>>>>>>>>>  	PMD_INIT_FUNC_TRACE();
>>>>>>>>>>
>>>>>>>>>> +	rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
>>>>>>>>>> +
>>>>>>>>>>  	/* disable interrupts */
>>>>>>>>>>  	ixgbe_disable_intr(hw);
>>>>>>>>>>
>>>>>>>>>> @@ -3969,6 +3973,25 @@ ixgbevf_check_link(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
>>>>>>>>>>  	return ret_val;
>>>>>>>>>>  }
>>>>>>>>>>
>>>>>>>>>> +static void
>>>>>>>>>> +ixgbe_dev_setup_link_alarm_handler(void *param)
>>>>>>>>>> +{
>>>>>>>>>> +	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>>>>>>>>>> +	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>>>>>>> +	struct ixgbe_interrupt *intr =
>>>>>>>>>> +		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
>>>>>>>>>> +	u32 speed;
>>>>>>>>>> +	bool autoneg = false;
>>>>>>>>>> +
>>>>>>>>>> +	speed = hw->phy.autoneg_advertised;
>>>>>>>>>> +	if (!speed)
>>>>>>>>>> +		ixgbe_get_link_capabilities(hw, &speed, &autoneg);
>>>>>>>>>> +
>>>>>>>>>> +	ixgbe_setup_link(hw, speed, true);
>>>>>>>>>> +
>>>>>>>>>> +	intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>  /* return 0 means link status changed, -1 means not changed */
>>>>>>>>>>  int
>>>>>>>>>>  ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
>>>>>>>>>> @@ -3981,9 +4004,7 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
>>>>>>>>>>  		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
>>>>>>>>>>  	int link_up;
>>>>>>>>>>  	int diag;
>>>>>>>>>> -	u32 speed = 0;
>>>>>>>>>>  	int wait = 1;
>>>>>>>>>> -	bool autoneg = false;
>>>>>>>>>>
>>>>>>>>>>  	memset(&link, 0, sizeof(link));
>>>>>>>>>>  	link.link_status = ETH_LINK_DOWN;
>>>>>>>>>> @@ -3993,13 +4014,8 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
>>>>>>>>>>
>>>>>>>>>>  	hw->mac.get_link_status = true;
>>>>>>>>>>
>>>>>>>>>> -	if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) &&
>>>>>>>>>> -		ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
>>>>>>>>>> -		speed = hw->phy.autoneg_advertised;
>>>>>>>>>> -		if (!speed)
>>>>>>>>>> -			ixgbe_get_link_capabilities(hw, &speed, &autoneg);
>>>>>>>>>> -		ixgbe_setup_link(hw, speed, true);
>>>>>>>>>> -	}
>>>>>>>>>> +	if (intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG)
>>>>>>>>>> +		return rte_eth_linkstatus_set(dev, &link);
>>>>>>>>>>
>>>>>>>>>>  	/* check if it needs to wait to complete, if lsc interrupt is enabled */
>>>>>>>>>>  	if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0)
>>>>>>>>>> @@ -4017,11 +4033,14 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
>>>>>>>>>>  	}
>>>>>>>>>>
>>>>>>>>>>  	if (link_up == 0) {
>>>>>>>>>> -		intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
>>>>>>>>>> +		if (ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
>>>>>>>>>> +			intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
>>>>>>>>>> +			rte_eal_alarm_set(10,
>>>>>>>>>> +				ixgbe_dev_setup_link_alarm_handler, dev);
>>>>>>>>>> +		}
>>>>>>>>>>  		return rte_eth_linkstatus_set(dev, &link);
>>>>>>>>>>  	}
>>>>>>>>>>
>>>>>>>>>> -	intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG;
>>>>>>>>>>  	link.link_status = ETH_LINK_UP;
>>>>>>>>>>  	link.link_duplex = ETH_LINK_FULL_DUPLEX;
>>>>>>>>>>
>>>>>>>>>> @@ -5128,6 +5147,8 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
>>>>>>>>>>
>>>>>>>>>>  	PMD_INIT_FUNC_TRACE();
>>>>>>>>>>
>>>>>>>>>> +	rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
>>>>>>>>>> +
>>>>>>>>>>  	ixgbevf_intr_disable(dev);
>>>>>>>>>>
>>>>>>>>>>  	hw->adapter_stopped = 1;
>>>>>>>>>> --
>>>>>>>>>> 2.17.1
>>>>>>>>>