[dpdk-dev] [PATCH] net/ixgbe: fix busy polling while fiber link update

Ilya Maximets i.maximets at samsung.com
Thu Oct 11 12:26:58 CEST 2018


On 11.10.2018 12:56, Zhao1, Wei wrote:
> Hi,  Ilya Maximets AND laurent.hardy

Hi, thanks for sharing your thoughts.
Comments inline.

> 
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ilya Maximets
>> Sent: Wednesday, September 12, 2018 4:05 PM
>> To: Zhang, Qi Z <qi.z.zhang at intel.com>; dev at dpdk.org
>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>> <konstantin.ananyev at intel.com>; Laurent Hardy
>> <laurent.hardy at 6wind.com>; Dai, Wei <wei.dai at intel.com>;
>> stable at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while fiber link
>> update
>>
>> On 12.09.2018 09:49, Zhang, Qi Z wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>>>> Sent: Monday, September 10, 2018 11:09 PM
>>>> To: Zhang, Qi Z <qi.z.zhang at intel.com>; dev at dpdk.org
>>>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>>>> <konstantin.ananyev at intel.com>; Laurent Hardy
>>>> <laurent.hardy at 6wind.com>; Dai, Wei <wei.dai at intel.com>;
>>>> stable at dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
>>>> fiber link update
>>>>
>>>> On 04.09.2018 09:08, Zhang, Qi Z wrote:
>>>>> Hi Ilya:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ilya Maximets
>>>>>> Sent: Friday, August 31, 2018 8:40 PM
>>>>>> To: dev at dpdk.org
>>>>>> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Ananyev, Konstantin
>>>>>> <konstantin.ananyev at intel.com>; Laurent Hardy
>>>>>> <laurent.hardy at 6wind.com>; Dai, Wei <wei.dai at intel.com>; Ilya
>>>>>> Maximets <i.maximets at samsung.com>; stable at dpdk.org
>>>>>> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while fiber
>>>>>> link update
>>>>>>
>>>>>> If the multispeed fiber link is in DOWN state, ixgbe_setup_link
>>>>>> could take around a second of busy polling. This is highly
>>>>>> inconvenient for the case where single thread periodically checks the
>> link statuses.
>>>>>> For example, OVS main thread periodically updates the link statuses
>>>>>> and hangs for a really long time busy waiting on ixgbe_setup_link()
>>>>>> for a DOWN fiber ports. For case with 3 down ports it hangs for a 3
>>>>>> seconds and unable to do anything including packet processing.
>>>>>> Fix that by shifting that workaround to a separate thread by alarm
>>>>>> handler that will try to set up link if it is DOWN.
>>>>>
>>>>> Does that mean we will block the interrupt thread for 3 seconds?
>>>>
>>>> Three times for one second. Other work could be scheduled between.
>>>> IMHO, it's much better than blocking usual caller for 3 seconds.
>>>>
>>>>> Also, can we guarantee there will not be any race condition if we
>>>>> call
>>>> ixgbe_setup_link at another thread, the base code API is not assumed
>>>> to be thread-safe as I know.
>>>>
>>>> The only user of 'ixgbe_setup_link' is 'ixgbe_dev_start', but it
>>>> could be called only if device stopped. 'ixgbe_dev_stop' cancels the alarm.
>>>> Race with 'link_update' avoided by 'IXGBE_FLAG_NEED_LINK_CONFIG'
>> flag.
>>>
>>> I guess, it' not only about when ixgb_setup_link race with itself, but also
>> when it race with other APIs.
>>> Also the concern is, even in current version, we can prove there is no issue,
>> how can we guarantee we are safe for future base code update? It's not
>> designed as thread-safe.
>>> For my option, the change is risky.
>>
>> In current implementation interrupt handler already calls the
>> 'ixgbe_dev_link_update' which subsequently calls 'ixgbe_setup_link'
>> in our case if LSC interrupts enabled. So, my change makes the driver even
>> safer by moving 'ixgbe_setup_link' to the same interrupt thread.
>> Otherwise two threads (interrupts handler and the link status checking
>> thread) could call 'ixgbe_setup_link' simultaneously.
>>
>>>
>>> Btw, since ixgbe support LSC, it is not necessary for "single thread
>> periodically checks the link statuses", right?
>>
>> In current implementation it will take at least 5 seconds (4 + 1) for the
>> interrupt handler to detect DOWN link state for ixgbe multispeed fiber. This
>> is too much for many real world cases.
> 
> I have reviewed  this patch, now I agree with you of the point that when port is down, " main thread
> periodically updates the link statuses and hangs for a really long time busy waiting on ixgbe_setup_link() for a DOWN fiber ports ".
> This is introduced by a patch in the following:
> SHA-1: c12d22f65b132c56db7b4fdbfd5ddce27d1e9572
> * net/ixgbe: ensure link status is updated
> 
> Because in this patch, ixgbe_setup_link() is called with input parameter autoneg_wait_to_complete=1, this will cause loop check and sleep delay.
> At least 82599 seems has this delay.(BTW, whivh type of NIC are you use? X550 or 82599)

I have 82599.

> Your solution is add a eal_alarm_set for ixgbe_setup_link in the thread of PMD driver, and do the set up work in that thread, is that right?
> And main thread avoid hang by the flag of IXGBE_FLAG_NEED_LINK_CONFIG.
> I think this is a good idea for this problem, but it may cause problem for other legacy user of ixgbe pmd, because their legacy code, 
> which use main thread  to check link state and setup_link when port is down, and they are not aware of it is done by other thread if add your patch.

What are these applications? I see no public API for setup_link function.
It's internal to driver and should not be used externally.
Am I missing something?

> 
> And is that ok if we change code in ixgbe_dev_link_update_share() to
> 
> ixgbe_dev_link_update_share()
> {
> 
> 	/* check if it needs to wait to complete, if lsc interrupt is enabled */
> 	if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0)
> 		wait = 0;
> 
> 	if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) &&
> 		ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
> 		speed = hw->phy.autoneg_advertised;
> 		if (!speed)
> 			ixgbe_get_link_capabilities(hw, &speed, &autoneg);
> 		ixgbe_setup_link(hw, speed, wait);
> 	}
> }
> 
> Then, your application can call rte_eth_link_get_nowait () to make wait_to_complete=0 when doing periodic link state check,
> Which will not cause  loop check and sleep delay. Legacy code of other user call rte_eth_link_get()  will not be affected also.
> But, I am NOT confident ,whether this will introduce new problem when set up link without wait!
> So, this is just a discussion topic.

Unfortunately this will not help. Take a look to the function
'ixgbe_setup_mac_link_multispeed_fiber()', which is the main problematic
function here. 'wait_to_complete' here used only as argument for
ixgbe_setup_mac_link(), and the busy waiting loops are outside of it.
Regardless of the 'wait_to_complete' value, this function will busy
poll the link for 1040 ms trying to setup 10GB speed and 140ms more
trying to setup 1GB speed. After that, it will call itself recursively
and wait again... Looks like I miscalculated last time. Right now it'll
be more than 2 seconds for each down port since following patch merged:
8fc1f32fa615 ("net/ixgbe: wait longer for link after fiber MAC setup").

> 
> Hi, laurent.hardy 
>  You are the author for the patch (* net/ixgbe: ensure link status is updated), why do you implement code that way?
> Is that must that  set up link with wait? 
> 
> ixgbe_setup_link(hw, speed, true);
> 
> 
> 
>>
>>>
>>>>
>>>>>
>>>>> Regards
>>>>> Qi
>>>>>
>>>>>>
>>>>>> Fixes: c12d22f65b13 ("net/ixgbe: ensure link status is updated")
>>>>>> CC: stable at dpdk.org
>>>>>>
>>>>>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>>>>>> ---
>>>>>>  drivers/net/ixgbe/ixgbe_ethdev.c | 43
>>>>>> ++++++++++++++++++++++++--------
>>>>>>  1 file changed, 32 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> index 26b192737..a33b9a6e8 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> @@ -221,6 +221,8 @@ static int ixgbe_dev_interrupt_action(struct
>>>>>> rte_eth_dev *dev,
>>>>>>  				      struct rte_intr_handle *handle);  static
>> void
>>>>>> ixgbe_dev_interrupt_handler(void *param);  static void
>>>>>> ixgbe_dev_interrupt_delayed_handler(void *param);
>>>>>> +static void ixgbe_dev_setup_link_alarm_handler(void *param);
>>>>>> +
>>>>>>  static int ixgbe_add_rar(struct rte_eth_dev *dev, struct
>>>>>> ether_addr *mac_addr,
>>>>>>  			 uint32_t index, uint32_t pool);  static void
>>>>>> ixgbe_remove_rar(struct rte_eth_dev *dev, uint32_t index); @@
>>>>>> -2791,6 +2793,8 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>>>>>>
>>>>>>  	PMD_INIT_FUNC_TRACE();
>>>>>>
>>>>>> +	rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
>>>>>> +
>>>>>>  	/* disable interrupts */
>>>>>>  	ixgbe_disable_intr(hw);
>>>>>>
>>>>>> @@ -3969,6 +3973,25 @@ ixgbevf_check_link(struct ixgbe_hw *hw,
>>>>>> ixgbe_link_speed *speed,
>>>>>>  	return ret_val;
>>>>>>  }
>>>>>>
>>>>>> +static void
>>>>>> +ixgbe_dev_setup_link_alarm_handler(void *param) {
>>>>>> +	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>>>>>> +	struct ixgbe_hw *hw =
>>>>>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>>> +	struct ixgbe_interrupt *intr =
>>>>>> +		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
>>>>>> +	u32 speed;
>>>>>> +	bool autoneg = false;
>>>>>> +
>>>>>> +	speed = hw->phy.autoneg_advertised;
>>>>>> +	if (!speed)
>>>>>> +		ixgbe_get_link_capabilities(hw, &speed, &autoneg);
>>>>>> +
>>>>>> +	ixgbe_setup_link(hw, speed, true);
>>>>>> +
>>>>>> +	intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG; }
>>>>>> +
>>>>>>  /* return 0 means link status changed, -1 means not changed */
>>>>>> int ixgbe_dev_link_update_share(struct rte_eth_dev *dev, @@ -
>> 3981,9
>>>>>> +4004,7 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
>>>>>>  		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
>>>>>>  	int link_up;
>>>>>>  	int diag;
>>>>>> -	u32 speed = 0;
>>>>>>  	int wait = 1;
>>>>>> -	bool autoneg = false;
>>>>>>
>>>>>>  	memset(&link, 0, sizeof(link));
>>>>>>  	link.link_status = ETH_LINK_DOWN; @@ -3993,13 +4014,8 @@
>>>>>> ixgbe_dev_link_update_share(struct
>>>> rte_eth_dev
>>>>>> *dev,
>>>>>>
>>>>>>  	hw->mac.get_link_status = true;
>>>>>>
>>>>>> -	if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) &&
>>>>>> -		ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
>>>>>> -		speed = hw->phy.autoneg_advertised;
>>>>>> -		if (!speed)
>>>>>> -			ixgbe_get_link_capabilities(hw, &speed, &autoneg);
>>>>>> -		ixgbe_setup_link(hw, speed, true);
>>>>>> -	}
>>>>>> +	if (intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG)
>>>>>> +		return rte_eth_linkstatus_set(dev, &link);
>>>>>>
>>>>>>  	/* check if it needs to wait to complete, if lsc interrupt is enabled */
>>>>>>  	if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc !=
>>>>>> 0) @@
>>>>>> -4017,11 +4033,14 @@ ixgbe_dev_link_update_share(struct
>> rte_eth_dev
>>>> *dev,
>>>>>>  	}
>>>>>>
>>>>>>  	if (link_up == 0) {
>>>>>> -		intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
>>>>>> +		if (ixgbe_get_media_type(hw) == ixgbe_media_type_fiber)
>> {
>>>>>> +			intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
>>>>>> +			rte_eal_alarm_set(10,
>>>>>> +				ixgbe_dev_setup_link_alarm_handler, dev);
>>>>>> +		}
>>>>>>  		return rte_eth_linkstatus_set(dev, &link);
>>>>>>  	}
>>>>>>
>>>>>> -	intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG;
>>>>>>  	link.link_status = ETH_LINK_UP;
>>>>>>  	link.link_duplex = ETH_LINK_FULL_DUPLEX;
>>>>>>
>>>>>> @@ -5128,6 +5147,8 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
>>>>>>
>>>>>>  	PMD_INIT_FUNC_TRACE();
>>>>>>
>>>>>> +	rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler, dev);
>>>>>> +
>>>>>>  	ixgbevf_intr_disable(dev);
>>>>>>
>>>>>>  	hw->adapter_stopped = 1;
>>>>>> --
>>>>>> 2.17.1
>>>>>


More information about the dev mailing list