[dpdk-dev] [PATCH] net/mlx5: fix link status initialization

Shahaf Shuler shahafs at mellanox.com
Mon Apr 9 14:28:04 CEST 2018


Monday, April 9, 2018 11:28 AM, Nélio Laranjeiro:
> Subject: Re: [PATCH] net/mlx5: fix link status initialization
> 
> On Sun, Apr 08, 2018 at 01:09:27PM +0000, Shahaf Shuler wrote:
> > Thursday, April 5, 2018 9:51 AM, Nélio Laranjeiro:
> > > Subject: Re: [PATCH] net/mlx5: fix link status initialization
> > >
> > > On Thu, Apr 05, 2018 at 05:35:57AM +0000, Shahaf Shuler wrote:
> > > > Wednesday, April 4, 2018 3:11 PM, Nélio Laranjeiro:
> > > > > Subject: Re: [PATCH] net/mlx5: fix link status initialization
> > > > >
> > > > > On Wed, Apr 04, 2018 at 09:58:33AM +0000, Shahaf Shuler wrote:
> > > > > > Wednesday, April 4, 2018 10:30 AM, Nélio Laranjeiro:
> > > > > > > Subject: Re: [PATCH] net/mlx5: fix link status
> > > > > > > initialization
> > > > > > >
> > > > > > > On Tue, Apr 03, 2018 at 07:48:17AM +0300, Shahaf Shuler wrote:
> >
> > [..]
> >
> > > > >
> > > > > According to your analysis, this is only necessary when the LCS
> > > > > is configured in the device.  Why not adding this call to
> > > > > mlx5_dev_interrupt_handler_install() which is responsible to
> > > > > install the LCS callback.
> > > >
> > > > I think it is good practice whether or not LSC is set.
> > > > The link status should be initialized to the correct value after the probe.
> > >
> > > There is no guarantee the link will be accurate, at probe time the
> > > link may be up so internal information has a status up with a speed with
> this patch.
> > > The application probes a second port, in the mean time the link of
> > > the first port goes down, the interrupt is still not installed and
> > > the internal status becomes wrong (still up whereas the port is down).
> > >
> > > Finally at start, the device installs the handler, but the link is
> > > still down whereas internally it is up, the application will call
> > > rte_eth_link_get_nowait() which will directly copy the wrong
> > > internal status to the application.
> >
> > This is not correct.
> > Using Verbs, the async_fd on which link status interrupts are reported is
> created on probe.
> > Even if the interrupt handler is not installed, interrupts still
> > trigger on this fd. They will be processed when the interrupt handler
> > will be installed as part of the port start.
> > So in fact you have the whole trace on the link status changes waiting
> > to be processed upon port start.
> 
> Right, but in such case, this patch still does not solves the issue.
> Until the dev_start() is called, the link may be inconsistent with the real
> status.
> 
> example:
> 
>  pci_probe --> link is up.
>  leaving pci probe, the link goes downs --> internally the PMD has a link up.
> 
> Until the dev_start() is called any call to rte_eth_link_get_nowait() will copy
> the internal PMD status which is not accurate.
> 
> From this point, the issue seems to fully come from
> rte_eth_link_get_nowait() which should not make any copy until the port is
> not started, until then the link may be inconsistent between the driver and
> the device.

Right. However fixing it only on the ethdev layer is not enough.
Assume the link was up from the beginning. 
For the case were the call for rte_eth_link_get_nowait happens only after the port start, the link will still be wrong as it will be reported as down (and no interrupt to fix it). 

So to summarize I plan to have this patch (with modification of the wait_to_complete) along with another fix for ethdev. 
Do we have an agreement? 


> 
> Regards,
> 
> --
> Nélio Laranjeiro
> 6WIND


More information about the dev mailing list