[dpdk-dev] [PATCH] net/mlx5: fix link status initialization

Nélio Laranjeiro nelio.laranjeiro at 6wind.com
Thu Apr 5 08:51:20 CEST 2018


On Thu, Apr 05, 2018 at 05:35:57AM +0000, Shahaf Shuler wrote:
> Wednesday, April 4, 2018 3:11 PM, Nélio Laranjeiro:
> > Subject: Re: [PATCH] net/mlx5: fix link status initialization
> > 
> > On Wed, Apr 04, 2018 at 09:58:33AM +0000, Shahaf Shuler wrote:
> > > Wednesday, April 4, 2018 10:30 AM, Nélio Laranjeiro:
> > > > Subject: Re: [PATCH] net/mlx5: fix link status initialization
> > > >
> > > > On Tue, Apr 03, 2018 at 07:48:17AM +0300, Shahaf Shuler wrote:
> > > > > Following commit 7ba5320baa32 ("net/mlx5: fix link status
> > > > > behavior")
> > > > >
> > > > > The initial link status is no longer set as part of the port start.
> > > > > This may cause application to query the link as down while in fact
> > > > > it was already up before the DPDK application start.
> > > >
> > > > There is something wrong in this explanation, the application should
> > > > query the link using this same callback, why the PMD should call it?
> > >
> > > It is how ethdev is implemented. The application is doing nothing
> > > wrong, it queries the link status using rte_eth_link_get_nowait
> > >
> > > When the application works with LSC interrupts the ethdev layer skips
> > > the PMD callback and just update according to the link status exists
> > > on device data.
> > > It is because it assumes the link status on the device data is the
> > > correct one since any link change is processed by the application.
> > >
> > > The issue is with the initial state of the link. If the link is
> > > already up when the PMD starts there will be no callback for the
> > > application.
> > >
> > > I think this logic is OK, and it is also a good practice to initialize
> > > the link status to the actual state of the link as part of the port
> > > probing.
> > 
> > The commit log should be re-worded to include this explanation.
> 
> Will add.
> 
> > 
> > > > > Fixes: 7ba5320baa32 ("net/mlx5: fix link status behavior")
> > > > > Cc: nelio.laranjeiro at 6wind.com
> > > > >
> > > > > Signed-off-by: Shahaf Shuler <shahafs at mellanox.com>
> > > > > Acked-by: Yongseok Koh <yskoh at mellanox.com>
> > > > > ---
> > > > >  drivers/net/mlx5/mlx5.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> > > > > index
> > > > > 7d58d66bb9..f954ea2862 100644
> > > > > --- a/drivers/net/mlx5/mlx5.c
> > > > > +++ b/drivers/net/mlx5/mlx5.c
> > > > > @@ -961,6 +961,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv
> > > > __rte_unused,
> > > > >  		DRV_LOG(DEBUG, "port %u forcing Ethernet interface up",
> > > > >  			eth_dev->data->port_id);
> > > > >  		mlx5_set_link_up(eth_dev);
> > > > > +		mlx5_link_update(eth_dev, 1);
> > > > >  		/* Store device configuration on private structure. */
> > > > >  		priv->config = config;
> > > > >  		continue;
> > > > > --
> > > > > 2.12.0
> > 
> > According to your analysis, this is only necessary when the LCS is configured
> > in the device.  Why not adding this call to
> > mlx5_dev_interrupt_handler_install() which is responsible to install the LCS
> > callback.
> 
> I think it is good practice whether or not LSC is set. 
> The link status should be initialized to the correct value after the probe.

There is no guarantee the link will be accurate, at probe time the link
may be up so internal information has a status up with a speed with this
patch.
The application probes a second port, in the mean time the link of the
first port goes down, the interrupt is still not installed and the
internal status becomes wrong (still up whereas the port is down).

Finally at start, the device installs the handler, but the link is still
down whereas internally it is up, the application will call
rte_eth_link_get_nowait() which will directly copy the wrong internal
status to the application.

There is also another situation, when the application stops the port,
the interrupt is also removed, which means during this time, the
internal status may be wrong as it won't be updated anymore.
This comes to the same possible situation as above.

> > Another point, the wait to complete flag is useless, if the link is up, the status
> > and speed will be accurate, if not, it will receive an LSC event later.
> 
> Agree.
> 
> So how about keeping the code on the current place, just removing the
> wait_to_complete? 

The current place is not fixing the issue as there is still a
possibility to have a wrong value.

Regards,

-- 
Nélio Laranjeiro
6WIND


More information about the dev mailing list