[dpdk-dev] [dpdk-stable] [PATCH 2/2] app/testpmd: fix invalid port detaching

Matan Azrad matan at mellanox.com
Thu Jan 23 16:29:20 CET 2020


Hi

From: Ferruh Yigit
> On 1/23/2020 2:05 PM, Matan Azrad wrote:
> > Hi
> >
> > From: Yigit, Ferruh
> >> On 11/12/2019 8:47 AM, Matan Azrad wrote:
> >>> The port was not validated before detaching.
> >>>
> >>> Ignore port detach operation when the port is not valid.
> >>>
> >>> Fixes: f8e5baa2662d ("app/testpmd: check not detaching device
> >>> twice")
> >>> Cc: thomas at monjalon.net
> >>> Cc: stable at dpdk.org
> >>>
> >>> Signed-off-by: Matan Azrad <matan at mellanox.com>
> >>> ---
> >>>  app/test-pmd/testpmd.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >>> 4444346..370eefe 100644
> >>> --- a/app/test-pmd/testpmd.c
> >>> +++ b/app/test-pmd/testpmd.c
> >>> @@ -2545,6 +2545,9 @@ struct extmem_param {
> >>>
> >>>  	printf("Removing a device...\n");
> >>>
> >>> +	if (port_id_is_invalid(port_id, ENABLED_WARN))
> >>> +		return;
> >>> +
> >>>  	dev = rte_eth_devices[port_id].device;
> >>>  	if (dev == NULL) {
> >>>  		printf("Device already removed\n");
> >>>
> >>
> >> The patch is already in 19.11 [1] but it is breaking the testpmd
> >> hotplug support.
> >> Before 'detach_port_device()' called, the port has been stopped and
> >> closed [2], which will make port fail from 'port_id_is_invalid()'
> >> check and the device removal path never fully called.
> >> The implication is, since device not detached, vfio request interrupt
> >> keeps triggered continuously and re-starts the detach path, but
> >> because of the half cleaned device it fails and app gets stuck with a
> continuous log [3].
> >>
> >> I wonder if the actual hotplug has been tested with this patch, the
> >> commit log is not clear about the motivation and implication of the
> >> patch, I am not clear why this check is added but I am sending a
> >> patch soon to remove it back.
> >
> > The motivation of this patch was to prevent double detach on same port,
> so the user cannot call detach of invalid port.
> 
> What is the definition of the 'invalid port', if you mean device already
> detached case, in the second call of the function "if (dev == NULL)" check
> should prevent it going forward.

No, ethdev doesn't zero the device pointer when it release a port.
So even if the port is in unused state already - means invalid, the device pointer still may be valid and point to the last port that used the same id.


> But according the 'port_id_is_invalid()' API, a closed port is an invalid port, I
> think that is wrong in this context.

Why?

You are going to look on ethdev portid structure, don't you think we should valid the port before using its structure?


> >
> > I agree this patch is not good and we need a fix but I think the bug is
> conceptual.
> >
> > Testpmd tries to do detach by port_id which is derived by ethdev port id
> while detach work with rte_device.
> >
> > For example:
> > you can see in the line above after +++: dev =
> > rte_eth_devices[port_id].device, Testpmd may access invalid  or
> reallocated ethdev structure to get the device name and may even detach
> unwanted rte_device.
> 
> I thinks whichever function calling 'detach_port_device()' should check the
> port validity.
> 'detach_port_device()' doesn't know if port reallocated or not, it will free the
> given port_id, and when freeing done 'rte_eth_devices[port_id].device' will
> be NULL, this looks to me a valid check.

Please validate me, check ethdev, I don't think so, 'rte_eth_devices[port_id].device still valid after detach.

> The caller of the 'detach_port_device()' should ensure correct port_id
> passed to the function.

What is correct port id, if the port was released , is it correct?

> >
> > So, detach is broken with and without this patch.
> 
> I can't see how it is broken without the check, how the problem you
> mentioned can be reproduced? Or is it a theoretical issue?
> But with this check hotplug support is %100 reproducible broken.
> 
> >
> >
> > I think Testpmd should change the concept of rte_device mapping and put
> attention to next:
> > 1. Don't detach by ethdev port ID.
> > 2. Multiple ethdev port IDs may related to the same rte_device.
> >
> > The Testpmd user should be sure that all the port IDs of the rte_device are
> released before the detach call and Testpmd maybe need to validate it.
> > And like attach, detach should be triggered by PCI address \ rte_device
> name.
> >
> 
> We need to know about port_id too to be able to stop/close it.
> And sure no objection to improve the hotplug support but it is broken now,
> lets fix it first.
> 
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >> Regards,
> >> ferruh
> >>
> >>
> >> [1]
> >> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit
> >> .dp
> >>
> dk.org%2Fdpdk%2Fcommit%2F%3Fid%3D43d0e304980a1527bcac92dc679057
> >>
> b189e2545a&data=02%7C01%7Cmatan%40mellanox.com%7Cc3f40356d
> >>
> d124e20faf708d7a006e68c%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7
> >>
> C0%7C637153823809699996&sdata=dBy9m%2BxCA%2Bme1IpX2LqPARa
> >> 62giznKi8Xbtu220GA%2Bg%3D&reserved=0
> >>
> >> [2]
> >> rmv_port_callback
> >>   stop_port(port_id);
> >>   close_port(port_id);
> >>   detach_port_device(port_id);
> >>
> >> [3]
> >> EAL: can not get port by device 0000:00:05.0!
> >> EAL: can not get port by device 0000:00:05.0!
> >> EAL: can not get port by device 0000:00:05.0!
> >> EAL: can not get port by device 0000:00:05.0!
> >> EAL: can not get port by device 0000:00:05.0!
> >> EAL: can not get port by device 0000:00:05.0!
> >> ...



More information about the dev mailing list