<div dir="auto">I think at least this should be followed either in PMD or in ethdev layer to avoid this specific crashes. </div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Mon, Jul 21, 2025, 17:00 Thomas Monjalon <<a href="mailto:thomas@monjalon.net">thomas@monjalon.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">21/07/2025 13:46, Ivan Malov:<br>
> On Mon, 21 Jul 2025, Dariusz Sosnowski wrote:<br>
> <br>
> > + mlx5 maintainers<br>
> ><br>
> > Thank you for the patch.<br>
> ><br>
> > Could you please include other PMD maintainers (or other maintainers, depending on changed code)<br>
> > in the future patches?<br>
> > There is a script which automatically adds maintainers while sending a patch.<br>
> > It is described in: <a href="https://doc.dpdk.org/guides/contributing/patches.html#sending-patches" rel="noreferrer noreferrer" target="_blank">https://doc.dpdk.org/guides/contributing/patches.html#sending-patches</a><br>
> ><br>
> > On Mon, Jul 21, 2025 at 03:38:51AM -0400, Khadem Ullah wrote:<br>
> >> When the primary process exits, the shared mlx5 state becomes<br>
> >> unavailable to secondary processes. If a secondary process attempts<br>
> >> to query device information (e.g., via testpmd), a NULL dereference<br>
> >> may occur due to missing shared data.<br>
> >><br>
> >> This patch adds a check for shared context availability and fails<br>
> >> gracefully while preventing a crash.<br>
> >><br>
> >> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/stop")<br>
> >> Cc: <a href="mailto:stable@dpdk.org" target="_blank" rel="noreferrer">stable@dpdk.org</a><br>
> >><br>
> >> Signed-off-by: Khadem Ullah <<a href="mailto:14pwcse1224@uetpeshawar.edu.pk" target="_blank" rel="noreferrer">14pwcse1224@uetpeshawar.edu.pk</a>><br>
> >> ---<br>
> >> drivers/net/mlx5/mlx5_ethdev.c | 6 ++++++<br>
> >> 1 file changed, 6 insertions(+)<br>
> >><br>
> >> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c<br>
> >> index 68d1c1bfa7..1848f6536a 100644<br>
> >> --- a/drivers/net/mlx5/mlx5_ethdev.c<br>
> >> +++ b/drivers/net/mlx5/mlx5_ethdev.c<br>
> >> @@ -368,6 +368,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)<br>
> >> * Since we need one CQ per QP, the limit is the minimum number<br>
> >> * between the two values.<br>
> >> */<br>
> >> + if (priv == NULL || priv->sh == NULL) {<br>
> >> + DRV_LOG(ERR,<br>
> >> + "mlx5 shared data unavailable (primary process likely exited)");<br>
> >> + rte_errno = ENODEV;<br>
> >> + return -rte_errno;<br>
> >> + }<br>
> ><br>
> > I don't think it's an issue on PMD level, but rather on<br>
> > ethdev/multi-process handling level.<br>
> <br>
> I may be very wrong here, but can't the PMD use its internal 'shared' state to<br>
> somehow memorise the fact that a secondary process has attached, in order to not<br>
> let the primary process close in the first place (before the secondary process<br>
> detaches)? Or is this going to violate some established convention?<br>
<br>
It looks to be a good idea.<br>
<br>
> > When primary process closes the port, ethdev library zeroes and frees<br>
> > device data shared between processes.<br>
> > ethdev port data (rte_eth_dev) on secondary is not updated so it now points to<br>
> > invalid data. rte_eth_dev_info_get() is not the only API call affected.<br>
> ><br>
> > If the primary process closes the port before exiting<br>
> > (like testpmd does) and it exits before the secondary,<br>
> > the any driver call seems invalid because of that use-after-free behavior.<br>
> ><br>
> > @Thomas, @Andrew - Do you happen to know if doing anything on ethdev ports<br>
> > in secondary process after primary has gracefully exited is supported?<br>
<br>
No there is no statement about whether it is supported or not.<br>
I think we should at least return an error instead of crashing.<br>
<br>
<br>
</blockquote></div>