[dpdk-dev] [PATCH 00/22] enable hotplug on multi-process

Zhang, Qi Z qi.z.zhang at intel.com
Tue Jun 19 04:43:34 CEST 2018


Hi Anatoly:

	Thanks for the review, see my reply in inline.

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Friday, June 15, 2018 11:16 PM
> To: Zhang, Qi Z <qi.z.zhang at intel.com>; thomas at monjalon.net
> Cc: Ananyev, Konstantin <konstantin.ananyev at intel.com>; dev at dpdk.org;
> Richardson, Bruce <bruce.richardson at intel.com>; Yigit, Ferruh
> <ferruh.yigit at intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton at intel.com>; Vangati, Narender
> <narender.vangati at intel.com>
> Subject: Re: [PATCH 00/22] enable hotplug on multi-process
> 
> Hi Qi,
> 
> I haven't read the code yet, and i'll be the first to admit that i'm not too well
> versed on how shared/private device data works, so my apologies in advance
> if all of below comments are addressed by implementation details or are way
> off base!
> 
> 
> 
> On 07-Jun-18 1:38 PM, Qi Zhang wrote:
> >
> > 1. Attach a share device from primary
> > 2. Detach a share device from primary
> > 3. Attach a share device from secondary 4. Detach a share device from
> > secondary 5. Attach a private device from secondary 6. Detach a
> > private device from secondary 7. Detach a share device from secondary
> > privately 8. Attach a share device from secondary privately
> >
> 
> <...>
> 
> > Case 7, 8:
> > Secondary process can also temporally to detach a share device
> > "privately" then attach it back later, this action also not impact other
> > processes.
> >
> 
> Do we really need to implement these cases? It seems to me that this
> "reattach it later" introduces unnecessary complexity. 

I agree it's not necessary, but this looks like a free feature based on current implementation :)


>If secondary has
> detached the device, I think it is safer if we cannot reattach it,
> period, because it was a shared device. What if we try to attach it when
> a handshake has already completed and all other processes expect to
> detach it?

in the case: attach back a shared device already be detached will fail as expected.

For PCI devices, it will fail at driver probe.
For vdev, it will failed at rte_eth_dev_attach_secondary.

> 
> (in fact, do we differentiate between non-existent device and shared
> device that has been "privately detached"? I would expect that we keep
> the device as detached as opposed to forgetting about it, so that, come
> handshake, we can safely reply "yeah, we can detach the device", but
> maybe it's OK to not treat request to detach a non-existent device as an
> error... thoughts? am i getting something wrong?)
> 
> > APIs chenages:
> > ==============
> >
> > rte_eth_dev_attach and rte_eth_dev_attach are extended to support
> > share device attach/detach in primary-secondary process model, it will
> > be called in case 1,2,3,4.
> >
> > New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are
> > introduced to cover case 5,6,7,8, this API can only be invoked in
> > secondary process.
> >
> > New API rte_eth_dev_lock and rte_eth_dev_unlock are introduced to let
> > application lock or unlock on specific ethdev, a locked device
> > can't be detached. This help applicaiton to prevent unexpected
> > device detaching, especially in multi-process envrionment.
> > Aslo the new API let application to register a callback function
> > which will be invoked before a device is going to be detached,
> > the return value of the function will decide if device will continue
> > be detached or not, this support application to do condition check
> > at runtime.
> 
> I assume that you've added device locking to avoid having to do
> handshake before detach, correct? 
Yes.
>Is this a shared lock of some kind, or
> is it a private lock? If it's shared lock, what happens if the process
> holding that lock crashes?

It’s a kind of process's private lock, but a shared device be locked any process will prevent it be detached.

> 
> >
> > PMD Impact:
> > ===========
> >
> > Currently device removing is not handled well in secondary process on
> most
> > pmd drivers, rte_eth_dev_relase_port will be invoked and will mess up
> > primary process since it reset all shared data. So we introduced new API
> > rte_eth_dev_release_port_local which only reset ethdev's state to unsued
> but
> > not touch shared data so other process will not be impacted.
> > Since not all device driver is target to support primary-secondary
> > process model, so the patch set only fix this on all Intel devices and
> > vdev, it can be refereneced by other driver when equevalent fix is required
> 
> Nitpick - why the naming mismatch between *_private() and *_local()?
Agree.
"private" make the API more identical.

> 
> >
> > Limitation:
> > ===========
> >
> > The solution does not cover the case that primary process exit while
> > secondary processes still be active. Though this is not a typial use
> > case, but if this happens:
> > 1. secondary process can't attach / detach any shared device since no
> > primary exist.
> > 2. secondary process still can attach / detach private device.
> > 3. secondary process still can detach a share device privately but may
> > not attach it back, that ethdev slot will become zombie slot.
> 
> I think this should be explicit and by design. Shared devices can only
> be communicated to all secondaries through a primary process. No primary
> - no shared devices. I don't think we can do anything about it unless we
> implement some kind of peer-to-peer IPC (which isn't happening as far as
> i'm aware).

Agree.

> 
> 
> 
> Thanks for your work on this patchset!

Thanks for the design review and all the helpful inputs.

Regards
Qi

> 
> --
> Thanks,
> Anatoly


More information about the dev mailing list