[dpdk-dev] [PATCH 00/22] enable hotplug on multi-process

Burakov, Anatoly anatoly.burakov at intel.com
Fri Jun 15 17:16:00 CEST 2018


Hi Qi,

I haven't read the code yet, and i'll be the first to admit that i'm not 
too well versed on how shared/private device data works, so my apologies 
in advance if all of below comments are addressed by implementation 
details or are way off base!



On 07-Jun-18 1:38 PM, Qi Zhang wrote:
> 
> 1. Attach a share device from primary
> 2. Detach a share device from primary
> 3. Attach a share device from secondary
> 4. Detach a share device from secondary
> 5. Attach a private device from secondary
> 6. Detach a private device from secondary
> 7. Detach a share device from secondary privately
> 8. Attach a share device from secondary privately
> 

<...>

> Case 7, 8:
> Secondary process can also temporally to detach a share device
> "privately" then attach it back later, this action also not impact other
> processes.
>

Do we really need to implement these cases? It seems to me that this 
"reattach it later" introduces unnecessary complexity. If secondary has 
detached the device, i think it is safer if we cannot reattach it, 
period, because it was a shared device. What if we try to attach it when 
a handshake has already completed and all other processes expect to 
detach it?

(in fact, do we differentiate between non-existent device and shared 
device that has been "privately detached"? I would expect that we keep 
the device as detached as opposed to forgetting about it, so that, come 
handshake, we can safely reply "yeah, we can detach the device", but 
maybe it's OK to not treat request to detach a non-existent device as an 
error... thoughts? am i getting something wrong?)

> APIs chenages:
> ==============
> 
> rte_eth_dev_attach and rte_eth_dev_attach are extended to support
> share device attach/detach in primary-secondary process model, it will
> be called in case 1,2,3,4.
> 
> New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are
> introduced to cover case 5,6,7,8, this API can only be invoked in
> secondary process.
> 
> New API rte_eth_dev_lock and rte_eth_dev_unlock are introduced to let
> application lock or unlock on specific ethdev, a locked device
> can't be detached. This help applicaiton to prevent unexpected
> device detaching, especially in multi-process envrionment.
> Aslo the new API let application to register a callback function
> which will be invoked before a device is going to be detached,
> the return value of the function will decide if device will continue
> be detached or not, this support application to do condition check
> at runtime.

I assume that you've added device locking to avoid having to do 
handshake before detach, correct? Is this a shared lock of some kind, or 
is it a private lock? If it's shared lock, what happens if the process 
holding that lock crashes?

> 
> PMD Impact:
> ===========
> 
> Currently device removing is not handled well in secondary process on most
> pmd drivers, rte_eth_dev_relase_port will be invoked and will mess up
> primary process since it reset all shared data. So we introduced new API
> rte_eth_dev_release_port_local which only reset ethdev's state to unsued but
> not touch shared data so other process will not be impacted.
> Since not all device driver is target to support primary-secondary
> process model, so the patch set only fix this on all Intel devices and
> vdev, it can be refereneced by other driver when equevalent fix is required

Nitpick - why the naming mismatch between *_private() and *_local()?

> 
> Limitation:
> ===========
> 
> The solution does not cover the case that primary process exit while
> secondary processes still be active. Though this is not a typial use
> case, but if this happens:
> 1. secondary process can't attach / detach any shared device since no
> primary exist.
> 2. secondary process still can attach / detach private device.
> 3. secondary process still can detach a share device privately but may
> not attach it back, that ethdev slot will become zombie slot.

I think this should be explicit and by design. Shared devices can only 
be communicated to all secondaries through a primary process. No primary 
- no shared devices. I don't think we can do anything about it unless we 
implement some kind of peer-to-peer IPC (which isn't happening as far as 
i'm aware).



Thanks for your work on this patchset!

-- 
Thanks,
Anatoly


More information about the dev mailing list