[dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores

Thomas Monjalon thomas at monjalon.net
Tue Jun 30 16:35:26 CEST 2020


30/06/2020 14:07, Ananyev, Konstantin:
> > 26/06/2020 16:43, David Marchand:
> > > On Wed, Jun 24, 2020 at 1:59 PM Ananyev, Konstantin
> > > <konstantin.ananyev at intel.com> wrote:
> > > > > > Do you mean - make this new dynamic-lcore API return an error if callied
> > > > > > from secondary process?
> > > > >
> > > > > Yes, and prohibiting from attaching a secondary process if dynamic
> > > > > lcore API has been used in primary.
> > > > > I intend to squash in patch 6:
> > > > > https://github.com/david-marchand/dpdk/commit/e5861ee734bfe2e4dc23d9b919b0db2a32a58aee
> > > >
> > > > But secondary process can attach before lcore_register, so we'll have some sort of inconsistency in behaviour.
> > >
> > > If the developer tries to use both features, he gets an ERROR log in
> > > the two init path.
> > > So whatever the order at runtime, we inform the developer (who did not
> > > read/understand the rte_thread_register() documentation) that what he
> > > is doing is unsupported.
> > 
> > I agree.
> > Before this patch, pinning a thread on a random core can
> > trigger some issues.
> > After this patch, register an external thread will
> > take care of logging errors in case of inconsistencies.
> > So the user will know he is doing something not supported
> > by the app.
> 
> I understand that, and return a meaningful error is definitely
> better the silent crash or memory corruption.
> The problem with that approach, as I said before, MP group
> behaviour becomes non-deterministic.

It was already non-deterministic before these patches.

> > It is an nice improvement.
> > 
> > > > If we really  want to go ahead with such workaround -
> > 
> > It is not a workaround.
> > It is fixing some old issues and making clear what is really impossible.
> 
> The root cause of the problem is in our MP model design decisions:
> from one side we treat lcore_id as process local data, from other side
> in some shared data-structures we use lcore_id as an index.
> I think to fix it properly we need either: 
> make lcore_id data shared or stop using lcore_id as an index for shared data. 
> So from my perspective this approach is just one of possible workarounds.
> BTW, there is nothing wrong to have a workaround for the problem
> we are not ready to fix right now.

I think you are trying to fix multi-process handling.
This patch is not about multi-process, it only highlight incompatibilities.

> > > > probably better to introduce explicit EAL flag ( --single-process or so).
> > > > As Thomas and  Bruce suggested, if I understood them properly.
> > 
> > No I was thinking to maintain the tri-state information:
> > 	- secondary is possible
> > 	- secondary is attached
> > 	- secondary is forbidden
> 
> Ok, then I misunderstood you.
>  
> > Asking the user to use an option to forbid attaching a secondary process
> > is the same as telling him it is forbidden.
> 
> I don't think it is the same.
> On a live and complex system user can't always predict will the primary proc 
> use dynamic lcore and if it will at what particular moment.
> Same for secondary process launching - user might never start it,
> might start it straight after the primary one,
> or might be after several hours. 

I don't see the difference.
An app which register external threads is not compatible
with multi-process. It needs to be clear.
If the user tries to do it anyway, there can be some error, OK.

> > The error log is enough in my opinion.
> 
> I think it is better than nothing, but probably not the best one.
> Apart from possible non-consistent behaviour, it is quite restrictive:
> dynamic lcore_id wouldn't be available on any DPDK MP deployment.
> Which is a pity - I think it is a cool and useful feature.

So you are asking to extend the feature.
Honestly, I'm not a fan of multi-process,
so I would not push any feature for it.

If we don't add any new option now, and restrict MP handling
to error messages, it would not prevent from extending
in future, right?


> What do you guys think about different approach:
> introduce new optional EAL parameter to restrict lcore_id
> values available for the process.
> 
> #let say to start primary proc that can use lcore_id=[0-99] only:
> dpdk_primary --lcore-allow=0-99 ... --file-prefix=xz1
> 
> #to start secondary one for it with allowed lcore_id=[100-109]:
> dpdk_secondary --lcore-allow=100-109 ... --file-prefix=xz1 --proc-type=secondary  
>  
> It is still a workaround, but that way we don't need to
> add any new limitations for dynamic lcores and secondary process usage. 
> Now it is up to user to decide would multiple-process use the same shared data
> and if so - split lcore_id space properly among them
> (same as he has to do now with static lcores).

Isn't it pushing too much to the user?


> > > A EAL flag is a stable API from the start, as there is nothing
> > > describing how we can remove one.
> > > So a new EAL flag for an experimental API/feature seems contradictory.
> > >
> > > Going with a new features status API... I think it is beyond this series.
> > >
> > > Thomas seems to suggest an automatic resolution when features conflict
> > > happens.. ?
> > 
> > I suggest allowing the maximum and raise an error when usage conflicts.
> > It seems this is what you did in v4.
> > 
> > > I'll send the v4, let's discuss it there if you want.





More information about the dev mailing list