[dpdk-users] Multi-process recovery (is it even possible?)

Tan, Jianfeng jianfeng.tan at intel.com
Fri Mar 2 01:39:58 CET 2018



> -----Original Message-----
> From: Lazarenko, Vlad (WorldQuant)
> [mailto:Vlad.Lazarenko at worldquant.com]
> Sent: Thursday, March 1, 2018 10:53 PM
> To: Tan, Jianfeng; 'users at dpdk.org'
> Subject: RE: Multi-process recovery (is it even possible?)
> 
> Hello Jianfeng,
> 
> Thanks for getting back to me.  I thought about using "udata64", too, but
> that didn't work for me when a single packet was fanned out to multiple
> slave processes.  More importantly, it looks like if a slave process
> crashes somewhere in the middle of getting or putting packets from/to a
> pool, we could end up with a deadlock. So I guess I'd have to think about
> a different design or be ready to bounce all of the processes if one of
> them fails.

OK, a better design that avoids such a hard issue is a good way to go. Good luck!

Thanks,
Jianfeng

> 
> Thanks,
> Vlad
> 
> > -----Original Message-----
> > From: Tan, Jianfeng [mailto:jianfeng.tan at intel.com]
> > Sent: Thursday, March 01, 2018 3:20 AM
> > To: Lazarenko, Vlad (WorldQuant); 'users at dpdk.org'
> > Subject: RE: Multi-process recovery (is it even possible?)
> >
> >
> >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces at dpdk.org] On Behalf Of Lazarenko,
> > > Vlad
> > > (WorldQuant)
> > > Sent: Thursday, March 1, 2018 2:54 AM
> > > To: 'users at dpdk.org'
> > > Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> > >
> > > Guys,
> > >
> > > I am looking for possible solutions to the following problems that
> > > come along with an asymmetric multi-process architecture...
> > >
> > > Given that multiple processes share the same RX/TX queue(s) and packet
> > > pool(s), and that one packet from an RX queue may be fanned out to
> > > multiple slave processes, is there a way to recover from a slave
> > > crashing (or exiting w/o cleaning up properly)? In theory it could have
> > > incremented the mbuf reference count more than once, and unless
> > > everything is restarted, I don't see a reliable way to release those
> > > mbufs back to the pool.
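> > >
> > > For illustration, the fan-out I have in mind looks roughly like this
> > > (the helper name and counts are made up):
> > >
> > > #include <rte_mbuf.h>
> > >
> > > /* primary: hand one mbuf to nb_slaves consumers */
> > > static void
> > > fan_out(struct rte_mbuf *m, unsigned int nb_slaves)
> > > {
> > >     /* bump refcnt so every slave can free independently */
> > >     rte_mbuf_refcnt_update(m, (int16_t)(nb_slaves - 1));
> > >     /* ... enqueue m onto each slave's ring here ... */
> > > }
> > >
> > > /* Each slave calls rte_pktmbuf_free(m) when done; the mbuf only
> > >  * returns to the pool once refcnt drops to 0, so a slave that
> > >  * dies holding a reference leaks it for good. */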
> >
> > Recycling an element is too difficult; from what I know, it's next to
> > impossible. Recycling a whole memzone/mempool is easier. So in your
> > case, you might want to use different pools for different queues
> > (processes).
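> >
> > For example (the pool name and sizes below are only illustrative):
> >
> > #include <rte_lcore.h>
> > #include <rte_mbuf.h>
> > #include <rte_mempool.h>
> >
> > /* primary: a private pool for secondary #0 */
> > struct rte_mempool *pool_sec0 = rte_pktmbuf_pool_create(
> >         "pool_sec0", 8192, 256, 0,
> >         RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
> >
> > /* If that secondary dies, the primary can rte_mempool_free() and
> >  * re-create just this pool without disturbing the other
> >  * processes. */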
> >
> > If you really want to recycle an element (an rte_mbuf in your case), it
> > might be doable by the following (see the sketch below):
> > 1. Set up an RX callback for each process, and in the callback, store a
> > special flag in rte_mbuf->udata64.
> > 2. When the primary detects that a secondary is down, iterate over all
> > elements with the special flag and put them back into the ring.
> >
> > There is a small chance this can fail: an mbuf is allocated by a
> > secondary process, and the process crashes before the mbuf is flagged.
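> >
> > A rough, untested sketch of the above (the tag value is a made-up
> > placeholder, and how the primary detects the death of a secondary is
> > left out):
> >
> > #include <rte_ethdev.h>
> > #include <rte_mbuf.h>
> > #include <rte_mempool.h>
> >
> > #define SEC0_TAG 0xdeadbeefULL  /* per-process tag, step 1 */
> >
> > /* RX callback in the secondary: tag every mbuf it receives */
> > static uint16_t
> > tag_cb(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
> >        uint16_t nb_rx, uint16_t max_pkts, void *arg)
> > {
> >     uint16_t i;
> >
> >     for (i = 0; i < nb_rx; i++)
> >         pkts[i]->udata64 = SEC0_TAG;
> >     return nb_rx;
> > }
> > /* registered once at secondary startup:
> >  * rte_eth_add_rx_callback(port_id, queue_id, tag_cb, NULL); */
> >
> > /* Step 2: when the primary sees the secondary die, it walks the
> >  * whole pool and frees every mbuf still carrying the tag.  Note
> >  * that the secondary must clear the tag whenever it frees an mbuf
> >  * through the normal path, or this could double-free. */
> > static void
> > reclaim_cb(struct rte_mempool *mp, void *arg, void *obj,
> >            unsigned int idx)
> > {
> >     struct rte_mbuf *m = obj;
> >
> >     if (m->udata64 == SEC0_TAG) {
> >         m->udata64 = 0;
> >         rte_pktmbuf_free(m);
> >     }
> > }
> > /* ... rte_mempool_obj_iter(mbuf_pool, reclaim_cb, NULL); */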
> >
> > Thanks,
> > Jianfeng
> >
> >
> > >
> > > Also, if a spinlock is involved and either master or slave crashes,
> > > everything simply gets stuck. Is there any way to detect this (i.e.
> > > outside of the data path)?
> > >
> > > Thanks,
> > > Vlad
> > >