[dpdk-users] Multi-process recovery (is it even possible?)

Lazarenko, Vlad (WorldQuant) Vlad.Lazarenko at worldquant.com
Thu Mar 1 15:53:18 CET 2018


Hello Jianfeng,

Thanks for getting back to me. I thought about using "udata64" too, but that doesn't work when a single packet is fanned out to multiple slave processes. More importantly, it looks like a slave process that crashes in the middle of getting packets from or putting packets to a pool could leave us with a deadlock. So I guess I'd have to think about a different design, or be ready to bounce all of the processes if one of them fails.
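
On the deadlock point, here is a minimal sketch of how a stuck lock might be detected from outside the data path. It assumes a hypothetical lock word that records the holder's PID; `struct owner_lock` and the `owner_lock_*` functions are made up for illustration and are not DPDK's rte_spinlock or any other DPDK API. In real use the lock would have to live in memory shared by all processes, and PID reuse could make the liveness check give a wrong answer.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lock that remembers which process holds it (0 = unlocked). */
struct owner_lock {
    _Atomic int owner;
};

/* Try to take the lock once; on success the caller's PID is recorded. */
bool owner_lock_try(struct owner_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->owner, &expected, (int)getpid());
}

/* Release the lock; only the recorded holder is expected to call this. */
void owner_lock_release(struct owner_lock *l)
{
    assert(atomic_load(&l->owner) == (int)getpid());
    atomic_store(&l->owner, 0);
}

/*
 * Watchdog check: true if the lock is held by a PID that no longer exists.
 * kill(pid, 0) sends no signal; it only reports whether the PID is known.
 */
bool owner_lock_holder_is_dead(struct owner_lock *l)
{
    int pid = atomic_load(&l->owner);

    if (pid == 0)
        return false;
    return kill((pid_t)pid, 0) == -1 && errno == ESRCH;
}
```

A monitoring thread could call owner_lock_holder_is_dead() periodically. Deciding what to do once it returns true is the hard part, since the data the lock protects may have been left half-updated.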

Thanks,
Vlad

> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan at intel.com]
> Sent: Thursday, March 01, 2018 3:20 AM
> To: Lazarenko, Vlad (WorldQuant); 'users at dpdk.org'
> Subject: RE: Multi-process recovery (is it even possible?)
> 
> 
> 
> > -----Original Message-----
> > From: users [mailto:users-bounces at dpdk.org] On Behalf Of Lazarenko, Vlad (WorldQuant)
> > Sent: Thursday, March 1, 2018 2:54 AM
> > To: 'users at dpdk.org'
> > Subject: [dpdk-users] Multi-process recovery (is it even possible?)
> >
> > Guys,
> >
> > I am looking for possible solutions for the following problems that
> > come along with asymmetric multi-process architecture...
> >
> > Given multiple processes share the same RX/TX queue(s) and packet
> > pool(s) and the possibility of one packet from RX queue being fanned
> > out to multiple slave processes, is there a way to recover from a slave
> > crashing (or exiting without cleaning up properly)? In theory it could
> > have incremented an mbuf reference count more than once, and unless
> > everything is restarted, I don't see a reliable way to release those
> > mbufs back to the pool.
> 
> Recycling a single element is too difficult; from what I know, it's next to
> impossible. Recycling a whole memzone/mempool is easier. So in your case, you
> might want to use different pools for different queues (processes).
> 
> If you really want to recycle an element, rte_mbuf in your case, it might be
> doable by:
> 1. Setting up an rx callback for each process and, in the callback, storing a
> special flag in rte_mbuf->udata64.
> 2. When the primary detects that a secondary is down, iterating over all
> elements with the special flag and putting them back into the ring.
> 
> There is a small chance that this fails: if an mbuf is allocated by a
> secondary process and the process crashes before the mbuf is flagged, that
> mbuf is missed.
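
The flag-and-sweep idea above can be sketched in plain C without any DPDK dependency. Everything here is a stand-in: `struct fake_mbuf`, `struct fake_pool`, `fake_alloc()` and `fake_reclaim()` are hypothetical names, and only the two fields the idea relies on are modelled. In a real application the sweep would presumably walk the mempool with something like rte_mempool_obj_iter(), which is an assumption about how it would be wired up and not something tested here. A single tag per buffer also cannot represent a packet shared by several processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 8

/* Stand-in for the rte_mbuf fields the idea uses. */
struct fake_mbuf {
    uint64_t udata64; /* owner tag set in the RX callback; 0 = untagged */
    uint16_t refcnt;  /* 0 = free, sitting in the pool */
};

struct fake_pool {
    struct fake_mbuf objs[POOL_SIZE];
};

/* Hand out a free buffer and tag it with the owning process's id. */
struct fake_mbuf *fake_alloc(struct fake_pool *p, uint64_t owner_tag)
{
    assert(owner_tag != 0); /* 0 is reserved for "untagged" */
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->objs[i].refcnt == 0) {
            p->objs[i].refcnt = 1;
            p->objs[i].udata64 = owner_tag;
            return &p->objs[i];
        }
    }
    return NULL; /* pool exhausted */
}

/*
 * Primary-side sweep after a secondary is found dead: every in-use buffer
 * carrying the dead process's tag is forced back to the free state,
 * whatever its reference count was. Returns the number reclaimed.
 */
size_t fake_reclaim(struct fake_pool *p, uint64_t dead_tag)
{
    size_t reclaimed = 0;

    for (size_t i = 0; i < POOL_SIZE; i++) {
        struct fake_mbuf *m = &p->objs[i];

        if (m->refcnt != 0 && m->udata64 == dead_tag) {
            m->refcnt = 0;
            m->udata64 = 0;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Buffers that were allocated but not yet tagged when the process died are not found by the sweep, which is the failure window described above.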
> 
> Thanks,
> Jianfeng
> 
> 
> >
> > Also, if a spinlock is involved and either the master or a slave crashes,
> > everything simply gets stuck. Is there any way to detect this (i.e.
> > outside of the data path)?
> >
> > Thanks,
> > Vlad
> >


