[dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF

Lu, Wenzhuo wenzhuo.lu at intel.com
Tue May 17 10:20:00 CEST 2016


Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, May 17, 2016 3:51 PM
> To: Lu, Wenzhuo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/4] ixgbe: automatic link recovery on VF
> 
> Hi Wenzhuo,
> 
> On 05/17/2016 03:11 AM, Lu, Wenzhuo wrote:
> >> -----Original Message-----
> >> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> >>
> >> If I understand well, ixgbevf_dev_link_up_down_handler() is called by
> >> ixgbevf_recv_pkts_fake() on a dataplane core. It means that the core
> >> that acquired the lock will loop during 100us + 1sec at least.
> >> If this core was also in charge of polling other queues of other
> >> ports, or timers, many packets will be dropped (even with a 100us
> >> loop). I don't think it is acceptable to actively wait inside a rx function.
> >>
> >> I think it would avoid many issues to delegate this work to the
> >> application, maybe by notifying it that the port is in a bad state
> >> and must be restarted. The application could then properly stop
> >> polling the queues, and stop and restart the port in a separate thread,
> >> without bothering the dataplane cores.
> > Thanks for the comments.
> > Yes, you're right. I had a wrong assumption that every queue is handled by one
> > core.
> > But surely it's not right, we cannot tell how the users will deploy their system.
> >
> > I plan to update this patch set. The solution now is, first let the
> > users choose if they want this auto-reset feature. If so, we will
> > apply another series rx/tx functions which have lock. So we can stop the rx/tx
> > of the bad ports.
> > And we also apply a reset API for users. The APPs should call this API in their
> > management thread or so.
> > It means APPs should guarantee the thread safe for the API.
> > You see, there're 2 things,
> > 1, Lock the rx/tx to stop them for users.
> > 2, Apply a resetting API for users, and every NIC can do their own
> > job. APPs need not to worry about the difference between different NICs.
> >
> > Surely, it's not *automatic* now. The reason is DPDK doesn't guarantee
> > the thread safe. So the operations have to be left to the APPs and let them to
> > guarantee the thread safe.
> >
> > And if the users choose not using auto-reset feature, we will leave
> > this work to the APP :)
> 
> Yes, I think having 2 modes is a good approach:
> 
> - the first mode would let the application know a reset has to
>    be performed, without active loop or lock in the rx/tx funcs.
> - the second mode would transparently manage the reset in the driver,
>    but may lock the core during some time.
For the second mode, we originally wanted the driver to manage the reset transparently. The bad news is
that the operations in the driver layer are not thread safe. To make the reset transparent,
we would need a whole new mechanism to guarantee thread safety for the operations in the driver layer.
Obviously, that needs to be discussed and cannot be finished in this release.
So for now we provide a reset API for the APP, and let the APP call this API and guarantee thread safety for all the operations.
It's not transparent, but it seems to be what we can do at this stage.
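
To make the split of work clearer, here is a rough sketch of what I have in mind on the APP side. All the names below (port_ctl, app_rx_burst, app_handle_reset) are made up for illustration and none of them exist in DPDK. The rx stub and the resets counter only stand in for the real rx burst and the real device reset. The point is that the dataplane core never waits, and only the management thread does.

```c
/* Sketch only: all names here are hypothetical, not DPDK APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct port_ctl {
    atomic_bool need_reset; /* set when the port is reported as bad */
    atomic_bool rx_paused;  /* set by the APP while it resets the port */
    atomic_int  rx_active;  /* number of cores currently inside rx */
    int         resets;     /* resets done so far (management thread only) */
};

/* Dataplane core: never waits; returns 0 packets while the port is paused. */
static int app_rx_burst(struct port_ctl *p)
{
    int nb = 0;

    assert(p != NULL);
    atomic_fetch_add(&p->rx_active, 1);
    if (!atomic_load(&p->rx_paused))
        nb = 1; /* stand-in for the real rx burst */
    atomic_fetch_sub(&p->rx_active, 1);
    return nb;
}

/*
 * Management thread: pause rx, wait until no core is inside rx,
 * reset the port, then resume. Returns true if a reset was performed.
 */
static bool app_handle_reset(struct port_ctl *p)
{
    assert(p != NULL);
    if (!atomic_load(&p->need_reset))
        return false;

    atomic_store(&p->rx_paused, true);
    while (atomic_load(&p->rx_active) != 0)
        ; /* waiting is acceptable here, this is not a dataplane core */

    p->resets++; /* stand-in for the device-specific reset */
    atomic_store(&p->need_reset, false);
    atomic_store(&p->rx_paused, false);
    return true;
}
```

The APP would call app_handle_reset() periodically from its management thread, and keep calling app_rx_burst() from the polling cores as usual.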

> 
> By the way, you talk about a reset API, why not just using the usual stop/start
> functions? I think it would work the same.
For ixgbe/igb, stop/start is enough. But for i40e, some other work has to be done (for example, the queue resources have to be re-initialized).
So we are thinking about introducing a new API, so that each NIC can do what it has to do.
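
Roughly, the idea could look like the sketch below. The names (nic_ops, nic_reset, extra_reset) are hypothetical and this is not the ethdev API. It only shows one possible shape: a driver that needs nothing special leaves the reset hook empty and gets plain stop/start, while a driver that needs extra work supplies its own hook.

```c
/* Sketch only: hypothetical names, not the DPDK ethdev API. */
#include <assert.h>
#include <stddef.h>

struct nic;

struct nic_ops {
    int (*stop)(struct nic *n);
    int (*start)(struct nic *n);
    int (*reset)(struct nic *n); /* optional, NULL if stop/start is enough */
};

struct nic {
    const struct nic_ops *ops;
    int started;
    int queues_reinit; /* counts the extra work done by a driver reset */
};

static int generic_stop(struct nic *n)  { n->started = 0; return 0; }
static int generic_start(struct nic *n) { n->started = 1; return 0; }

/* A driver that needs more than stop/start, e.g. re-init of queue resources. */
static int extra_reset(struct nic *n)
{
    int ret = generic_stop(n);

    if (ret != 0)
        return ret;
    n->queues_reinit++; /* stand-in for the driver-specific work */
    return generic_start(n);
}

static const struct nic_ops simple_ops = { generic_stop, generic_start, NULL };
static const struct nic_ops extra_ops  = { generic_stop, generic_start, extra_reset };

/* Single entry point for the APP: use the driver hook if there is one. */
static int nic_reset(struct nic *n)
{
    int ret;

    assert(n != NULL && n->ops != NULL);
    if (n->ops->reset != NULL)
        return n->ops->reset(n);

    ret = n->ops->stop(n);
    if (ret != 0)
        return ret;
    return n->ops->start(n);
}
```

With this shape the APP always calls nic_reset() and does not need to know which kind of NIC is behind the port.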

> 
> Regards,
> Olivier

