[dpdk-dev] [PATCH v9 1/8] ethdev: use locks to protect Rx/Tx callback lists

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Jun 15 16:20:27 CEST 2016



> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule at 6wind.com]
> Sent: Wednesday, June 15, 2016 3:07 PM
> To: Richardson, Bruce; Ananyev, Konstantin
> Cc: Thomas Monjalon; Pattan, Reshma; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v9 1/8] ethdev: use locks to protect Rx/Tx callback lists
> 
> On 06/15/2016 03:29 PM, Bruce Richardson wrote:
> > On Wed, Jun 15, 2016 at 12:40:16PM +0000, Ananyev, Konstantin wrote:
> >> Hi Ivan,
> >>
> >>> -----Original Message-----
> >>> From: Ivan Boule [mailto:ivan.boule at 6wind.com]
> >>> Sent: Wednesday, June 15, 2016 1:15 PM
> >>> To: Thomas Monjalon; Ananyev, Konstantin
> >>> Cc: Pattan, Reshma; dev at dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH v9 1/8] ethdev: use locks to protect Rx/Tx callback lists
> >>>
> >>> On 06/15/2016 10:48 AM, Thomas Monjalon wrote:
> >>>
> >>>>>
> >>>>>> I think the read access would need locking but we do not want it
> >>>>>> in fast path.
> >>>>>
> >>>>> I don't think it would be needed.
> >>>>> As I said - read/write interaction didn't change from what we have right now.
> >>>>> But if you have some particular scenario in mind that you believe would cause
> >>>>> a race condition - please speak up.
> >>>>
> >>>> If we add/remove a callback during a burst? Is it possible that the next
> >>>> pointer would have a wrong value leading to a crash?
> >>>> Maybe we need a comment to state that we should not alter burst
> >>>> callbacks while running burst functions.
> >>>>
> >>>
> >>> Hi Reshma,
> >>>
> >>> You claim that the "rte_eth_rx_cb_lock" does not need to be hold in the
> >>> function "rte_eth_rx_burst()" while parsing the "post_rx_burst_cbs" list
> >>> of RX callbacks associated with the polled RX queue to safely remove RX
> >>> callback(s) in parallel.
> >>> The problem is not [only] with the setting and the loading of "cb->next"
> >>> that you assume to be atomic operations, which is certainly true on most
> >>> CPUs.
> >>> I see the 2 important following issues:
> >>>
> >>> 1) the "rte_eth_rxtx_callback" data structure associated with a removed
> >>> RX callback could still be accessed in the callback parsing loop of the
> >>> function "rte_eth_rx_burst()" after having been freed in parallel.
> >>>
> >>> BUT SUCH A BAD SITUATION WILL NOT CURRENTLY HAPPEN, THANKS TO THE NICE
> >>> MEMORY LEAK BUG in the function "rte_eth_remove_rx_callback()"  that
> >>> does not free the "rte_eth_rxtx_callback" data structure associated with
> >>> the removed callback !
> >>
> >> Yes, though it is documented behaviour, someone can probably
> >> refer it as a feature, not a bug ;)
> >>
> >
> > +1
> > This is definitely not a bug, this is absolutely by design. One may argue with
> > the design, but it was done for a definite reason, so as to avoid paying the
> > penalty of having locks. It pushes more responsibility onto the app, but it
> > does allow the app to choose the best solution for managing the freeing of
> > memory for its situation. The alternative is to force all apps to pay the cost
> > of having locks, even if better options for freeing the memory are available.
> >
> > /Bruce
> >
> 
> -1 (not to say 0xFFFFFFFF)
> 
> This is definitely an API design bug !
> I would say that if you don't know how to free a resource that you
> allocate, it is very likely that you are wrong allocating it.
> And this is exactly what happens here with RX/TX callback data structures.
> This problem can easily be addressed by just changing the API as follows:
> 
> Change
>      void *
>      rte_eth_add_rx_callback(uint8_t port_id, uint16_t queue_id,
>                              rte_rx_callback_fn fn, void *user_param)
> 
> to
>      int
>      rte_eth_add_rx_callback(uint8_t port_id, uint16_t queue_id,
>                              struct rte_eth_rxtx_callback *cb)
> 
> In addition of solving the problem, this approach makes the API
> consistent and let the application allocate "rte_eth_rxtx_callback" data
> structures through any appropriate mean.

That might make API a bit cleaner, but I don't see how it fixes the generic problem:
there is still no way to know by the upper layer when it is safe to free/re-use
removed callback, but to make sure that all IO on that queue is stopped
(I.E. some external synchronisation around the queue).   
As you said in previous mail: 
> This is an example of a well-known more generic object deletion problem
> which must arrange to guarantee that a deleted object is not used and
> not accessible for use anymore before being actually deleted (freed, for
> instance).
Konstantin

> 
> Regards,
> Ivan
> 
> --
> Ivan Boule
> 6WIND Development Engineer


More information about the dev mailing list