[dpdk-dev] [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
    Bruce Richardson 
    bruce.richardson at intel.com
       
    Fri Oct  6 15:51:53 CEST 2017
    
    
  
On Fri, Oct 06, 2017 at 08:38:01AM -0400, Roger B. Melton wrote:
> On 10/6/17 4:54 AM, Bruce Richardson wrote:
> > On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote:
> > > ---
> > > 
> > > i40evf tx vector logic frees mbufs, but it does not remove the
> > > mbufs from software rings which leads to double frees.  This change
> > >   corrects that oversight.  We've validated this fix within our application.
> > > 
> > Hi Roger,
> > 
> > I'm a little concerned here by this driver fix, since if we are getting
> > double frees of mbufs there must be another bug somewhere in tracking
> > the free ring elements. Clearing the entries to NULL introduces extra
> > writes to the vector code path which will likely have a performance
> > impact, but also should be unnecessary, given proper tracking of the
> > ring status.
> > 
> > Can you provide us with some details as to how to reproduce the issue,
> > especially with a public sample app? I'd really like to look more into
> > why this is happening, and if other fixes may work.
> > 
> > Thanks,
> > /Bruce
> > 
> > .
> > 
> Hey Bruce,
> 
> I've not attempted to reproduce the issue using sample apps.  It was
> initially difficult for us to reproduce with our application until we
> stumbled on a recipe.  The symptoms of the double free are two fold:
> 
>  * Application crashes: Corrupted packet headers, walking a chain of rx
>    segments and loosing one in the middle,...
>  * i40e adminq lockup - Meaning VF sends an OP and PF does not process it
> 
> The former has been directly correlated to tx vector mode.  We still don't
> understand how this could lead to adminq lockup, but we have not observed
> the issue applied the patch I submitted.  We have questions out to Intel
> i40e team on this.
> 
> Here's a high level view of the scenario under which the issues are observed
> and how we concluded that there were issues with tx vector free. We provided
> this information to the Intel i40e DPDK team, they reviewed the tx vector
> logic and suggested changes.  With the changes suggested by Intel (the patch
> I submitted) we have re-enabled tx vector when MTU is < MBUF size and
> observed no crashes.
> 
> Below that you will find additional detail on the procedure within our
> application for changing MTU, including DPDK API calls.
> 
> Let me know if you have additional questions.
> 
> Regards,
> Roger
> 
>      o Scenario:
> 
>          + Intel i40e-da4 (4x10G) or Cisco branded equivalent
>          + KVM or ESXi 6.5 host
>          + In a single VM we have at least 1 VF from all 4 PFs.
>          + We are running traffic on all interfaces.
>          + We stop traffic
>          + We stop a device (VF)
>          + We start the device
>          + We start traffic
>          + At any point after we start the device we can crash
> 
>      o Experiment (Jumping over some of the work to get to the point
>        where we believed that i40e driver was doing double frees):
> 
>          + Our application attaches userdata to mbufs and we put that
>            userdata on a linked list.
>          + We observed that when we processed the userdata it had been
>            corrupted which lead to crashes.
>          + This gave us a hint that the mbuf was on multiple lists.
>          + We reviewed our application code and could not find a cause.
>          + We began to suspect a double free in the i40evf PMD.
> 
>              # We disabled rx free logic and observed crashes
>                (intentionally leaking mbufs in search of the double free).
>              # We disabled tx free logic and observed no crashes
>              # This gave us a hint that the double frees were coming
>                from the i40evf PMD tx logic.
> 
>          + We had also observed that if we forced MTU to large always
>            that there were no crashes
> 
>              # A side effect of forcing large MTU is that multi-segment
>                is enabled.
>              # This gave us a hint that enabling multi-segment was
>                somehow avoiding the double free.
> 
>          + We forced multi-segments regardless of MTU and permitted MTU
>            changes and observed no crashes.
>          + We reviewed the i40evf mbuf free logic to see the effect of
>            enabling multi-segment and observed that when multi-segment
>            is enabled, rx vector was enabled but tx vector was not.
>          + This lead us to examine RX vector mode free logic vs TX
>            vector mode free logic.
> 
>              # RX free logic has special handling for vector mode free
>              # TX free logic does not have any special handling for
>                vector free
> 
>      o By enabling multiple segments always (even if MTU does not
>        require multiple segments), we effectively disabled tx vector
>        mode and this avoided the double free.
>      o Our application no longer crashed, but it could not take
>        advantage of tx vector optimizations.
> 
> 
> 
> 
> 
> CP == Control Plane
> DP == Data Plane
> 
>  * CP sends admin down to DP
>  * DP disables RX/TX
>      o Block all future tx burst calls
>      o rte_eth_dev_set_link_down() invoked
>      o Block all future rx burst calls
>  * DP notifies CP admin down action is complete
>  * CP sends MTU change
>  * DP processes MTU change
>      o For each rxq:
>          + rte_eth_rx_queue_info_get()
>          + if not rxq_deferred_start rte_eth_dev_rx_queue_stop()
>      o For each txq:
>          + rte_eth_tx_queue_info_get()
>          + if not txq_deferred_start rte_eth_dev_tx_queue_stop()
>      o rte_eth_dev_stop()
>      o Re-configure the port: (Note this is original code, not new code
>        which is forcing multisegs always)
>          + Set rx_mode.jumbo_frame if MTU > 1518
>          + Set rx_mode.enable_scatter if MTU > 2048
>          + txq_flags = ETH_TXQ_NOOFLOADS
>          + if MTU > 2048, txq_flags |= ETH_TXQ_FLAGS_NOMULTSEGS
>          + rte_eth_promiscuous_get()
>          + rte_eth_dev_info_get()
>          + rte_eth_dev_configure()
>              # Init tx_vec_allowed and rx_vec_allowed to TRUE.
>          + rte_eth_dev_info_get()
>          + For each txq: rte_eth_tx_queue_setup()
>              # If new MTU > 2048, ETH_TXQ_FLAGS_NOMULTSEGS was set in
>                txq_flags & tx_vec_allowed will be cleared.
>          + For each rxq: rte_eth_rx_queue_setup()
>      o rte_eth_dev_set_mtu()
>      o rte_eth_dev_start()
>      o rte_eth_dev_info_get()
>      o For each rxq:
>          + if not rxq_deferred_start rte_eth_dev_rx_queue_start()
>      o For each txq:
>          + if not txq_deferred_start rte_eth_dev_tx_queue_start()
>  * DP notifies CP MTU change applied.
>  * CP sends admin up to DP
>  * DP enables RX/TX
>      o Enable all future tx burst calls
>      o rte_eth_dev_set_link_up() invoked
>      o Enable all future rx burst calls
>  * DP notifies CP admin up action is complete
> 
> 
> 
Thanks for the explanation.
I'll follow-up with our i40e driver team to see if they can give me more
detail on their investigation. I'd like to know more details about the
ultimate root cause of this double free - presumably it's something to
do with the behaviour when stopping and starting the queue or port - and
find a solution that does not involve more writes on the fast-path. If
we don't come up with a better solution, this fix will do, though.
Regards,
/Bruce
    
    
More information about the dev
mailing list