[dpdk-dev] [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
Roger B. Melton
rmelton at cisco.com
Fri Oct 6 16:06:02 CEST 2017
On 10/6/17 9:51 AM, Bruce Richardson wrote:
> On Fri, Oct 06, 2017 at 08:38:01AM -0400, Roger B. Melton wrote:
>> On 10/6/17 4:54 AM, Bruce Richardson wrote:
>>> On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote:
>>>> i40evf tx vector logic frees mbufs, but it does not remove the
>>>> mbufs from software rings which leads to double frees. This change
>>>> corrects that oversight. We've validated this fix within our application.
>>> Hi Roger,
>>> I'm a little concerned here by this driver fix, since if we are getting
>>> double frees of mbufs there must be another bug somewhere in tracking
>>> the free ring elements. Clearing the entries to NULL introduces extra
>>> writes to the vector code path which will likely have a performance
>>> impact, but also should be unnecessary, given proper tracking of the
>>> ring status.
>>> Can you provide us with some details as to how to reproduce the issue,
>>> especially with a public sample app? I'd really like to look more into
>>> why this is happening, and if other fixes may work.
>> Hey Bruce,
>> I've not attempted to reproduce the issue using sample apps. It was
>> initially difficult for us to reproduce with our application until we
>> stumbled on a recipe. The symptoms of the double free are two fold:
>> * Application crashes: Corrupted packet headers, walking a chain of rx
>> segments and loosing one in the middle,...
>> * i40e adminq lockup - Meaning VF sends an OP and PF does not process it
>> The former has been directly correlated to tx vector mode. We still don't
>> understand how this could lead to adminq lockup, but we have not observed
>> the issue applied the patch I submitted. We have questions out to Intel
>> i40e team on this.
>> Here's a high level view of the scenario under which the issues are observed
>> and how we concluded that there were issues with tx vector free. We provided
>> this information to the Intel i40e DPDK team, they reviewed the tx vector
>> logic and suggested changes. With the changes suggested by Intel (the patch
>> I submitted) we have re-enabled tx vector when MTU is < MBUF size and
>> observed no crashes.
>> Below that you will find additional detail on the procedure within our
>> application for changing MTU, including DPDK API calls.
>> Let me know if you have additional questions.
>> o Scenario:
>> + Intel i40e-da4 (4x10G) or Cisco branded equivalent
>> + KVM or ESXi 6.5 host
>> + In a single VM we have at least 1 VF from all 4 PFs.
>> + We are running traffic on all interfaces.
>> + We stop traffic
>> + We stop a device (VF)
>> + We start the device
>> + We start traffic
>> + At any point after we start the device we can crash
>> o Experiment (Jumping over some of the work to get to the point
>> where we believed that i40e driver was doing double frees):
>> + Our application attaches userdata to mbufs and we put that
>> userdata on a linked list.
>> + We observed that when we processed the userdata it had been
>> corrupted which lead to crashes.
>> + This gave us a hint that the mbuf was on multiple lists.
>> + We reviewed our application code and could not find a cause.
>> + We began to suspect a double free in the i40evf PMD.
>> # We disabled rx free logic and observed crashes
>> (intentionally leaking mbufs in search of the double free).
>> # We disabled tx free logic and observed no crashes
>> # This gave us a hint that the double frees were coming
>> from the i40evf PMD tx logic.
>> + We had also observed that if we forced MTU to large always
>> that there were no crashes
>> # A side effect of forcing large MTU is that multi-segment
>> is enabled.
>> # This gave us a hint that enabling multi-segment was
>> somehow avoiding the double free.
>> + We forced multi-segments regardless of MTU and permitted MTU
>> changes and observed no crashes.
>> + We reviewed the i40evf mbuf free logic to see the effect of
>> enabling multi-segment and observed that when multi-segment
>> is enabled, rx vector was enabled but tx vector was not.
>> + This lead us to examine RX vector mode free logic vs TX
>> vector mode free logic.
>> # RX free logic has special handling for vector mode free
>> # TX free logic does not have any special handling for
>> vector free
>> o By enabling multiple segments always (even if MTU does not
>> require multiple segments), we effectively disabled tx vector
>> mode and this avoided the double free.
>> o Our application no longer crashed, but it could not take
>> advantage of tx vector optimizations.
>> CP == Control Plane
>> DP == Data Plane
>> * CP sends admin down to DP
>> * DP disables RX/TX
>> o Block all future tx burst calls
>> o rte_eth_dev_set_link_down() invoked
>> o Block all future rx burst calls
>> * DP notifies CP admin down action is complete
>> * CP sends MTU change
>> * DP processes MTU change
>> o For each rxq:
>> + rte_eth_rx_queue_info_get()
>> + if not rxq_deferred_start rte_eth_dev_rx_queue_stop()
>> o For each txq:
>> + rte_eth_tx_queue_info_get()
>> + if not txq_deferred_start rte_eth_dev_tx_queue_stop()
>> o rte_eth_dev_stop()
>> o Re-configure the port: (Note this is original code, not new code
>> which is forcing multisegs always)
>> + Set rx_mode.jumbo_frame if MTU > 1518
>> + Set rx_mode.enable_scatter if MTU > 2048
>> + txq_flags = ETH_TXQ_NOOFLOADS
>> + if MTU > 2048, txq_flags |= ETH_TXQ_FLAGS_NOMULTSEGS
>> + rte_eth_promiscuous_get()
>> + rte_eth_dev_info_get()
>> + rte_eth_dev_configure()
>> # Init tx_vec_allowed and rx_vec_allowed to TRUE.
>> + rte_eth_dev_info_get()
>> + For each txq: rte_eth_tx_queue_setup()
>> # If new MTU > 2048, ETH_TXQ_FLAGS_NOMULTSEGS was set in
>> txq_flags & tx_vec_allowed will be cleared.
>> + For each rxq: rte_eth_rx_queue_setup()
>> o rte_eth_dev_set_mtu()
>> o rte_eth_dev_start()
>> o rte_eth_dev_info_get()
>> o For each rxq:
>> + if not rxq_deferred_start rte_eth_dev_rx_queue_start()
>> o For each txq:
>> + if not txq_deferred_start rte_eth_dev_tx_queue_start()
>> * DP notifies CP MTU change applied.
>> * CP sends admin up to DP
>> * DP enables RX/TX
>> o Enable all future tx burst calls
>> o rte_eth_dev_set_link_up() invoked
>> o Enable all future rx burst calls
>> * DP notifies CP admin up action is complete
> Thanks for the explanation.
> I'll follow-up with our i40e driver team to see if they can give me more
> detail on their investigation. I'd like to know more details about the
> ultimate root cause of this double free - presumably it's something to
> do with the behaviour when stopping and starting the queue or port - and
> find a solution that does not involve more writes on the fast-path. If
> we don't come up with a better solution, this fix will do, though.
Thanks Bruce, I'll go ahead and submit V2 properly signed off with same
More information about the dev