[PATCH v2 1/3] net/bonding: support Tx prepare
Chas Williams
3chas3 at gmail.com
Sat Sep 17 15:38:29 CEST 2022
On 9/16/22 22:35, fengchengwen wrote:
> Hi Chas,
>
> On 2022/9/15 0:59, Chas Williams wrote:
>> On 9/13/22 20:46, fengchengwen wrote:
>>>
>>> The main problem is hard to design a tx_prepare for bonding device:
>>> 1. as Chas Williams said, there maybe twice hash calc to get target slave
>>> devices.
>>> 2. also more important, if the slave devices have changes(e.g. slave device
>>> link down or remove), and if the changes happens between bond-tx-prepare and
>>> bond-tx-burst, the output slave will changes, and this may lead to checksum
>>> failed. (Note: a bond device with slave devices may from different vendors,
>>> and slave devices may have different requirements, e.g. slave-A support calc
>>> IPv4 pseudo-head automatic (no need driver pre-calc), but slave-B need driver
>>> pre-calc).
>>>
>>> Current design cover the above two scenarios by using in-place tx-prepare. and
>>> in addition, bond devices are not transparent to applications, I think it's a
>>> practical method to provide tx-prepare support in this way.
>>>
>>
>>
>> I don't think you need to export an enable/disable routine for the use of
>> rte_eth_tx_prepare. It's safe to just call that routine, even if it isn't
>> implemented. You are just trading one branch in DPDK librte_eth_dev for a
>> branch in drivers/net/bonding.
>
> Our first patch was just like yours (just add tx-prepare default), but community
> is concerned about impacting performance.
>
> As a trade-off, I think we can add the enable/disable API.
IMHO, that's a bad idea. If the rte_eth_dev_tx_prepare API affects
performance adversly, that is not a bonding problem. All applications
should be calling rte_eth_dev_tx_prepare. There's no defined API
to determine if rte_eth_dev_tx_prepare should be called. Therefore,
applications should always call rte_eth_dev_tx_prepare. Regardless,
as I previously mentioned, you are just trading the location of
the branch, especially in the bonding case.
If rte_eth_dev_tx_prepare is causing a performance drop, then that API
should be improved or rewritten. There are PMD that require you to use
that API. Locally, we had maintained a patch to eliminate the use of
rte_eth_dev_tx_prepare. However, that has been getting harder and harder
to maintain. The performance lost by just calling rte_eth_dev_tx_prepare
was marginal.
>
>>
>> I think you missed fixing tx_machine in 802.3ad support. We have been using
>> the following patch locally which I never got around to submitting.
>
> You are right, I will send V3 fix it.
>
>>
>>
>> From a458654d68ff5144266807ef136ac3dd2adfcd98 Mon Sep 17 00:00:00 2001
>> From: "Charles (Chas) Williams" <chwillia at ciena.com>
>> Date: Tue, 3 May 2022 16:52:37 -0400
>> Subject: [PATCH] net/bonding: call rte_eth_tx_prepare before rte_eth_tx_burst
>>
>> Some PMDs might require a call to rte_eth_tx_prepare before sending the
>> packets for transmission. Typically, the prepare step handles the VLAN
>> headers, but it may need to do other things.
>>
>> Signed-off-by: Chas Williams <chwillia at ciena.com>
>
> ...
>
>> * ring if transmission fails so the packet isn't lost.
>> @@ -1322,8 +1350,12 @@ bond_ethdev_tx_burst_broadcast(void *queue, struct rte_mbuf **bufs,
>>
>> /* Transmit burst on each active slave */
>> for (i = 0; i < num_of_slaves; i++) {
>> - slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>> + uint16_t nb_prep;
>> +
>> + nb_prep = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id,
>> bufs, nb_pkts);
>> + slave_tx_total[i] = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>> + bufs, nb_prep);
>
> The tx-prepare may edit packet data, and the broadcast mode will send a packet to all slaves,
> the packet data is sent and edited at the same time. Is this likely to cause problems ?
This routine is already broken. You can't just increment the refcount
and send the packet into a PMD's transmit routine. Nothing guarantees
that a transmit routine will not modify the packet. Many PMDs perform an
rte_vlan_insert. You should at least perform a clone of the packet so
that the mbuf headers aren't mangled by each PMD. Just to be safe you
should perform a partial deep copy of the packet headers in case some
PMD does an rte_vlan_insert and the other PMDs in the bonding group do
not need an rte_vlan_insert.
So doing a blind rte_eth_dev_tx_preprare isn't making anything much
worse.
>
>>
>> if (unlikely(slave_tx_total[i] < nb_pkts))
>> tx_failed_flag = 1;
More information about the dev
mailing list