RE: Re: [PATCH v3 2/3] net/i40e: enable direct rearm with separate API

Feifei Wang Feifei.Wang2 at arm.com
Thu Mar 23 11:49:38 CET 2023



> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev at yandex.ru>
> Sent: Monday, March 20, 2023 12:11 AM
> To: Feifei Wang <Feifei.Wang2 at arm.com>; Konstantin Ananyev
> <konstantin.ananyev at huawei.com>; Yuying Zhang
> <Yuying.Zhang at intel.com>; Beilei Xing <beilei.xing at intel.com>; Ruifeng
> Wang <Ruifeng.Wang at arm.com>
> Cc: dev at dpdk.org; nd <nd at arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>
> Subject: Re: Re: [PATCH v3 2/3] net/i40e: enable direct rearm with
> separate API
> 
> 
> >>>>>>> +int
> >>>>>>> +i40e_tx_fill_sw_ring(void *tx_queue,
> >>>>>>> +		struct rte_eth_rxq_rearm_data *rxq_rearm_data)
> >>>>>>> +{
> >>>>>>> +	struct i40e_tx_queue *txq = tx_queue;
> >>>>>>> +	struct i40e_tx_entry *txep;
> >>>>>>> +	void **rxep;
> >>>>>>> +	struct rte_mbuf *m;
> >>>>>>> +	int i, n;
> >>>>>>> +	int nb_rearm = 0;
> >>>>>>> +
> >>>>>>> +	if (*rxq_rearm_data->rearm_nb < txq->tx_rs_thresh ||
> >>>>>>> +			txq->nb_tx_free > txq->tx_free_thresh)
> >>>>>>> +		return 0;
> >>>>>>> +
> >>>>>>> +	/* check DD bits on threshold descriptor */
> >>>>>>> +	if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
> >>>>>>> +			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
> >>>>>>> +			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
> >>>>>>> +		return 0;
> >>>>>>> +
> >>>>>>> +	n = txq->tx_rs_thresh;
> >>>>>>> +
> >>>>>>> +	/* first buffer to free from S/W ring is at index
> >>>>>>> +	 * tx_next_dd - (tx_rs_thresh-1)
> >>>>>>> +	 */
> >>>>>>> +	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> >>>>>>> +	rxep = rxq_rearm_data->rx_sw_ring;
> >>>>>>> +	rxep += *rxq_rearm_data->rearm_start;
> >>>>>>> +
> >>>>>>> +	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> >>>>>>> +		/* directly put mbufs from Tx to Rx */
> >>>>>>> +		for (i = 0; i < n; i++, rxep++, txep++)
> >>>>>>> +			*rxep = txep[0].mbuf;
> >>>>>>> +	} else {
> >>>>>>> +		for (i = 0; i < n; i++, rxep++) {
> >>>>>>> +			m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
> >>>>
> >>>> One thing I forgot to ask:
> >>>> What would happen if this mbuf belongs to a different mempool
> >>>> (not the one that we specify at rx_queue_setup())?
> >>>> Do we need to check it here?
> >>>> Or would it be an upper-layer constraint?
> >>>> Or...?
> >>>>
> >>>
> >>> First, 'different mempool' is valid for the non-FAST_FREE path in
> >>> tx_free_buffers.
> >>>
> >>> If buffers belong to different mempools, here is an example:
> >>> Buffer 1 from mempool 1, its recycle path is:
> >>> ----------------------------------------------------------------------
> >>> 1. queue_setup: rearm from mempool 1 into Rx sw-ring
> >>> 2. rte_eth_rx_burst: used by user app (Rx)
> >>> 3. rte_eth_tx_burst: mount on Tx sw-ring
> >>> 4. rte_eth_direct_rearm: free into Rx sw-ring
> >>>             or
> >>>      tx_free_buffers: free into mempool 1 (no fast_free path)
> >>> ----------------------------------------------------------------------
> >>>
> >>> Buffer 2 from mempool 2, its recycle path is:
> >>> ----------------------------------------------------------------------
> >>> 1. queue_setup: rearm from mempool 2 into Rx sw-ring
> >>> 2. rte_eth_rx_burst: used by user app (Rx)
> >>> 3. rte_eth_tx_burst: mount on Tx sw-ring
> >>> 4. rte_eth_direct_rearm: free into Rx sw-ring
> >>>             or
> >>>      tx_free_buffers: free into mempool 2 (no fast_free path)
> >>> ----------------------------------------------------------------------
> >>>
> >>> Thus, buffers from different Tx mempools look the same to Rx. The
> >>> only difference is that they will be freed into different mempools
> >>> if the thread uses the generic free path.
> >>> I think this cannot affect direct-rearm mode, and we do not need to
> >>> check this.
> >>
> >> I understand that it should work even with multiple mempools.
> >> What I am trying to say is that the user may not want to use mbufs
> >> from a particular mempool for RX (while it is still ok to use it for TX).
> >> Let's say the user has a separate mempool with small data-buffers
> >> (less than the normal MTU) to send some 'special' packets, or even
> >> uses this mempool with small buffers for zero-copy updating of packet
> >> L2/L3 headers, etc.
> >> Or it could be some 'special' user-provided mempool.
> >> That's why I wonder whether we should allow only mbufs from the
> >> mempool that is assigned to that RX queue.
> >
> > Sorry for being misleading. If I understand correctly this time, you
> > mean a special mempool. Maybe its buffer size is very small and this
> > Tx buffer is generated from the control plane.
> >
> > However, if we recycle such a Tx buffer into the Rx buffer ring, errors
> > may occur because its size is too small.
> >
> > Thus we can only allow general buffers which are valid for the Rx buffer
> > ring. Furthermore, it should be the user's responsibility to ensure that
> > the recycled Tx buffers are valid. If we check this in the data plane, it
> > will cost a lot of CPU cycles. At most, what we can do is add a constraint
> > in the notes to remind users.
> 
> As I thought: in theory we can add 'struct rte_mempool *mp'
> into rte_eth_rxq_rearm_data.
> And then:
> if (mbuf->pool == rxq_rearm_data->mp)
>    /* put mbuf into rearm buffer */
> else
>    /* free mbuf */
> For the 'proper' config (when the txq contains mbufs from the expected
> mempool), the overhead will be minimal.
> In other cases it might be higher, but it would still work with no need
> for extra limitations.

It's a good idea. I tested performance with this change, and there is
currently no performance degradation. Thus, I have added this check in the
latest version.
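
For reference, a minimal sketch of how the non-FAST_FREE recycle loop looks
with that check. The 'mp' parameter stands for the mempool bound to the Rx
queue (the suggested new field in rte_eth_rxq_rearm_data); the helper name
is hypothetical and only illustrates the idea:

#include <rte_mbuf.h>
#include <rte_mempool.h>
#include "i40e_rxtx.h"

/* Hypothetical helper: recycle up to n completed Tx mbufs into the Rx
 * sw-ring, but only those coming from the Rx queue's own mempool 'mp'.
 * Returns the number of entries actually placed into rxep[].
 */
static inline int
i40e_recycle_tx_mbufs(struct i40e_tx_entry *txep, void **rxep,
		struct rte_mempool *mp, int n)
{
	struct rte_mbuf *m;
	int i, nb_rearm = 0;

	for (i = 0; i < n; i++) {
		m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
		if (m == NULL)
			continue;
		if (m->pool == mp)
			/* expected mempool: put mbuf into rearm buffer */
			rxep[nb_rearm++] = m;
		else
			/* foreign mempool: fall back to a normal free */
			rte_mempool_put(m->pool, m);
	}
	return nb_rearm;
}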

> 
> 
> >>
> >>>
> >>>>>>> +			if (m != NULL) {
> >>>>>>> +				*rxep = m;
> >>>>>>> +				nb_rearm++;
> >>>>>>> +			}
> >>>>>>> +		}
> >>>>>>> +		n = nb_rearm;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	/* update counters for Tx */
> >>>>>>> +	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
> >>>>>>> +	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
> >>>>>>> +	if (txq->tx_next_dd >= txq->nb_tx_desc)
> >>>>>>> +		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
> >>>>>>> +
> >>>>>>> +	return n;
> >>>>>>> +}
> >>>>>>> +
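
As a usage reference, here is a rough sketch of where the rearm step sits in
an application's forwarding loop, following the recycle path listed earlier
in the thread. The rte_eth_direct_rearm() call shown is a hypothetical
single-call form with an assumed signature (the v3 series exposes this
through separate APIs), and the port/queue variables are illustration only:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void
forward_loop(uint16_t rx_port, uint16_t rx_queue,
		uint16_t tx_port, uint16_t tx_queue)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, nb_tx;

	for (;;) {
		/* receive a burst on the Rx queue */
		nb_rx = rte_eth_rx_burst(rx_port, rx_queue, pkts, BURST_SIZE);
		if (nb_rx == 0)
			continue;

		/* mount the burst on the Tx sw-ring */
		nb_tx = rte_eth_tx_burst(tx_port, tx_queue, pkts, nb_rx);

		/* recycle completed Tx mbufs straight into the Rx sw-ring,
		 * skipping the mempool put/get round trip
		 * (hypothetical API form, see lead-in above)
		 */
		rte_eth_direct_rearm(rx_port, rx_queue, tx_port, tx_queue);

		/* drop packets the Tx queue could not accept */
		while (nb_tx < nb_rx)
			rte_pktmbuf_free(pkts[nb_tx++]);
	}
}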


