Re: [PATCH v2 0/3] Direct re-arming of buffers on receive side
Feifei Wang
Feifei.Wang2 at arm.com
Thu Sep 29 08:19:04 CEST 2022
> -----Original Message-----
> From: Feifei Wang <feifei.wang2 at arm.com>
> Sent: Tuesday, September 27, 2022 10:48 AM
> Cc: dev at dpdk.org; nd <nd at arm.com>; Feifei Wang
> <Feifei.Wang2 at arm.com>
> Subject: [PATCH v2 0/3] Direct re-arming of buffers on receive side
>
> Currently, the transmit side frees the buffers into the lcore cache and the
> receive side allocates buffers from the lcore cache. The transmit side typically
> frees 32 buffers, resulting in 32*8=256B of stores to the lcore cache. The
> receive side allocates 32 buffers and stores them in the receive side software
> ring, resulting in 32*8=256B of stores and 256B of loads from the lcore cache.
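>
> As a rough illustration, the conventional recycle path looks like the minimal
> sketch below, written with the generic mempool calls (the drivers use their own
> vectorized equivalents); the helper name is hypothetical:
>
> #include <rte_mbuf.h>
> #include <rte_mempool.h>
>
> #define BURST 32
>
> static void
> conventional_recycle(struct rte_mempool *mp,
>                      struct rte_mbuf *tx_done[BURST],
>                      struct rte_mbuf *rx_fill[BURST])
> {
>         /* Tx cleanup: 32 pointers * 8B = 256B of stores into the lcore cache. */
>         rte_mempool_put_bulk(mp, (void * const *)tx_done, BURST);
>
>         /* Rx re-arm: 256B of loads from the cache, plus another 256B of
>          * stores into the Rx software ring (not shown). */
>         if (rte_mempool_get_bulk(mp, (void **)rx_fill, BURST) != 0)
>                 return; /* cache and pool empty */
> }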
>
> This patch proposes a mechanism to avoid freeing to/allocating from the
> lcore cache, i.e. the receive side will take the buffers freed by the transmit
> side directly into its software ring. This avoids the 256B of loads and stores
> introduced by the lcore cache. It also frees up the cache lines used by the
> lcore cache.
>
> However, this solution poses several constraints:
>
> 1) The receive queue needs to know which transmit queue it should take the
> buffers from. The application logic decides which transmit port to use to send
> out the packets. In many use cases the NIC might have a single port ([1], [2],
> [3]), in which case a given transmit queue is always mapped to a single
> receive queue (1:1 Rx queue:Tx queue). This is easy to configure.
>
> If the NIC has 2 ports (there are several references), then we will have a
> 1:2 (Rx queue:Tx queue) mapping, which is still easy to configure.
> However, if this is generalized to 'N' ports, the configuration can become
> lengthy. Moreover, the PMD would have to scan a list of transmit queues to
> pull the buffers from.
>
> 2) The other factor that needs to be considered is the 'run-to-completion' vs
> 'pipeline' model. In the run-to-completion model, the receive side and the
> transmit side run serially on the same lcore. In the pipeline model, the
> receive side and the transmit side might run on different lcores in parallel.
> This would require locking, which is not supported at this point.
>
> 3) Tx and Rx buffers must come from the same mempool. We must also
> ensure that the number of Tx buffers freed equals the number of Rx buffers
> re-armed (txq->tx_rs_thresh == RTE_I40E_RXQ_REARM_THRESH), so that
> 'tx_next_dd' can be updated correctly in direct-rearm mode. This is because
> tx_next_dd is the variable used to compute the next free location in the Tx
> sw-ring; after each free, its value points one threshold ahead of the position
> where the next free will start. See the configuration sketch below.
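>
> A minimal configuration sketch for this constraint, assuming the threshold
> value 32 (RTE_I40E_RXQ_REARM_THRESH is 32 in the i40e vector Rx path); the
> helper name is hypothetical:
>
> #include <rte_ethdev.h>
>
> static int
> setup_txq_for_direct_rearm(uint16_t port_id, uint16_t queue_id,
>                            uint16_t nb_txd, unsigned int socket_id)
> {
>         struct rte_eth_dev_info dev_info;
>         struct rte_eth_txconf txconf;
>         int ret;
>
>         ret = rte_eth_dev_info_get(port_id, &dev_info);
>         if (ret != 0)
>                 return ret;
>
>         txconf = dev_info.default_txconf;
>         /* Free exactly as many Tx buffers per cleanup as the Rx side re-arms. */
>         txconf.tx_rs_thresh = 32;      /* == RTE_I40E_RXQ_REARM_THRESH */
>         txconf.tx_free_thresh = 32;
>
>         return rte_eth_tx_queue_setup(port_id, queue_id, nb_txd,
>                                       socket_id, &txconf);
> }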
>
> Current status in this patch:
> 1) Two APIs are added for users to enable direct-rearm mode:
> In the control plane, users can call 'rte_eth_txq_data_get' to get the Tx
> sw_ring pointer and its txq_info (this avoids the Rx path loading Tx data
> directly);
>
> In the data plane, users can call 'rte_eth_rx_direct_rearm' to rearm Rx
> buffers and free Tx buffers at the same time (currently it supports a 1:1
> rxq:txq mapping):
> -----------------------------------------------------------------------
> control plane:
>         rte_eth_txq_data_get(*txq_data);
> data plane:
>         loop {
>                 rte_eth_rx_direct_rearm(*txq_data) {
>                         for (i = 0; i < 32; i++) {
>                                 rx.mbuf[i] = tx.mbuf[i];
>                                 initialize descs[i];
>                         }
>                 }
>                 rte_eth_rx_burst;
>                 rte_eth_tx_burst;
>         }
> -----------------------------------------------------------------------
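>
> The sketch below shows how an application loop could use this, following the
> pseudocode above. The prototype of rte_eth_rx_direct_rearm() and the type
> 'struct rte_eth_txq_data' are only assumptions based on this cover letter
> (1:1 rxq:txq mapping); the authoritative definitions are in the patches
> themselves:
>
> #include <rte_ethdev.h>
> #include <rte_mbuf.h>
>
> static void
> direct_rearm_loop(uint16_t rx_port, uint16_t rx_queue,
>                   uint16_t tx_port, uint16_t tx_queue,
>                   struct rte_eth_txq_data *txq_data)
> {
>         struct rte_mbuf *pkts[32];
>         uint16_t nb_rx, nb_tx, i;
>
>         for (;;) {
>                 /* Move freed Tx buffers straight into the Rx sw-ring and
>                  * initialize the Rx descriptors (assumed prototype). */
>                 rte_eth_rx_direct_rearm(rx_port, rx_queue, tx_port, tx_queue,
>                                         txq_data);
>
>                 nb_rx = rte_eth_rx_burst(rx_port, rx_queue, pkts, 32);
>                 /* ... application processing on pkts[0..nb_rx-1] ... */
>                 nb_tx = rte_eth_tx_burst(tx_port, tx_queue, pkts, nb_rx);
>                 for (i = nb_tx; i < nb_rx; i++)
>                         rte_pktmbuf_free(pkts[i]); /* drop unsent packets */
>         }
> }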
> 2) The i40e driver is changed to do the direct re-arm of the receive
> side.
> 3) The L3fwd application is modified to enable direct-rearm mode. Users can
> enable direct-rearm and map queues via input parameters.
>
> Testing status:
> 1. The testing results for L3fwd are as follows:
> -------------------------------------------------------------------
> enabled direct rearm
> -------------------------------------------------------------------
> Arm:
> N1SDP (neon path):
>   without fast-free mode    with fast-free mode
>         +15.09%                   +4.2%
>
> Ampere Altra (neon path):
>   without fast-free mode    with fast-free mode
>         +10.9%                    +14.6%
> -------------------------------------------------------------------
>
> 2. The testing results for VPP-L3fwd are as follows:
> -------------------------------------------------------------------
> Arm:
> N1SDP (neon path):
>   with direct re-arm mode enabled
>         +4.5%
>
> Ampere Altra (neon path):
>   with direct re-arm mode enabled
>         +6.5%
> -------------------------------------------------------------------
>
> Reference:
> [1] https://store.nvidia.com/en-us/networking/store/product/MCX623105AN-CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECryptoDisabled/
> [2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-ethernet-network-adapter-e810cqda1/specifications.html
> [3] https://www.broadcom.com/products/ethernet-connectivity/network-adapters/100gb-nic-ocp/n1100g
>
> V2:
> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
> 2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
> 3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
> 4. Add condition detection for direct rearm API (Morten, Andrew Rybchenko)
>
PING

Hi,
Would you please give some comments on this version?
Thanks very much.

Best Regards,
Feifei