[PATCH v3 0/3] Direct re-arming of buffers on receive side
Morten Brørup
mb at smartsharesystems.com
Wed Mar 22 13:56:39 CET 2023
> From: Feifei Wang [mailto:feifei.wang2 at arm.com]
> Sent: Wednesday, 4 January 2023 08.31
>
> Currently, the transmit side frees the buffers into the lcore cache and
> the receive side allocates buffers from the lcore cache. The transmit
> side typically frees 32 buffers, resulting in 32*8=256B of stores to
> the lcore cache. The receive side allocates 32 buffers and stores them
> in the receive side software ring, resulting in 32*8=256B of stores and
> 256B of loads from the lcore cache.
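>
> For reference, a rough sketch of this conventional recycling path in
> terms of the generic mempool API (the PMDs use their own vectorized
> variants internally; the burst size and helper names here are only
> illustrative):
> -----------------------------------------------------------------------
> #include <rte_mbuf.h>
> #include <rte_mempool.h>
>
> #define BURST 32
>
> /* Tx completion: 32 pointer stores (32 * 8 = 256B) into the lcore
>  * cache. With MBUF_FAST_FREE the driver may bulk-return like this,
>  * since all mbufs come from one mempool with refcnt == 1. */
> static void
> tx_free_burst(struct rte_mempool *mp, struct rte_mbuf *pkts[BURST])
> {
>         void *objs[BURST];
>         unsigned int i;
>
>         for (i = 0; i < BURST; i++)
>                 objs[i] = pkts[i];
>         rte_mempool_put_bulk(mp, objs, BURST);
> }
>
> /* Rx rearm: 256B of loads from the lcore cache, plus 256B of stores
>  * when the pointers land in the Rx software ring. */
> static int
> rx_rearm_burst(struct rte_mempool *mp, struct rte_mbuf *sw_ring[BURST])
> {
>         return rte_mempool_get_bulk(mp, (void **)sw_ring, BURST);
> }
> -----------------------------------------------------------------------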
>
> This patch proposes a mechanism to avoid freeing to/allocating from
> the lcore cache, i.e. the buffers freed on the transmit side are placed
> directly into the receive side's software ring. This avoids the 256B
> of loads and stores introduced by the lcore cache. It also frees up the
> cache lines used by the lcore cache.
I am starting to wonder if we have been adding unnecessary feature creep by trying to make this feature too generic.
Could you please describe some of the most important high-volume use cases from real life? It would help set the scope correctly.
>
> However, this solution poses several constraints:
>
> 1) The receive queue needs to know which transmit queue it should take
> the buffers from. The application logic decides which transmit port to
> use to send out the packets. In many use cases the NIC might have a
> single port ([1], [2], [3]), in which case a given transmit queue is
> always mapped to a single receive queue (1:1 Rx queue : Tx queue). This
> is easy to configure.
>
> If the NIC has 2 ports (there are several references), then we will have
> a 1:2 (Rx queue : Tx queue) mapping, which is still easy to configure.
> However, if this is generalized to 'N' ports, the configuration can
> become lengthy. Moreover, the PMD would have to scan a list of transmit
> queues to pull the buffers from.
>
> 2) The other factor that needs to be considered is 'run-to-completion' vs
> 'pipeline' models. In the run-to-completion model, the receive side and
> the transmit side are running on the same lcore serially. In the pipeline
> model, the receive side and the transmit side might be running on
> different lcores in parallel. This requires locking, which is not
> supported at this point.
>
> 3) Tx and Rx buffers must come from the same mempool, and the number of
> buffers freed on the Tx side must equal the number of buffers rearmed on
> the Rx side, so that 'tx_next_dd' can be updated correctly in
> direct-rearm mode. This is because tx_next_dd is the variable used to
> compute the Tx sw-ring free location; its value will be one round ahead
> of the position where the next free starts.
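>
> For the same-mempool part of this constraint, an application-level
> sanity check could look roughly like the sketch below (standard ethdev
> and mbuf APIs; the helper name is made up for illustration):
> -----------------------------------------------------------------------
> #include <rte_ethdev.h>
> #include <rte_mbuf.h>
>
> /* Return 1 if every mbuf queued for Tx comes from the same mempool
>  * that the Rx queue to be direct-rearmed allocates from. */
> static int
> same_mempool(uint16_t rx_port, uint16_t rx_queue,
>              struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> {
>         struct rte_eth_rxq_info qinfo;
>         uint16_t i;
>
>         if (rte_eth_rx_queue_info_get(rx_port, rx_queue, &qinfo) != 0)
>                 return 0;
>
>         for (i = 0; i < nb_pkts; i++)
>                 if (tx_pkts[i]->pool != qinfo.mp)
>                         return 0;
>         return 1;
> }
> -----------------------------------------------------------------------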
>
> Current status in this patch:
> 1) Two APIs are added for users to enable direct-rearm mode:
> In the control plane, users can call 'rte_eth_rx_queue_rearm_data_get'
> to get the Rx sw_ring pointer and its rxq_info.
> (This avoids the Tx side having to load Rx data directly.)
>
> In the data plane, users can call 'rte_eth_dev_direct_rearm' to rearm Rx
> buffers and free Tx buffers at the same time. Specifically, this API is
> built from two separate APIs for Rx and Tx:
> For Tx, 'rte_eth_tx_fill_sw_ring' fills a given sw_ring with buffers
> freed on the Tx side.
> For Rx, 'rte_eth_rx_flush_descriptor' flushes its descriptors based
> on the rearmed buffers.
> This separates the Rx and Tx operations, so the user can even rearm an
> Rx queue not from the same driver's Tx queue, but from different
> sources too (a usage sketch follows after this list).
> -----------------------------------------------------------------------
> control plane:
>         rte_eth_rx_queue_rearm_data_get(*rxq_rearm_data);
> data plane:
>         loop {
>                 rte_eth_dev_direct_rearm(*rxq_rearm_data) {
>
>                         rte_eth_tx_fill_sw_ring {
>                                 for (i = 0; i < 32; i++) {
>                                         sw_ring.mbuf[i] = tx.mbuf[i];
>                                 }
>                         }
>
>                         rte_eth_rx_flush_descriptor {
>                                 for (i = 0; i < 32; i++) {
>                                         flush descs[i];
>                                 }
>                         }
>                 }
>                 rte_eth_rx_burst;
>                 rte_eth_tx_burst;
>         }
> -----------------------------------------------------------------------
> 2)The i40e driver is changed to do the direct re-arm of the receive
> side.
> 3)The ixgbe driver is changed to do the direct re-arm of the receive
> side.
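>
> A minimal sketch of how an application's run-to-completion loop might
> combine the APIs from item 1 above with the standard burst calls. The
> rte_eth_rx_burst()/rte_eth_tx_burst() parts are standard ethdev; the
> direct-rearm calls and 'struct rte_eth_rxq_rearm_data' follow this
> patch, and their parameter lists are simplified here and may not match
> the patch exactly:
> -----------------------------------------------------------------------
> #include <rte_ethdev.h>
> #include <rte_mbuf.h>
>
> #define BURST 32
>
> static void
> fwd_loop(uint16_t rx_port, uint16_t rx_queue,
>          uint16_t tx_port, uint16_t tx_queue)
> {
>         struct rte_eth_rxq_rearm_data rearm_data;
>         struct rte_mbuf *pkts[BURST];
>         uint16_t nb_rx, nb_tx;
>
>         /* Control plane: expose the Rx sw-ring of (rx_port, rx_queue). */
>         rte_eth_rx_queue_rearm_data_get(rx_port, rx_queue, &rearm_data);
>
>         for (;;) {
>                 /* Data plane: refill the Rx sw-ring directly with
>                  * buffers freed on (tx_port, tx_queue), bypassing the
>                  * lcore mempool cache. */
>                 rte_eth_dev_direct_rearm(rx_port, rx_queue,
>                                          tx_port, tx_queue, &rearm_data);
>
>                 nb_rx = rte_eth_rx_burst(rx_port, rx_queue, pkts, BURST);
>                 if (nb_rx == 0)
>                         continue;
>
>                 nb_tx = rte_eth_tx_burst(tx_port, tx_queue, pkts, nb_rx);
>                 /* Drop whatever the Tx queue could not accept. */
>                 while (nb_tx < nb_rx)
>                         rte_pktmbuf_free(pkts[nb_tx++]);
>         }
> }
> -----------------------------------------------------------------------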
>
> Testing status:
> (1) dpdk l3fwd test with multiple drivers:
>     port 0: 82599 NIC    port 1: XL710 NIC
> -------------------------------------------------------------
>                    Without fast free    With fast free
> Thunderx2:             +9.44%               +7.14%
> -------------------------------------------------------------
>
> (2) dpdk l3fwd test with same driver:
>     port 0 && 1: XL710 NIC
> -------------------------------------------------------------
> *Direct rearm with exposing rx_sw_ring:
>                    Without fast free    With fast free
> Ampere altra:         +14.98%              +15.77%
> n1sdp:                 +6.47%               +0.52%
> -------------------------------------------------------------
>
> (3) VPP test with same driver:
>     port 0 && 1: XL710 NIC
> -------------------------------------------------------------
> *Direct rearm with exposing rx_sw_ring:
> Ampere altra:          +4.59%
> n1sdp:                 +5.4%
> -------------------------------------------------------------
>
> Reference:
> [1] https://store.nvidia.com/en-us/networking/store/product/MCX623105AN-CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECryptoDisabled/
> [2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-ethernet-network-adapter-e810cqda1/specifications.html
> [3] https://www.broadcom.com/products/ethernet-connectivity/network-adapters/100gb-nic-ocp/n1100g
>
> V2:
> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
> 2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
> 3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
> 4. Add condition detection for direct rearm API (Morten, Andrew Rybchenko)
>
> V3:
> 1. Separate Rx and Tx operation with two APIs in direct-rearm (Konstantin)
> 2. Delete l3fwd change for direct rearm (Jerin)
> 3. Enable direct rearm in the ixgbe driver on Arm
>
> Feifei Wang (3):
> ethdev: enable direct rearm with separate API
> net/i40e: enable direct rearm with separate API
> net/ixgbe: enable direct rearm with separate API
>
> drivers/net/i40e/i40e_ethdev.c | 1 +
> drivers/net/i40e/i40e_ethdev.h | 2 +
> drivers/net/i40e/i40e_rxtx.c | 19 +++
> drivers/net/i40e/i40e_rxtx.h | 4 +
> drivers/net/i40e/i40e_rxtx_vec_common.h | 54 +++++++
> drivers/net/i40e/i40e_rxtx_vec_neon.c | 42 ++++++
> drivers/net/ixgbe/ixgbe_ethdev.c | 1 +
> drivers/net/ixgbe/ixgbe_ethdev.h | 3 +
> drivers/net/ixgbe/ixgbe_rxtx.c | 19 +++
> drivers/net/ixgbe/ixgbe_rxtx.h | 4 +
> drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 48 ++++++
> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 52 +++++++
> lib/ethdev/ethdev_driver.h | 10 ++
> lib/ethdev/ethdev_private.c | 2 +
> lib/ethdev/rte_ethdev.c | 52 +++++++
> lib/ethdev/rte_ethdev.h | 174 ++++++++++++++++++++++
> lib/ethdev/rte_ethdev_core.h | 11 ++
> lib/ethdev/version.map | 6 +
> 18 files changed, 504 insertions(+)
>
> --
> 2.25.1
>