[dpdk-dev] [PATCH v4 1/6] ethdev: introduce shared Rx queue
Andrew Rybchenko
andrew.rybchenko at oktetlabs.ru
Mon Oct 11 12:47:00 CEST 2021
On 9/30/21 5:55 PM, Xueming Li wrote:
> In current DPDK framework, each RX queue is pre-loaded with mbufs for
RX -> Rx
> incoming packets. When number of representors scale out in a switch
> domain, the memory consumption became significant. Most important,
> polling all ports leads to high cache miss, high latency and low
> throughput.
It should be highlighted that this is a problem of some PMDs only,
not all.
>
> This patch introduces shared RX queue. Ports with same configuration in
"This patch introduces" -> "Introduce"
RX -> Rx
> a switch domain could share RX queue set by specifying sharing group.
RX -> Rx
> Polling any queue using same shared RX queue receives packets from all
RX -> Rx
> member ports. Source port is identified by mbuf->port.
>
> Port queue number in a shared group should be identical. Queue index is
> 1:1 mapped in shared group.
>
> Share RX queue must be polled on single thread or core.
RX -> Rx
>
> Multiple groups is supported by group ID.
is -> are
>
> Signed-off-by: Xueming Li <xuemingl at nvidia.com>
> Cc: Jerin Jacob <jerinjacobk at gmail.com>
The patch should update release notes.
> ---
> Rx queue object could be used as shared Rx queue object, it's important
> to clear all queue control callback api that using queue object:
> https://mails.dpdk.org/archives/dev/2021-July/215574.html
> ---
> doc/guides/nics/features.rst | 11 +++++++++++
> doc/guides/nics/features/default.ini | 1 +
> doc/guides/prog_guide/switch_representation.rst | 10 ++++++++++
> lib/ethdev/rte_ethdev.c | 1 +
> lib/ethdev/rte_ethdev.h | 7 +++++++
> 5 files changed, 30 insertions(+)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 4fce8cd1c97..69bc1d5719c 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -626,6 +626,17 @@ Supports inner packet L4 checksum.
> ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
>
>
> +.. _nic_features_shared_rx_queue:
> +
> +Shared Rx queue
> +---------------
> +
> +Supports shared Rx queue for ports in same switch domain.
> +
> +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``.
> +* **[provides] mbuf**: ``mbuf.port``.
> +
> +
> .. _nic_features_packet_type_parsing:
>
> Packet type parsing
> diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> index 754184ddd4d..ebeb4c18512 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
> Queue start/stop =
> Runtime Rx queue setup =
> Runtime Tx queue setup =
> +Shared Rx queue =
> Burst mode info =
> Power mgmt address monitor =
> MTU update =
> diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
> index ff6aa91c806..bc7ce65fa3d 100644
> --- a/doc/guides/prog_guide/switch_representation.rst
> +++ b/doc/guides/prog_guide/switch_representation.rst
> @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications.
> .. [1] `Ethernet switch device driver model (switchdev)
> <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
>
> +- Memory usage of representors is huge when number of representor grows,
> + because PMD always allocate mbuf for each descriptor of Rx queue.
This is a problem of some PMDs only, so the text must be rewritten
to make that clear.
> + Polling the large number of ports brings more CPU load, cache miss and
> + latency. Shared Rx queue can be used to share Rx queue between PF and
> + representors in same switch. ``RTE_ETH_RX_OFFLOAD_SHARED_RXQ`` is
> + present in Rx offloading capability of device info. Setting the
> + offloading flag in device Rx mode or Rx queue configuration to enable
> + shared Rx queue. Polling any member port of the shared Rx queue can return
> + packets of all ports in the group, port ID is saved in ``mbuf.port``.
> +
> Basic SR-IOV
> ------------
>
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 61aa49efec6..73270c10492 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -127,6 +127,7 @@ static const struct {
> RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
> + RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ),
> };
>
> #undef RTE_RX_OFFLOAD_BIT2STR
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index afdc53b674c..d7ac625ee74 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1077,6 +1077,7 @@ struct rte_eth_rxconf {
> uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> + uint32_t shared_group; /**< Shared port group index in switch domain. */
> /**
> * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> @@ -1403,6 +1404,12 @@ struct rte_eth_conf {
> #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM 0x00040000
> #define DEV_RX_OFFLOAD_RSS_HASH 0x00080000
> #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> +/**
> + * Rx queue is shared among ports in same switch domain to save memory,
> + * avoid polling each port. Any port in the group can be used to receive
> + * packets. Real source port number saved in mbuf->port field.
> + */
> +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ 0x00200000
>
> #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> DEV_RX_OFFLOAD_UDP_CKSUM | \
>
IMHO it should be squashed with the second patch to make it
easier to review. Otherwise it is hard to understand what
shared_group and the offload are for, since both are dead code
in this patch.
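
For context, a minimal sketch of how the flag, the group index and
mbuf->port might fit together from the application side. Everything
named sketch_* below is a hypothetical stand-in so that the snippet
builds without DPDK; it is not the ethdev API. Only the flag value,
the shared_group field and the "source port is in mbuf->port" rule are
taken from the patch. The switch domain check is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit value as RTE_ETH_RX_OFFLOAD_SHARED_RXQ in the patch. */
#define SKETCH_RX_OFFLOAD_SHARED_RXQ UINT64_C(0x00200000)
#define SKETCH_MAX_PORTS 4

/* Hypothetical stand-in for the two rte_eth_rxconf fields involved. */
struct sketch_rxconf {
	uint32_t shared_group; /* shared port group index */
	uint64_t offloads;     /* per-queue Rx offload flags */
};

/* Hypothetical stand-in for the one rte_mbuf field involved. */
struct sketch_mbuf {
	uint16_t port; /* real source port of the packet */
};

/*
 * Two queues with the same index are assumed to share one Rx queue
 * when both request the offload and both name the same group.
 * Assumption: both ports already belong to the same switch domain.
 */
static int
sketch_same_shared_queue(const struct sketch_rxconf *a,
			 const struct sketch_rxconf *b)
{
	return (a->offloads & SKETCH_RX_OFFLOAD_SHARED_RXQ) &&
	       (b->offloads & SKETCH_RX_OFFLOAD_SHARED_RXQ) &&
	       a->shared_group == b->shared_group;
}

/*
 * A burst polled from any member port may carry packets of every
 * member, so the application sorts them by mbuf->port. Packets with
 * a port outside the table are skipped.
 */
static void
sketch_demux(const struct sketch_mbuf *burst, uint16_t nb_rx,
	     uint32_t per_port[SKETCH_MAX_PORTS])
{
	uint16_t i;

	assert(burst != NULL && per_port != NULL);
	for (i = 0; i < nb_rx; i++) {
		if (burst[i].port < SKETCH_MAX_PORTS)
			per_port[burst[i].port]++;
	}
}
```

In a real application the burst would come from rte_eth_rx_burst() on
one member port, polled from a single thread or core as the commit
message requires.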