[dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
Jerin Jacob
jerinjacobk at gmail.com
Mon Aug 9 15:50:58 CEST 2021
On Mon, Aug 9, 2021 at 5:18 PM Xueming Li <xuemingl at nvidia.com> wrote:
>
> In the current DPDK framework, each Rx queue is pre-loaded with mbufs for
> incoming packets. When the number of representors scales out in a switch
> domain, memory consumption becomes significant. Most importantly, polling
> all ports leads to high cache miss rates, high latency and low
> throughput.
>
> This patch introduces shared Rx queues. Ports with the same configuration
> in a switch domain can share an Rx queue set by specifying a sharing group.
> Polling the queue of any port that uses the same shared Rx queue receives
> packets from all member ports. The source port is identified by mbuf->port.
>
> The number of queues per port in a shared group should be identical. Queue
> indexes are mapped 1:1 within a shared group.
>
> A shared Rx queue is supposed to be polled on the same thread.
>
> Multiple groups are supported via the group ID.
Is this offload specific to representors? If so, can the name be
changed to refer specifically to representors?
If it is for the generic case, how will flow ordering be maintained?
>
> Signed-off-by: Xueming Li <xuemingl at nvidia.com>
> ---
> doc/guides/nics/features.rst | 11 +++++++++++
> doc/guides/nics/features/default.ini | 1 +
> doc/guides/prog_guide/switch_representation.rst | 10 ++++++++++
> lib/ethdev/rte_ethdev.c | 1 +
> lib/ethdev/rte_ethdev.h | 7 +++++++
> 5 files changed, 30 insertions(+)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index a96e12d155..2e2a9b1554 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -624,6 +624,17 @@ Supports inner packet L4 checksum.
> ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
>
>
> +.. _nic_features_shared_rx_queue:
> +
> +Shared Rx queue
> +---------------
> +
> +Supports shared Rx queues for ports in the same switch domain.
> +
> +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``.
> +* **[provides] mbuf**: ``mbuf.port``.
> +
> +
> .. _nic_features_packet_type_parsing:
>
> Packet type parsing
> diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> index 754184ddd4..ebeb4c1851 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
> Queue start/stop =
> Runtime Rx queue setup =
> Runtime Tx queue setup =
> +Shared Rx queue =
> Burst mode info =
> Power mgmt address monitor =
> MTU update =
> diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
> index ff6aa91c80..45bf5a3a10 100644
> --- a/doc/guides/prog_guide/switch_representation.rst
> +++ b/doc/guides/prog_guide/switch_representation.rst
> @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications.
> .. [1] `Ethernet switch device driver model (switchdev)
> <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
>
> +- Memory usage of representors is huge when the number of representors grows,
> + because the PMD always allocates an mbuf for each Rx queue descriptor.
> + Polling a large number of ports also increases CPU load, cache misses and
> + latency. A shared Rx queue can be used to share an Rx queue between the PF
> + and representors in the same switch domain. ``RTE_ETH_RX_OFFLOAD_SHARED_RXQ``
> + is reported in the Rx offload capability of the device info. Set this
> + offload flag in the device Rx mode or the Rx queue configuration to enable
> + a shared Rx queue. Polling any member port of a shared Rx queue can return
> + packets from all ports in the group; the source port ID is saved in ``mbuf.port``.
> +
> Basic SR-IOV
> ------------
>
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 9d95cd11e1..1361ff759a 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -127,6 +127,7 @@ static const struct {
> RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
> + RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ),
> };
>
> #undef RTE_RX_OFFLOAD_BIT2STR
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index d2b27c351f..a578c9db9d 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
> uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> + uint32_t shared_group; /**< Shared port group index in switch domain. */
> /**
> * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> @@ -1373,6 +1374,12 @@ struct rte_eth_conf {
> #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM 0x00040000
> #define DEV_RX_OFFLOAD_RSS_HASH 0x00080000
> #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> +/**
> + * Rx queue is shared among ports in the same switch domain to save memory
> + * and avoid polling each port. Any port in the group can be used to receive
> + * packets. The real source port number is saved in the mbuf->port field.
> + */
> +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ 0x00200000
>
> #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> DEV_RX_OFFLOAD_UDP_CKSUM | \
> --
> 2.25.1
>