[dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue

Jerin Jacob jerinjacobk at gmail.com
Mon Aug 9 15:50:58 CEST 2021


On Mon, Aug 9, 2021 at 5:18 PM Xueming Li <xuemingl at nvidia.com> wrote:
>
> In current DPDK framework, each RX queue is pre-loaded with mbufs for
> incoming packets. When number of representors scale out in a switch
> domain, the memory consumption became significant. Most important,
> polling all ports leads to high cache miss, high latency and low
> throughput.
>
> This patch introduces shared RX queue. Ports with same configuration in
> a switch domain could share RX queue set by specifying sharing group.
> Polling any queue using same shared RX queue receives packets from all
> member ports. Source port is identified by mbuf->port.
>
> Port queue number in a shared group should be identical. Queue index is
> 1:1 mapped in shared group.
>
> Share RX queue is supposed to be polled on same thread.
>
> Multiple groups is supported by group ID.

Is this offload specific to the representor? If so can this name be
changed specifically to representor?
If it is for a generic case, how the flow ordering will be maintained?

>
> Signed-off-by: Xueming Li <xuemingl at nvidia.com>
> ---
>  doc/guides/nics/features.rst                    | 11 +++++++++++
>  doc/guides/nics/features/default.ini            |  1 +
>  doc/guides/prog_guide/switch_representation.rst | 10 ++++++++++
>  lib/ethdev/rte_ethdev.c                         |  1 +
>  lib/ethdev/rte_ethdev.h                         |  7 +++++++
>  5 files changed, 30 insertions(+)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index a96e12d155..2e2a9b1554 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -624,6 +624,17 @@ Supports inner packet L4 checksum.
>    ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
>
>
> +.. _nic_features_shared_rx_queue:
> +
> +Shared Rx queue
> +---------------
> +
> +Supports shared Rx queue for ports in same switch domain.
> +
> +* **[uses]     rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``.
> +* **[provides] mbuf**: ``mbuf.port``.
> +
> +
>  .. _nic_features_packet_type_parsing:
>
>  Packet type parsing
> diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> index 754184ddd4..ebeb4c1851 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
>  Queue start/stop     =
>  Runtime Rx queue setup =
>  Runtime Tx queue setup =
> +Shared Rx queue      =
>  Burst mode info      =
>  Power mgmt address monitor =
>  MTU update           =
> diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
> index ff6aa91c80..45bf5a3a10 100644
> --- a/doc/guides/prog_guide/switch_representation.rst
> +++ b/doc/guides/prog_guide/switch_representation.rst
> @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications.
>  .. [1] `Ethernet switch device driver model (switchdev)
>         <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
>
> +- Memory usage of representors is huge when number of representor grows,
> +  because PMD always allocate mbuf for each descriptor of Rx queue.
> +  Polling the large number of ports brings more CPU load, cache miss and
> +  latency. Shared Rx queue can be used to share Rx queue between PF and
> +  representors in same switch domain. ``RTE_ETH_RX_OFFLOAD_SHARED_RXQ``
> +  is present in Rx offloading capability of device info. Setting the
> +  offloading flag in device Rx mode or Rx queue configuration to enable
> +  shared Rx queue. Polling any member port of shared Rx queue can return
> +  packets of all ports in group, port ID is saved in ``mbuf.port``.
> +
>  Basic SR-IOV
>  ------------
>
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 9d95cd11e1..1361ff759a 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -127,6 +127,7 @@ static const struct {
>         RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
>         RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
>         RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
> +       RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ),
>  };
>
>  #undef RTE_RX_OFFLOAD_BIT2STR
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index d2b27c351f..a578c9db9d 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
>         uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
>         uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
>         uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> +       uint32_t shared_group; /**< Shared port group index in switch domain. */
>         /**
>          * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
>          * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> @@ -1373,6 +1374,12 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
>  #define DEV_RX_OFFLOAD_RSS_HASH                0x00080000
>  #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> +/**
> + * Rx queue is shared among ports in same switch domain to save memory,
> + * avoid polling each port. Any port in group can be used to receive packets.
> + * Real source port number saved in mbuf->port field.
> + */
> +#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ   0x00200000
>
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>                                  DEV_RX_OFFLOAD_UDP_CKSUM | \
> --
> 2.25.1
>


More information about the dev mailing list