[dpdk-dev] [PATCH v6 1/5] ethdev: introduce shared Rx queue
Xueming(Steven) Li
xuemingl at nvidia.com
Fri Oct 15 12:54:22 CEST 2021
On Fri, 2021-10-15 at 12:28 +0300, Andrew Rybchenko wrote:
> On 10/12/21 5:39 PM, Xueming Li wrote:
> > In the current DPDK framework, each Rx queue is pre-loaded with mbufs
> > to save incoming packets. For some PMDs, when the number of representors
> > scales up in a switch domain, memory consumption becomes significant.
> > Polling all ports also leads to high cache-miss rates, high latency and
> > low throughput.
> >
> > This patch introduces shared Rx queues. Ports in the same Rx domain and
> > switch domain can share an Rx queue set by specifying a non-zero share
> > group in the Rx queue configuration.
> >
> > No special API is defined to receive packets from a shared Rx queue.
> > Polling any member port of a shared Rx queue receives packets of that
> > queue for all member ports; the source port is identified by mbuf->port.
> >
> > A shared Rx queue must be polled in a single thread or core; polling a
> > queue ID of any member port is essentially the same.
> >
> > Multiple share groups are supported by non-zero share group ID. Device
>
> "by non-zero share group ID" is not required. Since it must be
> always non-zero to enable sharing.
>
> > should support mixed configurations by allowing multiple share
> > groups and non-shared Rx queues.
> >
> > Even Rx queue shared, queue configuration like offloads and RSS should
> > not be impacted.
>
> I don't understand the above sentence.
> Even when Rx queues are shared, queue configuration like
> offloads and RSS may differ. If a PMD has some limitation,
> it should take care of consistency itself. These limitations
> should be documented in the PMD documentation.
>
OK, I'll remove this line.
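BTW, to make the polling model from the commit log concrete, a minimal
receive loop could look like below (just an illustrative sketch, not part
of the patch; pf_port_id and handle_pkt() are made-up application names):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Poll queue 0 of one member port of the share group. Packets of
     * all member ports arrive here, each tagged with its source port
     * in mbuf->port. */
    struct rte_mbuf *pkts[32];
    uint16_t nb, i;

    nb = rte_eth_rx_burst(pf_port_id, 0, pkts, 32);
    for (i = 0; i < nb; i++)
        handle_pkt(pkts[i]->port, pkts[i]); /* demux by source port */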
> >
> > Example grouping and polling model to reflect service priority:
> > Group1, 2 shared Rx queues per port: PF, rep0, rep1
> > Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127
> > Core0: poll PF queue0
> > Core1: poll PF queue1
> > Core2: poll rep2 queue0
>
>
> Can I have:
> PF RxQ#0, RxQ#1
> Rep0 RxQ#0 shared with PF RxQ#0
> Rep1 RxQ#0 shared with PF RxQ#1
>
> I guess not, since it looks like the RxQ ID must be equal.
> Or am I missing something? Otherwise the grouping rules
> are not obvious to me. Maybe we need a dedicated
> shared_qid within the boundaries of the share_group?
Yes, the RxQ ID must be equal; the following configuration should work:
Rep1 RxQ#1 shared with PF RxQ#1
Equal mapping should work by default, without a new field that must be
set. I'll add some description to emphasize this, what do you think?
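To be specific, something like below should just work (an illustrative
sketch; port IDs, the descriptor count and mb_pool are made up, and a real
application would start from dev_info.default_rxconf):

    #include <rte_ethdev.h>

    struct rte_eth_rxconf rxconf = {
        .share_group = 1, /* non-zero group ID enables sharing */
    };

    /* Same queue ID and same share group on both ports:
     * PF RxQ#1 is shared with Rep1 RxQ#1. */
    rte_eth_rx_queue_setup(pf_port_id, 1, 512,
                           rte_eth_dev_socket_id(pf_port_id),
                           &rxconf, mb_pool);
    rte_eth_rx_queue_setup(rep1_port_id, 1, 512,
                           rte_eth_dev_socket_id(rep1_port_id),
                           &rxconf, mb_pool);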
>
> >
> > The PMD advertises the shared Rx queue capability via
> > RTE_ETH_DEV_CAPA_RXQ_SHARE.
> >
> > The PMD is responsible for shared Rx queue consistency checks to
> > avoid member ports' configurations contradicting each other.
> >
> > Signed-off-by: Xueming Li <xuemingl at nvidia.com>
> > ---
> > doc/guides/nics/features.rst | 13 ++++++++++++
> > doc/guides/nics/features/default.ini | 1 +
> > .../prog_guide/switch_representation.rst | 10 +++++++++
> > doc/guides/rel_notes/release_21_11.rst | 5 +++++
> > lib/ethdev/rte_ethdev.c | 9 ++++++++
> > lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++
> > 6 files changed, 59 insertions(+)
> >
> > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > index e346018e4b8..b64433b8ea5 100644
> > --- a/doc/guides/nics/features.rst
> > +++ b/doc/guides/nics/features.rst
> > @@ -615,6 +615,19 @@ Supports inner packet L4 checksum.
> > ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.
> >
> >
> > +.. _nic_features_shared_rx_queue:
> > +
> > +Shared Rx queue
> > +---------------
> > +
> > +Supports shared Rx queues for ports in the same Rx domain of a switch domain.
> > +
> > +* **[uses] rte_eth_dev_info**: ``dev_capa:RTE_ETH_DEV_CAPA_RXQ_SHARE``.
> > +* **[uses] rte_eth_dev_info,rte_eth_switch_info**: ``rx_domain``, ``domain_id``.
> > +* **[uses] rte_eth_rxconf**: ``share_group``.
> > +* **[provides] mbuf**: ``mbuf.port``.
> > +
> > +
> > .. _nic_features_packet_type_parsing:
> >
> > Packet type parsing
> > diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> > index d473b94091a..93f5d1b46f4 100644
> > --- a/doc/guides/nics/features/default.ini
> > +++ b/doc/guides/nics/features/default.ini
> > @@ -19,6 +19,7 @@ Free Tx mbuf on demand =
> > Queue start/stop =
> > Runtime Rx queue setup =
> > Runtime Tx queue setup =
> > +Shared Rx queue =
> > Burst mode info =
> > Power mgmt address monitor =
> > MTU update =
> > diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
> > index ff6aa91c806..de41db8385d 100644
> > --- a/doc/guides/prog_guide/switch_representation.rst
> > +++ b/doc/guides/prog_guide/switch_representation.rst
> > @@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications.
> > .. [1] `Ethernet switch device driver model (switchdev)
> > <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
> >
> > +- For some PMDs, memory usage of representors is huge when the number of
> > + representors grows, since mbufs are allocated for each descriptor of
> > + every Rx queue. Polling a large number of ports also brings more CPU
> > + load, cache misses and latency. A shared Rx queue can be used to share
> > + an Rx queue between the PF and representors in the same Rx domain.
> > + ``RTE_ETH_DEV_CAPA_RXQ_SHARE`` is present in the device capabilities of
> > + device info. Set a non-zero share group in the Rx queue configuration
> > + to enable sharing. Polling any member port receives packets of all
> > + member ports in the group; the source port ID is saved in ``mbuf.port``.
> > +
> > Basic SR-IOV
> > ------------
> >
> > diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> > index 5036641842c..d72fc97f4fb 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -141,6 +141,11 @@ New Features
> > * Added tests to validate packets hard expiry.
> > * Added tests to verify tunnel header verification in IPsec inbound.
> >
> > +* **Added ethdev shared Rx queue support.**
> > +
> > + * Added a new device capability flag and an rx_domain field to switch info.
> > + * Added a share group to the Rx queue configuration.
> > + * Added testpmd support and a dedicated forwarding engine.
>
> Please add one more empty line, since there must be two
> before the next section. Also, it should be put after
> the last ethdev item above, since the list of features has
> a defined order.
>
> >
> > Removed Items
> > -------------
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 028907bc4b9..9b1b66370a7 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -2159,6 +2159,15 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > return -EINVAL;
> > }
> >
> > + if (local_conf.share_group > 0 &&
> > + (dev_info.dev_capa & RTE_ETH_DEV_CAPA_RXQ_SHARE) == 0) {
> > + RTE_ETHDEV_LOG(ERR,
> > + "Ethdev port_id=%d rx_queue_id=%d, enabled share_group=%u while device doesn't support Rx queue share in %s()\n",
> > + port_id, rx_queue_id, local_conf.share_group,
> > + __func__);
>
> I'd remove the function name logging here. The log is unique enough.
>
> > + return -EINVAL;
> > + }
> > +
> > /*
> > * If LRO is enabled, check that the maximum aggregated packet
> > * size is supported by the configured device.
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 6d80514ba7a..041da6ee52f 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1044,6 +1044,13 @@ struct rte_eth_rxconf {
> > uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> > uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> > uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> > + /**
> > + * Share group index in Rx domain and switch domain.
> > + * A non-zero value enables Rx queue sharing; zero disables sharing.
> > + * The PMD is responsible for Rx queue consistency checks to avoid
> > + * member ports' configurations contradicting each other.
> > + */
> > + uint32_t share_group;
>
> I don't think we need 32 bits for share groups.
> 16 bits sounds more than enough.
>
> > /**
> > * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> > * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> > @@ -1445,6 +1452,14 @@ struct rte_eth_conf {
> > #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> > /** Device supports Tx queue setup after device started. */
> > #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> > +/**
> > + * Device supports shared Rx queues among ports within an Rx domain and
> > + * switch domain. Mbufs are consumed by the shared Rx queue instead of
> > + * each port. Multiple groups are supported via the share_group field of
> > + * the Rx queue configuration. Polling any port in the group receives
> > + * packets of all member ports; the source port is identified by the
> > + * mbuf->port field.
> > + */
> > +#define RTE_ETH_DEV_CAPA_RXQ_SHARE 0x00000004
>
> Let's use RTE_BIT64(2).
>
> I think the above two should be fixed in a separate
> cleanup patch.
Not only the above two; there are more Rx/Tx offload bits to clean up.
Let's do it later.
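Just for reference, the cleaned-up defines could look like this (a sketch
of the future cleanup, assuming rte_bitops.h is acceptable here):

    #include <rte_bitops.h> /* RTE_BIT64 */

    /** Device supports Rx queue setup after device started. */
    #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP RTE_BIT64(0)
    /** Device supports Tx queue setup after device started. */
    #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP RTE_BIT64(1)
    /** Device supports shared Rx queues within Rx/switch domain. */
    #define RTE_ETH_DEV_CAPA_RXQ_SHARE              RTE_BIT64(2)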
>
> > /**@}*/
> >
> > /*
> > @@ -1488,6 +1503,12 @@ struct rte_eth_switch_info {
> > * but each driver should explicitly define the mapping of switch
> > * port identifier to that physical interconnect/switch
> > */
> > + uint16_t rx_domain;
> > + /**<
> > + * Shared Rx queue sub-domain boundary. Only ports in the same Rx
> > + * domain and switch domain can share an Rx queue. Valid only if the
> > + * device advertises the RTE_ETH_DEV_CAPA_RXQ_SHARE capability.
> > + */
>
> Please put the documentation before the documented
> field.
>
> [snip]
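To sum up, an application could decide share eligibility from switch info
like below (an illustrative sketch, not part of the patch; port_a and
port_b are made-up names):

    #include <rte_ethdev.h>

    struct rte_eth_dev_info a, b;

    rte_eth_dev_info_get(port_a, &a);
    rte_eth_dev_info_get(port_b, &b);
    if ((a.dev_capa & RTE_ETH_DEV_CAPA_RXQ_SHARE) != 0 &&
        (b.dev_capa & RTE_ETH_DEV_CAPA_RXQ_SHARE) != 0 &&
        a.switch_info.domain_id == b.switch_info.domain_id &&
        a.switch_info.rx_domain == b.switch_info.rx_domain) {
        /* port_a and port_b may share Rx queues in the same group. */
    }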