[dpdk-dev] [PATCH v2 1/2] ethdev: support queue-based priority flow control

Ferruh Yigit ferruh.yigit at intel.com
Tue Jan 25 18:34:25 CET 2022


On 1/13/2022 10:27 AM, jerinj at marvell.com wrote:
> From: Jerin Jacob <jerinj at marvell.com>
> 
> Based on device support and use-case need, there are two different ways
> to enable PFC. The first case is the port level PFC configuration, in
> this case, rte_eth_dev_priority_flow_ctrl_set() API shall be used to
> configure the PFC, and PFC frames will be generated using based on VLAN
> TC value.
> 
> The second case is the queue level PFC configuration, in this
> case, Any packet field content can be used to steer the packet to the
> specific queue using rte_flow or RSS and then use
> rte_eth_dev_priority_flow_ctrl_queue_set() to set the TC mapping on each
> queue. Based on congestion selected on the specific queue, configured TC
> shall be used to generate PFC frames.
> > Operation of these modes are mutually exclusive, when driver sets
> non zero value for rte_eth_dev_info::pfc_queue_tc_max,
> application must use queue level PFC configuration via
> rte_eth_dev_priority_flow_ctrl_queue_set() API instead of port level
> PFC configuration via rte_eth_dev_priority_flow_ctrl_set() API to
> realize PFC configuration.
> 
> This patch enables the configuration for second case a.k.a queue
> based PFC also updates rte_eth_dev_priority_flow_ctrl_set()
> implmentaion to adheher to rte_eth_dev_info::pfc_queue_tc_max
> handling.
> 
> Also updated libabigail.abignore to ignore the update
> to reserved fields in rte_eth_dev_info.
> 
> Signed-off-by: Jerin Jacob <jerinj at marvell.com>
> Signed-off-by: Sunil Kumar Kori <skori at marvell.com>
> ---
> 
> A driver implemtion based on this API is at
> https://patches.dpdk.org/project/dpdk/patch/20220111081831.881374-1-skori@marvell.com/
> 
> RFC..v1
>   - Added queue based PFC config API instead port based
> 
> v1..v2
> - Updated libabigail.ignore as
>    has_data_member_inserted_between = {offset_of(reserved_64s), end}
> - Updated doxygen comments of rte_eth_pfc_queue_conf
> 
> 
>   devtools/libabigail.abignore           |   5 ++
>   doc/guides/nics/features.rst           |   5 +-
>   doc/guides/rel_notes/release_22_03.rst |   3 +
>   lib/ethdev/ethdev_driver.h             |   6 +-
>   lib/ethdev/rte_ethdev.c                | 109 +++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                |  83 ++++++++++++++++++-
>   lib/ethdev/version.map                 |   3 +
>   7 files changed, 208 insertions(+), 6 deletions(-)
> 
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 4b676f317d..3bdecaaef0 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -11,3 +11,8 @@
>   ; Ignore generated PMD information strings
>   [suppress_variable]
>           name_regexp = _pmd_info$
> +
> +;Ignore fields inserted in place of reserved fields of rte_eth_dev_info
> +[suppress_type]
> +       name = rte_eth_dev_info
> +       has_data_member_inserted_between = {offset_of(reserved_64s), end}
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 27be2d2576..277a784f4e 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -379,9 +379,10 @@ Flow control
>   Supports configuring link flow control.
>   
>   * **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``,
> -  ``priority_flow_ctrl_set``.
> +  ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_set``.
>   * **[related]    API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``,
> -  ``rte_eth_dev_priority_flow_ctrl_set()``.
> +  ``rte_eth_dev_priority_flow_ctrl_set()``, ``rte_eth_dev_priority_flow_ctrl_queue_set()``.
> +* **[provides]   rte_eth_dev_info**: ``pfc_queue_tc_max``.
>   
>   
>   .. _nic_features_rate_limitation:
> diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> index 6d99d1eaa9..b75c0356e6 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -55,6 +55,9 @@ New Features
>        Also, make sure to start the actual text at the margin.
>        =======================================================
>   
> +* **Added an API to enable queue based priority flow ctrl(PFC).**
> +
> +  A new API, ``rte_eth_dev_priority_flow_ctrl_queue_set()``, was added.
>   
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index d95605a355..e0bbfe89d7 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -532,6 +532,9 @@ typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
>   /** @internal Setup priority flow control parameter on an Ethernet device. */
>   typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
>   				struct rte_eth_pfc_conf *pfc_conf);
> +/** @internal Queue setup for priority flow control parameter on an Ethernet device. */
> +typedef int (*priority_flow_ctrl_queue_set_t)(struct rte_eth_dev *dev,
> +				struct rte_eth_pfc_queue_conf *pfc_queue_conf);
>   
>   /** @internal Update RSS redirection table on an Ethernet device. */
>   typedef int (*reta_update_t)(struct rte_eth_dev *dev,
> @@ -1080,7 +1083,8 @@ struct eth_dev_ops {
>   	flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control */
>   	/** Setup priority flow control */
>   	priority_flow_ctrl_set_t   priority_flow_ctrl_set;
> -
> +	/** Priority flow control queue setup */
> +	priority_flow_ctrl_queue_set_t   priority_flow_ctrl_queue_set;
>   	/** Set Unicast Table Array */
>   	eth_uc_hash_table_set_t    uc_hash_table_set;
>   	/** Set Unicast hash bitmap */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index a1d475a292..6def057720 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -3998,7 +3998,9 @@ int
>   rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
>   				   struct rte_eth_pfc_conf *pfc_conf)
>   {
> +	struct rte_eth_dev_info dev_info;
>   	struct rte_eth_dev *dev;
> +	int ret;
>   
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>   	dev = &rte_eth_devices[port_id];
> @@ -4010,6 +4012,17 @@ rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
>   		return -EINVAL;
>   	}
>   
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	if (dev_info.pfc_queue_tc_max != 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port %u driver does not support port level PFC config\n",
> +			port_id);
> +		return -ENOTSUP;
> +	}
> +
>   	if (pfc_conf->priority > (RTE_ETH_DCB_NUM_USER_PRIORITIES - 1)) {
>   		RTE_ETHDEV_LOG(ERR, "Invalid priority, only 0-7 allowed\n");
>   		return -EINVAL;
> @@ -4022,6 +4035,102 @@ rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
>   	return -ENOTSUP;
>   }
>   
> +static inline int
> +validate_rx_pause_config(struct rte_eth_dev_info *dev_info,
> +			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
> +	    (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
> +		if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
> +			RTE_ETHDEV_LOG(ERR, "Tx queue not in range for Rx pause"
> +				       " (requested: %d configured: %d)\n",
> +				       pfc_queue_conf->rx_pause.tx_qid,
> +				       dev_info->nb_tx_queues);
> +			return -EINVAL;
> +		}
> +
> +		if (pfc_queue_conf->rx_pause.tc >= dev_info->pfc_queue_tc_max) {
> +			RTE_ETHDEV_LOG(ERR, "TC not in range for Rx pause"
> +				       " (requested: %d max: %d)\n",
> +				       pfc_queue_conf->rx_pause.tc,
> +				       dev_info->pfc_queue_tc_max);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int
> +validate_tx_pause_config(struct rte_eth_dev_info *dev_info,
> +			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	if ((pfc_queue_conf->mode == RTE_ETH_FC_TX_PAUSE) ||
> +	     (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
> +		if (pfc_queue_conf->tx_pause.rx_qid >= dev_info->nb_rx_queues) {
> +			RTE_ETHDEV_LOG(ERR, "Rx queue not in range for Tx pause"
> +				       "(requested: %d configured: %d)\n",
> +				       pfc_queue_conf->tx_pause.rx_qid,
> +				       dev_info->nb_rx_queues);
> +			return -EINVAL;
> +		}
> +
> +		if (pfc_queue_conf->tx_pause.tc >= dev_info->pfc_queue_tc_max) {
> +			RTE_ETHDEV_LOG(ERR, "TC not in range for Tx pause"
> +				       "(requested: %d max: %d)\n",
> +				       pfc_queue_conf->tx_pause.tc,
> +				       dev_info->pfc_queue_tc_max);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_dev_priority_flow_ctrl_queue_set(
> +	uint16_t port_id, struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_dev *dev;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (pfc_queue_conf == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "PFC parameters are NULL for port (%u)\n",
> +			       port_id);
> +		return -EINVAL;
> +	}
> +
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	if (dev_info.pfc_queue_tc_max == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port %u does not support PFC TC values\n",
> +			port_id);
> +		return -ENOTSUP;
> +	}
> +
> +	ret = validate_rx_pause_config(&dev_info, pfc_queue_conf);
> +	if (ret != 0)
> +		return ret;
> +
> +	ret = validate_tx_pause_config(&dev_info, pfc_queue_conf);
> +	if (ret != 0)
> +		return ret;
> +
> +
> +	if (*dev->dev_ops->priority_flow_ctrl_queue_set)
> +		return eth_err(port_id,
> +			       (*dev->dev_ops->priority_flow_ctrl_queue_set)(
> +				       dev, pfc_queue_conf));
> +	return -ENOTSUP;
> +}
> +
>   static int
>   eth_check_reta_mask(struct rte_eth_rss_reta_entry64 *reta_conf,
>   			uint16_t reta_size)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index fa299c8ad7..33e43f6e1d 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1395,6 +1395,28 @@ struct rte_eth_pfc_conf {
>   	uint8_t priority;          /**< VLAN User Priority. */
>   };
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure Ethernet priority flow control parameter for
> + * ethdev queues.
> + */
> +struct rte_eth_pfc_queue_conf {
> +	enum rte_eth_fc_mode mode;  /**< Link flow control mode */
> +
> +	struct {
> +		uint16_t tx_qid; /**< Tx queue ID */
> +		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> +	} rx_pause; /* Valid when (mode == FC_RX_PAUSE || mode == FC_FULL) */
> +
> +	struct {
> +		uint16_t pause_time; /**< Pause quota in the Pause frame */
> +		uint16_t rx_qid; /**< Rx queue ID */
> +		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> +	} tx_pause; /* Valid when (mode == FC_TX_PAUSE || mode == FC_FULL) */
> +};
> +
>   /**
>    * Tunnel type for device-specific classifier configuration.
>    * @see rte_eth_udp_tunnel
> @@ -1841,8 +1863,30 @@ struct rte_eth_dev_info {
>   	 * embedded managed interconnect/switch.
>   	 */
>   	struct rte_eth_switch_info switch_info;
> -
> -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> +	/**
> +	 * Maximum supported traffic class as per PFC (802.1Qbb) specification.
> +	 *
> +	 * Based on device support and use-case need, there are two different
> +	 * ways to enable PFC. The first case is the port level PFC
> +	 * configuration, in this case, rte_eth_dev_priority_flow_ctrl_set()
> +	 * API shall be used to configure the PFC, and PFC frames will be
> +	 * generated using based on VLAN TC value.
> +	 * The second case is the queue level PFC configuration, in this case,
> +	 * Any packet field content can be used to steer the packet to the
> +	 * specific queue using rte_flow or RSS and then use
> +	 * rte_eth_dev_priority_flow_ctrl_queue_set() to set the TC mapping
> +	 * on each queue. Based on congestion selected on the specific queue,
> +	 * configured TC shall be used to generate PFC frames.
> +	 *
> +	 * When set to non zero value, application must use queue level
> +	 * PFC configuration via rte_eth_dev_priority_flow_ctrl_queue_set() API
> +	 * instead of port level PFC configuration via
> +	 * rte_eth_dev_priority_flow_ctrl_set() API to realize
> +	 * PFC configuration.
> +	 */
> +	uint8_t pfc_queue_tc_max;


'rte_eth_dev_info_get()' is one of the APIs that anyone using ethdev needs to use.

Instead of expanding it with less used features, what do you think to have a
specific API to get the 'pfc_queue_tc_max'?

It also can be used by application to detect queue based PFC is supported by
driver or not.
Assume API is 'rte_eth_dev_priority_flow_ctrl_get()', if it returns '-ENOTSUP'
application can know that PMD doesn't support queue based PFC.


> +	uint8_t reserved_8s[7];
> +	uint64_t reserved_64s[1]; /**< Reserved for future fields */
>   	void *reserved_ptrs[2];   /**< Reserved for future fields */
>   };
>   
> @@ -4109,6 +4153,9 @@ int rte_eth_dev_flow_ctrl_set(uint16_t port_id,
>    * Configure the Ethernet priority flow control under DCB environment
>    * for Ethernet device.
>    *
> + * @see struct rte_eth_dev_info::pfc_queue_tc_max priority
> + * flow control usage models.
> + *
>    * @param port_id
>    * The port identifier of the Ethernet device.
>    * @param pfc_conf
> @@ -4119,10 +4166,40 @@ int rte_eth_dev_flow_ctrl_set(uint16_t port_id,
>    *   - (-ENODEV)  if *port_id* invalid.
>    *   - (-EINVAL)  if bad parameter
>    *   - (-EIO)     if flow control setup failure or device is removed.
> + *
>    */
>   int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
> -				struct rte_eth_pfc_conf *pfc_conf);
> +				       struct rte_eth_pfc_conf *pfc_conf);

Above syntax changes are not needed.

DPDK coding convention is using tabs (mostly two) for multi line function
decleration/definition, please be consistant with usage, this patch has
multiple variations.

>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Configure the Ethernet priority flow control for a given queue
> + * for Ethernet device.
> + *
> + * @see struct rte_eth_dev_info::pfc_queue_tc_max priority flow control
> + * usage models.
> + *
> + * @note When an ethdev port switches to PFC mode, the unconfigured

Doesit mean queue based PFC mode?

> + * queues shall be configured by the driver with default values such as
> + * lower priority value for TC etc.
> + *

I assume there is no way for application to know what the defaults values are,
also not sure if application interested in this.

> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param pfc_queue_conf
> + *   The pointer to the structure of the priority flow control parameters
> + *   for the queue.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support priority flow control mode.
> + *   - (-ENODEV)  if *port_id* invalid.
> + *   - (-EINVAL)  if bad parameter
> + *   - (-EIO)     if flow control setup queue failure
> + */
> +__rte_experimental
> +int rte_eth_dev_priority_flow_ctrl_queue_set(uint16_t port_id,
> +					     struct rte_eth_pfc_queue_conf *pfc_queue_conf);

I wonder if Rx/Tx queue id should be API arguments, to be consistent
with some other APIs, and will it help application that configures queues
in a loop.
But I can see 'rx_pause' or 'tx_pause' (in config) can be valid or not
based on the 'mode', so I understand to have queue ids in the struct.
No strong opinion.


More information about the dev mailing list