[dpdk-dev] [PATCH v3 1/2] ethdev: support queue-based priority flow control

Ferruh Yigit ferruh.yigit at intel.com
Thu Feb 3 17:00:33 CET 2022


On 1/31/2022 6:08 PM, jerinj at marvell.com wrote:
> From: Jerin Jacob <jerinj at marvell.com>
> 
> Based on device support and use-case need, there are two different ways
> to enable PFC. The first case is the port level PFC configuration, in
> this case, rte_eth_dev_priority_flow_ctrl_set() API shall be used to
> configure the PFC, and PFC frames will be generated using based on VLAN
> TC value.
> 
> The second case is the queue level PFC configuration, in this
> case, Any packet field content can be used to steer the packet to the
> specific queue using rte_flow or RSS and then use
> rte_eth_dev_priority_flow_ctrl_queue_configure() to configure the
> TC mapping on each queue.
> Based on congestion selected on the specific queue, configured TC
> shall be used to generate PFC frames.
> 

Hi Jerin, Sunil,

Please find below minor comments, mostly syntax issues.

> Signed-off-by: Jerin Jacob <jerinj at marvell.com>
> Signed-off-by: Sunil Kumar Kori <skori at marvell.com>
> ---
> 
> v2..v1:
> - Introduce rte_eth_dev_priority_flow_ctrl_queue_info_get() to
> avoid updates to rte_eth_dev_info
> - Removed devtools/libabigail.abignore changes
> - Address the comment from Ferruh in
> http://patches.dpdk.org/project/dpdk/patch/20220113102718.3167282-1-jerinj@marvell.com/
> 
>   doc/guides/nics/features.rst           |   7 +-
>   doc/guides/rel_notes/release_22_03.rst |   6 ++
>   lib/ethdev/ethdev_driver.h             |  12 ++-
>   lib/ethdev/rte_ethdev.c                | 132 +++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                |  89 +++++++++++++++++
>   lib/ethdev/version.map                 |   4 +
>   6 files changed, 247 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 27be2d2576..1cacdc883a 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -379,9 +379,12 @@ Flow control
>   Supports configuring link flow control.
>   
>   * **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``,
> -  ``priority_flow_ctrl_set``.
> +  ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_info_get``,
> +  ``priority_flow_ctrl_queue_configure``
>   * **[related]    API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``,
> -  ``rte_eth_dev_priority_flow_ctrl_set()``.
> +  ``rte_eth_dev_priority_flow_ctrl_set()``,
> +  ``rte_eth_dev_priority_flow_ctrl_queue_info_get()``,
> +  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``.
>   
>   
>   .. _nic_features_rate_limitation:
> diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> index 3bc0630c7c..e988c104e8 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -69,6 +69,12 @@ New Features
>   
>     The new API ``rte_event_eth_rx_adapter_event_port_get()`` was added.
>   
> +* **Added an API to enable queue based priority flow ctrl(PFC).**
> +
> +  New APIs, ``rte_eth_dev_priority_flow_ctrl_queue_info_get()`` and
> +  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``, was added.
> +
> +
>  


Can you please move this update before ethdev driver updates.
And no need double empty lines.
  
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index d95605a355..320a364766 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -533,6 +533,13 @@ typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
>   typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
>   				struct rte_eth_pfc_conf *pfc_conf);
>   
> +/** @internal Get info for queue based PFC on an Ethernet device. */
> +typedef int (*priority_flow_ctrl_queue_info_get_t)(
> +	struct rte_eth_dev *dev, struct rte_eth_pfc_queue_info *pfc_queue_info);
> +/** @internal Configure queue based PFC parameter on an Ethernet device. */
> +typedef int (*priority_flow_ctrl_queue_config_t)(
> +	struct rte_eth_dev *dev, struct rte_eth_pfc_queue_conf *pfc_queue_conf);
> +

Instead of ending line with opening parantesis '(', can you break the line after
first argument, like:

typedef int (*priority_flow_ctrl_queue_config_t)(struct rte_eth_dev *dev,
				struct rte_eth_pfc_queue_conf *pfc_queue_conf);

Same for all instances.

>   /** @internal Update RSS redirection table on an Ethernet device. */
>   typedef int (*reta_update_t)(struct rte_eth_dev *dev,
>   			     struct rte_eth_rss_reta_entry64 *reta_conf,
> @@ -1080,7 +1087,10 @@ struct eth_dev_ops {
>   	flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control */
>   	/** Setup priority flow control */
>   	priority_flow_ctrl_set_t   priority_flow_ctrl_set;
> -
> +	/** Priority flow control queue info get */
> +	priority_flow_ctrl_queue_info_get_t priority_flow_ctrl_queue_info_get;
> +	/** Priority flow control queue configure */
> +	priority_flow_ctrl_queue_config_t priority_flow_ctrl_queue_config;

Can you please keep empty line before next (hash) group?

>   	/** Set Unicast Table Array */
>   	eth_uc_hash_table_set_t    uc_hash_table_set;
>   	/** Set Unicast hash bitmap */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index a1d475a292..2ce38cd2c5 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -4022,6 +4022,138 @@ rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
>   	return -ENOTSUP;
>   }
>   
> +static inline int

Not sure if there is a value to ask function to be 'inline', this is in control
path, only static can be enough.

> +validate_rx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
> +			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
> +	    (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {

We don't allign to paranthesis in dpdk coding convenion [1], it should be as:

if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
		(pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
	if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
		...
         }
}



[1]
Altough I am aware many instances sneaked in, still I think better to follow
the convention.

> +		if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
> +			RTE_ETHDEV_LOG(ERR, "Tx queue not in range for Rx pause"
> +				       " (requested: %d configured: %d)\n",
> +				       pfc_queue_conf->rx_pause.tx_qid,
> +				       dev_info->nb_tx_queues);


Should log mention that this is related to the "priority flow Rx queue control"?

> +			return -EINVAL;
> +		}
> +
> +		if (pfc_queue_conf->rx_pause.tc >= tc_max) {

Should we document somewhere that 'tc_max' value itself is an invalid value?

> +			RTE_ETHDEV_LOG(ERR, "TC not in range for Rx pause"
> +				       " (requested: %d max: %d)\n",
> +				       pfc_queue_conf->rx_pause.tc, tc_max);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int
> +validate_tx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
> +			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	if ((pfc_queue_conf->mode == RTE_ETH_FC_TX_PAUSE) ||
> +	     (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
> +		if (pfc_queue_conf->tx_pause.rx_qid >= dev_info->nb_rx_queues) {
> +			RTE_ETHDEV_LOG(ERR, "Rx queue not in range for Tx pause"
> +				       "(requested: %d configured: %d)\n",
> +				       pfc_queue_conf->tx_pause.rx_qid,
> +				       dev_info->nb_rx_queues);
> +			return -EINVAL;
> +		}
> +
> +		if (pfc_queue_conf->tx_pause.tc >= tc_max) {
> +			RTE_ETHDEV_LOG(ERR, "TC not in range for Tx pause"
> +				       "(requested: %d max: %d)\n",
> +				       pfc_queue_conf->tx_pause.tc, tc_max);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_dev_priority_flow_ctrl_queue_info_get(
> +	uint16_t port_id, struct rte_eth_pfc_queue_info *pfc_queue_info)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (pfc_queue_info == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "PFC info param is NULL for port (%u)\n",
> +			       port_id);
> +		return -EINVAL;
> +	}
> +
> +	if (*dev->dev_ops->priority_flow_ctrl_queue_info_get)
> +		return eth_err(port_id,
> +			       (*dev->dev_ops->priority_flow_ctrl_queue_info_get)(
> +				       dev, pfc_queue_info));
> +	return -ENOTSUP;
> +}
> +
> +int
> +rte_eth_dev_priority_flow_ctrl_queue_configure(
> +	uint16_t port_id, struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	struct rte_eth_pfc_queue_info pfc_info;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_dev *dev;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (pfc_queue_conf == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "PFC parameters are NULL for port (%u)\n",
> +			       port_id);
> +		return -EINVAL;
> +	}
> +
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	ret = rte_eth_dev_priority_flow_ctrl_queue_info_get(port_id, &pfc_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	if (pfc_info.capa == 0) {
> +		RTE_ETHDEV_LOG(ERR, "Ethdev port %u does not support PFC\n",
> +			port_id);
> +		return -ENOTSUP;
> +	}
> +
> +	if (pfc_info.tc_max == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port %u does not support PFC TC values\n",
> +			port_id);
> +		return -ENOTSUP;
> +	}
> +
> +	if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE) {
> +		ret = validate_rx_pause_config(&dev_info, pfc_info.tc_max,
> +					       pfc_queue_conf);

There is capablilty flags for RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE and RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE
also there is config flags RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_TX_PAUSE and RTE_ETH_FC_FULL


What should happen if driver only support RX_PAUSE but app config request only
TX_PAUSE?
As far as can see with current code it pass the validation, but should it?


> +		if (ret != 0)
> +			return ret;
> +	}
> +
> +	if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE) {
> +		ret = validate_tx_pause_config(&dev_info, pfc_info.tc_max,
> +					       pfc_queue_conf);

syntax, please don't align to paranthesis

> +		if (ret != 0)
> +			return ret;
> +	}
> +
> +	if (*dev->dev_ops->priority_flow_ctrl_queue_config)
> +		return eth_err(port_id,
> +			       (*dev->dev_ops->priority_flow_ctrl_queue_config)(
> +				       dev, pfc_queue_conf));
> +	return -ENOTSUP;
> +}
> +
>   static int
>   eth_check_reta_mask(struct rte_eth_rss_reta_entry64 *reta_conf,
>   			uint16_t reta_size)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index fa299c8ad7..383ad1cdd7 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1395,6 +1395,48 @@ struct rte_eth_pfc_conf {
>   	uint8_t priority;          /**< VLAN User Priority. */
>   };
>   
> +/** Device supports Rx pause for queue based PFC. */
> +#define RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE RTE_BIT64(0)
> +/** Device supports Tx pause for queue based PFC. */
> +#define RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE RTE_BIT64(1)
> +

There is already control flow mode enum 'enum rte_eth_fc_mode', those
enum items use FC as abrivation (RTE_ETH_FC_RX_PAUSE), above ones use PFC,
should they use same prefix 'RTE_ETH_FC_' for consistency?

And overall, what for struct and functins too, is the correct abreviation
'pfc' or 'fc', since old code has 'fc' as far as I see?

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to retrieve information of queue based PFC.
> + */
> +struct rte_eth_pfc_queue_info {
> +	/**
> +	 * Maximum supported traffic class as per PFC (802.1Qbb) specification.

Will it be redundant to say valid value should be bigger than 0?

> +	 */
> +	uint8_t tc_max;
> +	/** PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */

Can move doxygen comments to same line if they fit to 80 char limit:
	uint64_t capa; /**< PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */

> +	uint64_t capa;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure Ethernet priority flow control parameter for
> + * ethdev queues.
> + */
> +struct rte_eth_pfc_queue_conf {
> +	enum rte_eth_fc_mode mode; /**< Link flow control mode */
> +
> +	struct {
> +		uint16_t tx_qid; /**< Tx queue ID */

'tx_qid' within 'rx_pause' struct, this seems done intentionally but just to double
check, can you please describe here the intendent usage?

> +		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> +	} rx_pause; /* Valid when (mode == FC_RX_PAUSE || mode == FC_FULL) */
> +
> +	struct {
> +		uint16_t pause_time; /**< Pause quota in the Pause frame */
> +		uint16_t rx_qid;     /**< Rx queue ID */
> +		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> +	} tx_pause; /* Valid when (mode == FC_TX_PAUSE || mode == FC_FULL) */
> +};
> +
>   /**
>    * Tunnel type for device-specific classifier configuration.
>    * @see rte_eth_udp_tunnel
> @@ -4144,6 +4186,53 @@ int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
>   int rte_eth_dev_mac_addr_add(uint16_t port_id, struct rte_ether_addr *mac_addr,
>   				uint32_t pool);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve the information for queue based PFC.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param pfc_queue_info
> + *   A pointer to a structure of type *rte_eth_pfc_queue_info* to be filled with
> + *   the information about queue based PFC.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for priority_flow_ctrl_queue_info_get does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_dev_priority_flow_ctrl_queue_info_get(uint16_t port_id,
> +				   struct rte_eth_pfc_queue_info *pfc_queue_info);
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Configure the queue based priority flow control for a given queue
> + * for Ethernet device.
> + *
> + * @note When an ethdev port switches to queue based PFC mode, the
> + * unconfigured queues shall be configured by the driver with
> + * default values such as lower priority value for TC etc.
> + *

May be good to document it has dependency to 'rte_eth_dev_info_get()' API?

> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param pfc_queue_conf
> + *   The pointer to the structure of the priority flow control parameters
> + *   for the queue.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support queue based PFC mode.
> + *   - (-ENODEV)  if *port_id* invalid.
> + *   - (-EINVAL)  if bad parameter
> + *   - (-EIO)     if flow control setup queue failure
> + */
> +__rte_experimental
> +int rte_eth_dev_priority_flow_ctrl_queue_configure(uint16_t port_id,
> +			      struct rte_eth_pfc_queue_conf *pfc_queue_conf);
> +
>   /**
>    * Remove a MAC address from the internal array of addresses.
>    *
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index c2fb0669a4..49523ebc45 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -256,6 +256,10 @@ EXPERIMENTAL {
>   	rte_flow_flex_item_create;
>   	rte_flow_flex_item_release;
>   	rte_flow_pick_transfer_proxy;
> +
> +	# added in 22.03
> +	rte_eth_dev_priority_flow_ctrl_queue_configure;
> +	rte_eth_dev_priority_flow_ctrl_queue_info_get;
>   };
>   
>   INTERNAL {



More information about the dev mailing list