[PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split

Andrew Rybchenko andrew.rybchenko at oktetlabs.ru
Thu Oct 6 12:11:52 CEST 2022


On 10/6/22 02:18, Yuan Wang wrote:
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happen
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as
> proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
>              RTE_PTYPE_L4_UDP
> 
> For inner ETH-IPV4-UDP, it should be defined as
> proto_hdr = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
>              RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> 
> If the protocol header is repeated with the previously defined one,
> the repeated part can be omitted. For example, split after ETH, ETH-IPV4
> and ETH-IPV4-UDP, it should be defined as
> proto_hdr0 = RTE_PTYPE_L2_ETHER
> proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN
> proto_hdr2 = RTE_PTYPE_L4_UDP

Ack

> 
> struct rte_eth_rxseg_split {
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures split point */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          /**
> 	 * Proto_hdr defines a bit mask of the protocol sequence as
>           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>           * in the mask indicates the split position.
>           * If one protocol header is defined to split packets into two
>           * segments, for non-tunneling packets, the complete protocol
>           * sequence should be defined.
>           * For tunneling packets, for simplicity,
>           * only the tunnel and inner part of comple protocol sequence
>           * is required.
>           * If several protocol headers are defined to split packets into
>           * multi-segments, the repeated parts of adjacent segments
>           * should be omitted.
> 	 */
>          uint32_t proto_hdr;
> };

Sorry, but I see no reason to repeat it in the description.
What is the purpose of the duplication?

> 
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can
> be use to obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>                 off0=2B
>          seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>          seg2 - pool2, off1=0B
> 
> The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> following:
>          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>          seg1 - udp header @ 128 in mbuf from pool1
>          seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets received
> with above config, the NIC won't do split for ARP packets since
> it does not contains ipv4 header and udp header. These packets will be put

ipv4 -> IPv4, udp -> UDP.

> into the last valid mempool, with zero offset.

What should happen if we have seg1 -> ETH, seg2 -> IPv4, seg3 ->
remaining, and we receive ARP? Will we see the ETH header split into
seg1 and everything else in seg3? I would say yes.

Maybe we should define the intended behavior using pseudo-code?
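
For illustration, the behavior suggested above could be modeled roughly
as follows. This is only a sketch under stated assumptions: the HDR_*
values are stand-ins rather than the real RTE_PTYPE_* encoding, and
split_model() is a hypothetical name, not part of ethdev. It assumes
n_seg >= 1 and that the last segment has proto_hdr == 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in values for illustration only, not the real RTE_PTYPE_* bits. */
#define HDR_ETH  0x1u
#define HDR_IPV4 0x2u
#define HDR_UDP  0x4u
#define HDR_ARP  0x8u

/*
 * layers[]:       packet headers in wire order, one bit per header.
 * proto_hdr[]:    per-segment split masks; the last entry is expected to be 0.
 * seg_of_layer[]: output, index of the segment each header lands in.
 * Returns the number of headers placed in header segments. Everything
 * after them (including the payload) goes to the last segment.
 */
static size_t
split_model(const uint32_t *layers, size_t n_layers,
	    const uint32_t *proto_hdr, size_t n_seg,
	    size_t *seg_of_layer)
{
	size_t last = n_seg - 1;
	size_t pos = 0;
	size_t seg, i;

	for (seg = 0; seg < last; seg++) {
		uint32_t seen = 0;
		size_t end = pos;

		/* Consume consecutive headers covered by this segment's mask. */
		while (end < n_layers && (layers[end] & proto_hdr[seg]) != 0)
			seen |= layers[end++];

		/* Mismatch: stop splitting, the rest goes to the last segment. */
		if (seen != proto_hdr[seg])
			break;

		for (i = pos; i < end; i++)
			seg_of_layer[i] = seg;
		pos = end;
	}

	for (i = pos; i < n_layers; i++)
		seg_of_layer[i] = last;

	return pos;
}
```

With segments { ETH, IPv4, 0 } and an ETH + ARP packet, the model puts
the ETH header in the first segment and the ARP part in the last one.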

> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field will
> be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang at intel.com>
> Signed-off-by: Xuan Ding <xuan.ding at intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu at intel.com>
> ---
>   doc/guides/rel_notes/release_22_11.rst |  4 ++
>   lib/ethdev/rte_ethdev.c                | 89 ++++++++++++++++++++++----
>   lib/ethdev/rte_ethdev.h                | 34 +++++++++-
>   3 files changed, 115 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 141fd9442b..4c3a7f8b8b 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -127,6 +127,10 @@ New Features
>   
>     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>       header protocols of a PMD to split.
> +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> +    User can choose length or protocol header to configure buffer split
> +    according to NIC's capability.

It sounds like it should be mentioned in the API change section as
well. Here I'd concentrate on a top-level feature overview only,
i.e. "Supported protocol-based buffer split using added
``proto_hdr`` in structure ``rte_eth_rxseg_split``."

>   
>   
>   Removed Items
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index ee3b490889..60fe6eb2bd 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1650,14 +1650,18 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   }
>   
>   static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>   {
>   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>   	struct rte_mempool *mp_first;
>   	uint32_t offset_mask;
>   	uint16_t seg_idx;
> +	int ptype_cnt;
> +	uint32_t *ptypes;
> +	int i;
>   
>   	if (n_seg > seg_capa->max_nseg) {
>   		RTE_ETHDEV_LOG(ERR,
> @@ -1675,6 +1679,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1708,13 +1713,75 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {

proto_hdr != 0, please. I know that it is the same, but != 0
raises fewer questions about whether the field is signed or unsigned.

As the first condition here we should check if protocol-based
split is supported at all (see the note about a separate helper
function below).

> +			/* Split based on protocol headers. */
> +			if (length != 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Do not set length split and protocol split within a segment\n"
> +					);
> +				return -EINVAL;
> +			}
> +
> +			if (seg_idx == n_seg - 1) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"The proto_hdr in the last segment should be 0\n"
> +					);
> +				return -EINVAL;
> +			}

I think here we should check if we have seen any segment
with proto_hdr == 0 before. If so, we can't do protocol
based split any more. Since we need to collect already
split protocols (prev_proto_hdrs), I would use the variable
as a marker and set it to the all 1's mask as soon as
proto_hdr == 0 is met.

So, the condition will be
if ((proto_hdr & prev_proto_hdrs) != 0)

So, it will check two things: no repeat of previous
protocol headers which are already split, and no
proto-split after length-based split.

> +
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +

(separate helper function starts here)

> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

There is no point in doing it in a loop. It should be done
outside. Moreover, it should be a helper function
which does it, to keep this function short.

> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			ptypes = malloc(sizeof(uint32_t) * ptype_cnt);
> +			if (ptypes == NULL)
> +				return -ENOMEM;
> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +										ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				free(ptypes);
> +				return -EINVAL;
> +			}

(separate helper function ends here)
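
As a sketch of what such a helper might look like: the version below
is self-contained, so it calls a mock in place of
rte_eth_buffer_split_get_supported_hdr_ptypes(). Both
mock_get_supported_hdr_ptypes() and eth_split_ptypes_get() are
hypothetical names, and the mock's table is made up for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for rte_eth_buffer_split_get_supported_hdr_ptypes():
 * returns the total number of supported ptypes and fills up to num entries.
 */
static int
mock_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
{
	static const uint32_t supported[] = { 0x1, 0x11, 0x211 };
	const int total = 3;
	int i;

	(void)port_id;
	for (i = 0; i < num && i < total; i++)
		ptypes[i] = supported[i];
	return total;
}

/*
 * Query the count, allocate, then fetch the list. On success returns the
 * number of entries (> 0) and the caller owns *out and must free() it.
 */
static int
eth_split_ptypes_get(uint16_t port_id, uint32_t **out)
{
	uint32_t *ptypes;
	int cnt;

	cnt = mock_get_supported_hdr_ptypes(port_id, NULL, 0);
	if (cnt <= 0)
		return -EINVAL;

	ptypes = malloc(sizeof(*ptypes) * cnt);
	if (ptypes == NULL)
		return -ENOMEM;

	cnt = mock_get_supported_hdr_ptypes(port_id, ptypes, cnt);
	if (cnt <= 0) {
		free(ptypes);
		return -EINVAL;
	}

	*out = ptypes;
	return cnt;
}
```

Called once before the segment loop, this keeps the allocation and the
two queries out of the per-segment path.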

> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)

It should be if ((prev_proto_hdrs | proto_hdr) == ptypes[i])

> +					break;
> +
> +			free(ptypes);
> +
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}

prev_proto_hdrs |= proto_hdr;

> +		} else {

NOTE: If the driver does not support length-based split,
it should reject such a configuration itself.

> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}

prev_proto_hdrs = RTE_PTYPE_ALL_MASK;
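
Putting the suggested pieces together, the per-segment check could
look roughly like the sketch below. It is self-contained, so the PT_*
macros are local stand-ins for the RTE_PTYPE_* names used in the patch,
and check_proto_split() is a hypothetical name. The list of supported
ptypes is assumed to have been fetched once, outside the loop.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the RTE_PTYPE_* names; values are for illustration. */
#define PT_L2_ETHER 0x00000001u
#define PT_L3_IPV4  0x00000010u
#define PT_L4_UDP   0x00000200u
#define PT_ALL_MASK 0x0fffffffu

static int
check_proto_split(const uint32_t *proto_hdr, uint16_t n_seg,
		  const uint32_t *ptypes, int ptype_cnt)
{
	uint32_t prev_proto_hdrs = 0;
	uint16_t seg_idx;
	int i;

	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
		uint32_t hdr = proto_hdr[seg_idx];

		if (hdr == 0) {
			/* Length-based segment: no protocol split may follow. */
			prev_proto_hdrs = PT_ALL_MASK;
			continue;
		}

		/* The last segment takes the remainder, so it names no header. */
		if (seg_idx == n_seg - 1)
			return -EINVAL;

		/* Rejects repeated headers and proto split after length split. */
		if ((hdr & prev_proto_hdrs) != 0)
			return -EINVAL;

		/* The accumulated sequence must be one the PMD reports. */
		for (i = 0; i < ptype_cnt; i++)
			if ((prev_proto_hdrs | hdr) == ptypes[i])
				break;
		if (i == ptype_cnt)
			return -EINVAL;

		prev_proto_hdrs |= hdr;
	}
	return 0;
}
```

With supported ptypes { ETH, ETH|IPV4, ETH|IPV4|UDP }, the segments
{ ETH, IPV4, UDP, 0 } pass, while { ETH, ETH|IPV4, 0 } and
{ 0, ETH, 0 } are both rejected by the prev_proto_hdrs check.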

>   		}
>   	}
>   	return 0;
> @@ -1794,7 +1861,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		n_seg = rx_conf->rx_nseg;
>   
>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
>   			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c51c1f3fa0..4c9b121355 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>    *   specified in the first array element, the second buffer, from the
>    *   pool in the second element, and so on.
>    *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>    * - The offsets from the segment description elements specify
>    *   the data offset from the buffer beginning except the first mbuf.
>    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,41 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field must be 0.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field must be 0.
> + *     - The proto_hdr field in the last segment should be 0.
> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset
>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates the
> +	 * split position.
> +	 *
> +	 * If one protocol header is defined to split packets into two segments,
> +	 * for non-tunneling packets, the complete protocol sequence should be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner part of
> +	 * comple protocol sequence is required.
> +	 * If several protocol headers are defined to split packets into multi-segments,
> +	 * the repeated parts of adjacent segments should be omitted.
> +	 */
> +	uint32_t proto_hdr;
>   };
>   
>   /**


