[PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
Andrew Rybchenko
andrew.rybchenko at oktetlabs.ru
Thu Jun 2 15:20:33 CEST 2022
Is it the right one since it is listed in patchwork?
On 6/1/22 16:50, wenxuanx.wu at intel.com wrote:
> From: Wenxuan Wu <wenxuanx.wu at intel.com>
>
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
>
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given a arbitrarily variable length in Rx packet
a -> an
> segment, it is almost impossible to pass a fixed protocol header to PMD.
> Besides, the existence of tunneling results in the composition of a packet
> is various, which makes the situation even worse.
>
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happens
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, PMD will split the ingress packets into
> multiple segments.
>
> struct rte_eth_rxseg_split {
>
> struct rte_mempool *mp; /* memory pools to allocate segment from */
> uint16_t length; /* segment maximal data length,
> configures "split point" */
> uint16_t offset; /* data offset from beginning
> of mbuf data buffer */
> uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> configures "split point" */
> };
>
> Both inner and outer L2/L3/L4 level protocol header split can be supported.
> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
> RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> RTE_PTYPE_INNER_L4_SCTP.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
> seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> seg2 - pool2, off1=0B
>
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> following:
> seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> seg1 - udp header @ 128 in mbuf from pool1
> seg2 - payload @ 0 in mbuf from pool2
It must be defined how ICMPv4 packets will be split in such case.
And how UDP over IPv6 will be split.
>
> Now buffet split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field should not be configured.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field should
> not be configured.
>
> The split limitations imposed by underlying PMD is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
>
> Signed-off-by: Xuan Ding <xuan.ding at intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang at intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu at intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang at intel.com>
> Acked-by: Ray Kinsella <mdr at ashroe.eu>
> ---
> lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
> lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
> 2 files changed, 60 insertions(+), 8 deletions(-)
>
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..fbd55cdd9d 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> uint32_t length = rx_seg[seg_idx].length;
> uint32_t offset = rx_seg[seg_idx].offset;
> + uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>
> if (mpl == NULL) {
> RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13 +1695,38 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> }
> offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> *mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> - length = length != 0 ? length : *mbp_buf_size;
> - if (*mbp_buf_size < length + offset) {
> - RTE_ETHDEV_LOG(ERR,
> - "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> - mpl->name, *mbp_buf_size,
> - length + offset, length, offset);
> - return -EINVAL;
> + if (proto_hdr == RTE_PTYPE_UNKNOWN) {
> + /* Split at fixed length. */
> + length = length != 0 ? length : *mbp_buf_size;
> + if (*mbp_buf_size < length + offset) {
> + RTE_ETHDEV_LOG(ERR,
> + "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> + mpl->name, *mbp_buf_size,
> + length + offset, length, offset);
> + return -EINVAL;
> + }
> + } else {
> + /* Split after specified protocol header. */
> + if (!(proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
The condition looks suspicious. It will be true if proto_hdr has no
single bit from the mask. I guess it is not the intent.
I guess the condition should be
proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK
i.e. there is unsupported bits in proto_hdr
IMHO we need extra field in dev_info to report supported protocols to
split on. Or a new API to get an array similar to ptype get.
May be a new API is a better choice to not overload dev_info and to
be more flexible in reporting.
> + RTE_ETHDEV_LOG(ERR,
> + "Protocol header %u not supported)\n",
> + proto_hdr);
I think it would be useful to log unsupported bits only, if we say so.
> + return -EINVAL;
> + }
> +
> + if (length != 0) {
> + RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in protocol header "
> + "based buffer split\n");
> + return -EINVAL;
> + }
> +
> + if (*mbp_buf_size < offset) {
> + RTE_ETHDEV_LOG(ERR,
> + "%s mbuf_data_room_size %u < %u segment offset)\n",
> + mpl->name, *mbp_buf_size,
> + offset);
> + return -EINVAL;
> + }
> }
> }
> return 0;
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04cff8ee10..0cd9dd6cc0 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
> * mbuf) the following data will be pushed to the next segment
> * up to its own length, and so on.
> *
> + * - The proto_hdrs in the elements define the split position of
> + * received packets.
> + *
> * - If the length in the segment description element is zero
> * the actual buffer size will be deduced from the appropriate
> * memory pool properties.
> @@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
> * - pool from the last valid element
> * - the buffer size from this pool
> * - zero offset
> + *
> + * - Length based buffer split:
> + * - mp, length, offset should be configured.
> + * - The proto_hdr field should not be configured.
> + *
> + * - Protocol header based buffer split:
> + * - mp, offset, proto_hdr should be configured.
> + * - The length field should not be configured.
> */
> struct rte_eth_rxseg_split {
> struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> uint16_t length; /**< Segment data length, configures split point. */
> uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> - uint32_t reserved; /**< Reserved field. */
> + uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header, configures split point. */
> };
>
> +/* Buffer split protocol header capability. */
> +#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
> + RTE_PTYPE_L2_ETHER | \
> + RTE_PTYPE_L3_IPV4 | \
> + RTE_PTYPE_L3_IPV6 | \
> + RTE_PTYPE_L4_TCP | \
> + RTE_PTYPE_L4_UDP | \
> + RTE_PTYPE_L4_SCTP | \
> + RTE_PTYPE_INNER_L2_ETHER | \
> + RTE_PTYPE_INNER_L3_IPV4 | \
> + RTE_PTYPE_INNER_L3_IPV6 | \
> + RTE_PTYPE_INNER_L4_TCP | \
> + RTE_PTYPE_INNER_L4_UDP | \
> + RTE_PTYPE_INNER_L4_SCTP)
> +
> /**
> * @warning
> * @b EXPERIMENTAL: this structure may change without prior notice.
More information about the dev
mailing list