[dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split

Ajit Khaparde ajit.khaparde at broadcom.com
Thu Oct 15 00:50:12 CEST 2020


On Wed, Oct 14, 2020 at 11:13 AM Viacheslav Ovsiienko
<viacheslavo at nvidia.com> wrote:
>
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
>
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
>
> The following structure is introduced to specify the Rx packet
> segment:
>
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length,
>                         configures "split point" */
>     uint16_t offset; /* data offset from beginning
>                         of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
>
> The segment descriptions are added to the rte_eth_rxconf structure:
>    rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf.
>              This array allows to specify the different settings for
>              each segment in individual fashion.
>    rx_nseg - number of elements in the array
>
> If the extended segment descriptions is provided with these new
> fields the mp parameter of the rte_eth_rx_queue_setup must be
> specified as NULL to avoid ambiguity.
>
> There are two options to specifiy Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero,
>   it is compatible configuraion, follows existing implementation,
>   provides single pool and no description for segment sizes
>   and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero, it provides the extended configuration, individually for
>   each segment.
>
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup() routine
> application should check RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
>
> If the Rx queue is configured with new settings the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array:
>
> - the first network buffer will be allocated from the memory pool,
>   specified in the first segment description element, the second
>   network buffer - from the pool in the second segment description
>   element and so on. If there is no enough elements to describe
>   the buffer for entire packet of maximal length the pool from the
>   last valid element will be used to allocate the buffers from for the
>   rest of segments
>
> - the offsets from the segment description elements will provide
>   the data offset from the buffer beginning except the first mbuf -
>   for this one the offset is added to the RTE_PKTMBUF_HEADROOM to get
>   actual offset from the buffer beginning. If there is no enough
>   elements to describe the buffer for entire packet of maximal length
>   the offsets for the rest of segment will be supposed to be zero.
>
> - the data length being received to each segment is limited  by the
>   length specified in the segment description element. The data
>   receiving starts with filling up the first mbuf data buffer, if the
>   specified maximal segment length is reached and there are data
>   remaining (packet is longer than buffer in the first mbuf) the
>   following data will be pushed to the next segment up to its own
>   maximal length. If the first two segments is not enough to store
>   all the packet remaining data  the next (third) segment will
>   be engaged and so on. If the length in the segment description
>   element is zero the actual buffer size will be deduced from
>   the appropriate memory pool properties. If there is no enough
>   elements to describe the buffer for entire packet of maximal
>   length the buffer size will be deduced from the pool of the last
>   valid element for the remaining segments.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
>
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B long @ 128 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
>
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B @ 128 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
>
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if rx_nseg
> is greater than one).
>
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde at broadcom.com>

> ---
>  doc/guides/nics/features.rst           |  15 +++++
>  doc/guides/rel_notes/deprecation.rst   |   5 --
>  doc/guides/rel_notes/release_20_11.rst |   9 +++
>  lib/librte_ethdev/rte_ethdev.c         | 111 +++++++++++++++++++++++++--------
>  lib/librte_ethdev/rte_ethdev.h         |  62 +++++++++++++++++-
>  5 files changed, 171 insertions(+), 31 deletions(-)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index dd8c955..832ea3b 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
>  * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
>
>
> +.. _nic_features_buffer_split:
> +
> +Buffer Split on Rx
> +------------------
> +
> +Scatters the packets being received on specified boundaries to segmented mbufs.
> +
> +* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[uses]       rte_eth_rxconf**: ``rx_conf.rx_seg, rx_conf.rx_nseg``.
> +* **[implements] datapath**: ``Buffer Split functionality``.
> +* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
> +* **[related] API**: ``rte_eth_rx_queue_setup()``.
> +
> +
>  .. _nic_features_lro:
>
>  LRO
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 584e720..232cd54 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -138,11 +138,6 @@ Deprecation Notices
>    In 19.11 PMDs will still update the field even when the offload is not
>    enabled.
>
> -* ethdev: Add new fields to ``rte_eth_rxconf`` to configure the receiving
> -  queues to split ingress packets into multiple segments according to the
> -  specified lengths into the buffers allocated from the specified
> -  memory pools. The backward compatibility to existing API is preserved.
> -
>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>    will be removed in 21.11.
>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index bcc0fc2..bcc2479 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -60,6 +60,12 @@ New Features
>    Added the FEC API which provides functions for query FEC capabilities and
>    current FEC mode from device. Also, API for configuring FEC mode is also provided.
>
> +* **Introduced extended buffer description for receiving.**
> +
> +  Added the extended Rx buffer description for Rx queue setup routine
> +  providing the individual settings for each Rx segment with maximal size,
> +  buffer offset and memory pool to allocate data buffers from.
> +
>  * **Updated Broadcom bnxt driver.**
>
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> @@ -253,6 +259,9 @@ API Changes
>    As the data of ``uint8_t`` will be truncated when queue number under
>    a TC is greater than 256.
>
> +* ethdev: Added fields rx_seg and rx_nseg to rte_eth_rxconf structure
> +  to provide extended description of the receiving buffer.
> +
>  * vhost: Moved vDPA APIs from experimental to stable.
>
>  * rawdev: Added a structure size parameter to the functions
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 892c246..96ecb91 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
>  #define RTE_RX_OFFLOAD_BIT2STR(_name)  \
>         { DEV_RX_OFFLOAD_##_name, #_name }
>
> +#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)      \
> +       { RTE_ETH_RX_OFFLOAD_##_name, #_name }
> +
>  static const struct {
>         uint64_t offload;
>         const char *name;
> @@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
>         RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
>         RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
>         RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> +       RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
>  };
>
>  #undef RTE_RX_OFFLOAD_BIT2STR
> +#undef RTE_ETH_RX_OFFLOAD_BIT2STR
>
>  #define RTE_TX_OFFLOAD_BIT2STR(_name)  \
>         { DEV_TX_OFFLOAD_##_name, #_name }
> @@ -1784,38 +1789,94 @@ struct rte_eth_dev *
>                 return -EINVAL;
>         }
>
> -       if (mp == NULL) {
> -               RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
> -               return -EINVAL;
> -       }
> -
>         RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
>
> -       /*
> -        * Check the size of the mbuf data buffer.
> -        * This value must be provided in the private data of the memory pool.
> -        * First check that the memory pool has a valid private data.
> -        */
>         ret = rte_eth_dev_info_get(port_id, &dev_info);
>         if (ret != 0)
>                 return ret;
>
> -       if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
> -               RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> -                       mp->name, (int)mp->private_data_size,
> -                       (int)sizeof(struct rte_pktmbuf_pool_private));
> -               return -ENOSPC;
> -       }
> -       mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +       if (mp) {
> +               /* Single pool configuration check. */
> +               if (rx_conf->rx_seg || rx_conf->rx_nseg) {
> +                       RTE_ETHDEV_LOG(ERR,
> +                                      "Ambiguous segment configuration\n");
> +                       return -EINVAL;
> +               }
> +               /*
> +                * Check the size of the mbuf data buffer, this value
> +                * must be provided in the private data of the memory pool.
> +                * First check that the memory pool(s) has a valid private data.
> +                */
> +               if (mp->private_data_size <
> +                               sizeof(struct rte_pktmbuf_pool_private)) {
> +                       RTE_ETHDEV_LOG(ERR, "%s private_data_size %u < %u\n",
> +                               mp->name, mp->private_data_size,
> +                               (unsigned int)
> +                               sizeof(struct rte_pktmbuf_pool_private));
> +                       return -ENOSPC;
> +               }
> +               mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +               if (mbp_buf_size < dev_info.min_rx_bufsize +
> +                                  RTE_PKTMBUF_HEADROOM) {
> +                       RTE_ETHDEV_LOG(ERR,
> +                                      "%s mbuf_data_room_size %u < %u"
> +                                      " (RTE_PKTMBUF_HEADROOM=%u +"
> +                                      " min_rx_bufsize(dev)=%u)\n",
> +                                      mp->name, mbp_buf_size,
> +                                      RTE_PKTMBUF_HEADROOM +
> +                                      dev_info.min_rx_bufsize,
> +                                      RTE_PKTMBUF_HEADROOM,
> +                                      dev_info.min_rx_bufsize);
> +                       return -EINVAL;
> +               }
> +       } else {
> +               const struct rte_eth_rxseg *rx_seg = rx_conf->rx_seg;
> +               uint16_t n_seg = rx_conf->rx_nseg;
> +               uint16_t seg_idx;
>
> -       if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
> -               RTE_ETHDEV_LOG(ERR,
> -                       "%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
> -                       mp->name, (int)mbp_buf_size,
> -                       (int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
> -                       (int)RTE_PKTMBUF_HEADROOM,
> -                       (int)dev_info.min_rx_bufsize);
> -               return -EINVAL;
> +               /* Extended multi-segment configuration check. */
> +               if (!rx_conf->rx_seg || !rx_conf->rx_nseg) {
> +                       RTE_ETHDEV_LOG(ERR,
> +                                      "Memory pool is null and no"
> +                                      " extended configuration provided\n");
> +                       return -EINVAL;
> +               }
> +               /*
> +                * Check the sizes and offsets against buffer sizes
> +                * for each segment specified in extended configuration.
> +                */
> +               for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +                       struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> +                       uint32_t length = rx_seg[seg_idx].length;
> +                       uint32_t offset = rx_seg[seg_idx].offset;
> +                       uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;
> +
> +                       if (mpl == NULL) {
> +                               RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> +                               return -EINVAL;
> +                       }
> +                       if (mpl->private_data_size <
> +                               sizeof(struct rte_pktmbuf_pool_private)) {
> +                               RTE_ETHDEV_LOG(ERR,
> +                                              "%s private_data_size %u < %u\n",
> +                                              mpl->name,
> +                                              mpl->private_data_size,
> +                                              (unsigned int)sizeof(struct
> +                                              rte_pktmbuf_pool_private));
> +                               return -ENOSPC;
> +                       }
> +                       mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> +                       length = length ? length : (mbp_buf_size - head_room);
> +                       if (mbp_buf_size < length + offset + head_room) {
> +                               RTE_ETHDEV_LOG(ERR,
> +                                       "%s mbuf_data_room_size %u < %u"
> +                                       " (segment length=%u +"
> +                                       " segment offset=%u)\n",
> +                                       mpl->name, mbp_buf_size,
> +                                       length + offset, length, offset);
> +                               return -EINVAL;
> +                       }
> +               }
>         }
>
>         /* Use default specified by driver, if nb_rx_desc is zero */
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 5bcfbb8..e019f4a 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -970,6 +970,16 @@ struct rte_eth_txmode {
>  };
>
>  /**
> + * A structure used to configure an RX packet segment to split.
> + */
> +struct rte_eth_rxseg {
> +       struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> +       uint16_t length; /**< Segment data length, configures split point. */
> +       uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> +       uint32_t reserved; /**< Reserved field. */
> +};
> +
> +/**
>   * A structure used to configure an RX ring of an Ethernet port.
>   */
>  struct rte_eth_rxconf {
> @@ -977,6 +987,43 @@ struct rte_eth_rxconf {
>         uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
>         uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
>         uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +       uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> +       /**
> +        * Points to the array of segment descriptions. Each array element
> +        * describes the properties for each segment in the receiving
> +        * buffer.
> +        *
> +        * If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag is set in offloads field,
> +        * the PMD will split the received packets into multiple segments
> +        * according to the specification in the description array:
> +        *
> +        * - the first network buffer will be allocated from the memory pool,
> +        *   specified in the first array element, the second buffer, from the
> +        *   pool in the second element, and so on.
> +        *
> +        * - the offsets from the segment description elements specify
> +        *   the data offset from the buffer beginning except the first mbuf.
> +        *   For this one the offset is added with RTE_PKTMBUF_HEADROOM.
> +        *
> +        * - the lengthes in the elements define the maximal data amount
> +        *   being received to each segment. The receiving starts with filling
> +        *   up the first mbuf data buffer up to specified length. If the
> +        *   there are data remaining (packet is longer than buffer in the first
> +        *   mbuf) the following data will be pushed to the next segment
> +        *   up to its own length, and so on.
> +        *
> +        * - If the length in the segment description element is zero
> +        *   the actual buffer size will be deduced from the appropriate
> +        *   memory pool properties.
> +        *
> +        * - if there is not enough elements to describe the buffer for entire
> +        *   packet of maximal length the following parameters will be used
> +        *   for the all remaining segments:
> +        *     - pool from the last valid element
> +        *     - the buffer size from this pool
> +        *     - zero offset
> +        */
> +       struct rte_eth_rxseg *rx_seg;
>         /**
>          * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
>          * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> @@ -1260,6 +1307,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_SCTP_CKSUM      0x00020000
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
>  #define DEV_RX_OFFLOAD_RSS_HASH                0x00080000
> +#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
>
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>                                  DEV_RX_OFFLOAD_UDP_CKSUM | \
> @@ -2027,9 +2075,21 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
>   *   No need to repeat any bit in rx_conf->offloads which has already been
>   *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
>   *   at port level can't be disabled at queue level.
> + *   The configuration structure also contains the pointer to the array
> + *   of the receiving buffer segment descriptions, see rx_seg and rx_nseg
> + *   fields, this extended configuration might be used by split offloads like
> + *   RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT. If mp_pool is not NULL,
> + *   the extended configuration fields must be set to NULL and zero.
>   * @param mb_pool
>   *   The pointer to the memory pool from which to allocate *rte_mbuf* network
> - *   memory buffers to populate each descriptor of the receive ring.
> + *   memory buffers to populate each descriptor of the receive ring. There are
> + *   two options to provide Rx buffer configuration:
> + *   - single pool:
> + *     mb_pool is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is 0.
> + *   - multiple segments description:
> + *     mb_pool is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not 0.
> + *     Taken only if flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is set in offloads.
> + *
>   * @return
>   *   - 0: Success, receive queue correctly set up.
>   *   - -EIO: if device is removed.
> --
> 1.8.3.1
>


More information about the dev mailing list