[dpdk-dev] [PATCH v2 1/9] ethdev: introduce Rx buffer split

Thomas Monjalon thomas at monjalon.net
Mon Oct 12 00:17:42 CEST 2020


07/10/2020 17:06, Viacheslav Ovsiienko:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length */

The "length" parameter is configuring a split point.
Worth to note in the comment I think.

>     uint16_t offset; /* data offset from beginning of mbuf data buffer */

Is it replacing RTE_PKTMBUF_HEADROOM?

>     uint32_t reserved; /* reserved field */
> };
> 
> The new routine rte_eth_rx_queue_setup_ex() is introduced to
> setup the given Rx queue using the new extended Rx packet segment
> description:
> 
> int
> rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
>                           uint16_t nb_rx_desc, unsigned int socket_id,
>                           const struct rte_eth_rxconf *rx_conf,
> 		          const struct rte_eth_rxseg *rx_seg,
>                           uint16_t n_seg)

An alternative name for this function:
	rte_eth_rxseg_queue_setup

> This routine presents the two new parameters:
>     rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf
>     n_seg - number of elements in the array

Not clear why we need an array.
I suggest writing here that each segment of the same packet
can have different properties, the array representing the full packet.

> The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device

The name should start with RTE_ prefix.

> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup_ex() routine
> application should check DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new routine the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will allocate the first mbuf
> from the pool specified in the first segment descriptor and puts
> the data staring at specified offset in the allocated mbuf data
> buffer. If packet length exceeds the specified segment length
> the next mbuf will be allocated according to the next segment
> descriptor (if any) and data will be put in its data buffer at
> specified offset and not exceeding specified length. If there is
> no next descriptor the next mbuf will be allocated and filled in the
> same way (from the same pool and with the same buffer offset/length)
> as the current one.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
>     seg1 - pool1, len1=20B, off1=0B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B long @ 0 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B @ 0 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
> 
> The offload DEV_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if n_seg
> is greater than one).
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
> 
> Also, the proposed segment description might be used to specify
> Rx packet split for some other features. For example, provide
> the way to specify the extra memory pool for the Header Split
> feature of some Intel PMD.

I don't understand what you are referring in this last paragraph.
I think explanation above is enough to demonstrate the flexibility.

> Signed-off-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>

Thank you, I like this feature.
More minor comments below.

[...]
> +* **Introduced extended buffer description for receiving.**

Rewording:
	Introduced extended setup of Rx queue

> +  * Added extended Rx queue setup routine
> +  * Added description for Rx segment sizes

not only "sizes", but also offset and mempool.

> +  * Added capability to specify the memory pool for each segment

This one can be merged with the above, or offset should be added.

[...]
The doxygen comment is missing here.

> +int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> +		uint16_t nb_rx_desc, unsigned int socket_id,
> +		const struct rte_eth_rxconf *rx_conf,
> +		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);

This new function should be experimental and it should be added to the .map file.




More information about the dev mailing list