[dpdk-dev] [PATCH v3 2/5] lib/ring: add zero copy APIs

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Sat Oct 24 17:45:04 CEST 2020


Hi Konstantin,
	Thank you for the quick comments. Please see the responses inline.

<snip>

> 
> 
> >
> > Add zero-copy APIs. These APIs provide the capability to copy the data
> > to/from the ring memory directly, without having a temporary copy (for
> > ex: an array of mbufs on the stack). Use cases that involve copying
> > large amount of data to/from the ring can benefit from these APIs.
> 
> LGTM in general.
> Few nits, see below.
> 
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar at arm.com>
> > ---
> >  lib/librte_ring/meson.build        |   1 +
> >  lib/librte_ring/rte_ring_elem.h    |   1 +
> >  lib/librte_ring/rte_ring_peek_zc.h | 542
> > +++++++++++++++++++++++++++++
> >  3 files changed, 544 insertions(+)
> >  create mode 100644 lib/librte_ring/rte_ring_peek_zc.h
> 
> Need to update documentation: PG and RN.
> 
> >
> > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> > index 31c0b4649..36fdcb6a5 100644
> > --- a/lib/librte_ring/meson.build
> > +++ b/lib/librte_ring/meson.build
> > @@ -11,5 +11,6 @@ headers = files('rte_ring.h',
> >  		'rte_ring_hts_c11_mem.h',
> >  		'rte_ring_peek.h',
> >  		'rte_ring_peek_c11_mem.h',
> > +		'rte_ring_peek_zc.h',
> >  		'rte_ring_rts.h',
> >  		'rte_ring_rts_c11_mem.h')
> > diff --git a/lib/librte_ring/rte_ring_elem.h
> > b/lib/librte_ring/rte_ring_elem.h index 938b398fc..7034d29c0 100644
> > --- a/lib/librte_ring/rte_ring_elem.h
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -1079,6 +1079,7 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r,
> > void *obj_table,
> >
> >  #ifdef ALLOW_EXPERIMENTAL_API
> >  #include <rte_ring_peek.h>
> > +#include <rte_ring_peek_zc.h>
> >  #endif
> >
> >  #include <rte_ring.h>
> > diff --git a/lib/librte_ring/rte_ring_peek_zc.h
> > b/lib/librte_ring/rte_ring_peek_zc.h
> > new file mode 100644
> > index 000000000..9db2d343f
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_peek_zc.h
> > @@ -0,0 +1,542 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2020 Arm Limited
> > + * Copyright (c) 2007-2009 Kip Macy kmacy at freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_PEEK_ZC_H_
> > +#define _RTE_RING_PEEK_ZC_H_
> > +
> > +/**
> > + * @file
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + * It is not recommended to include this file directly.
> > + * Please include <rte_ring_elem.h> instead.
> > + *
> > + * Ring Peek Zero Copy APIs
> > + * These APIs make it possible to split public enqueue/dequeue API
> > + * into 3 parts:
> > + * - enqueue/dequeue start
> > + * - copy data to/from the ring
> > + * - enqueue/dequeue finish
> > + * Along with the advantages of the peek APIs, these APIs provide the
> > +ability
> > + * to avoid copying of the data to temporary area (for ex: array of
> > +mbufs
> > + * on the stack).
> > + *
> > + * Note that currently these APIs are available only for two sync modes:
> > + * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
> > + * 2) Serialized Producer/Serialized Consumer
> (RTE_RING_SYNC_MT_HTS).
> > + * It is user's responsibility to create/init ring with appropriate
> > +sync
> > + * modes selected.
> > + *
> > + * Following are some examples showing the API usage.
> > + * 1)
> > + * struct elem_obj {uint64_t a; uint32_t b, c;};
> > + * struct elem_obj *obj;
> > + *
> > + * // Create ring with sync type RTE_RING_SYNC_ST or
> > +RTE_RING_SYNC_MT_HTS
> > + * // Reserve space on the ring
> > + * n = rte_ring_enqueue_zc_bulk_elem_start(r, sizeof(elem_obj), 1,
> > +&zcd, NULL);
> > + *
> > + * // Produce the data directly on the ring memory
> > + * obj = (struct elem_obj *)zcd->ptr1;
> > + * obj.a = rte_get_a();
> 
> As obj is a pointer, should be obj->a = ...
> Same for b and c.
Will fix.

> 
> > + * obj.b = rte_get_b();
> > + * obj.c = rte_get_c();
> > + * rte_ring_enqueue_zc_elem_finish(ring, n);
> > + *
> > + * 2)
> > + * // Create ring with sync type RTE_RING_SYNC_ST or
> > + RTE_RING_SYNC_MT_HTS
> > + * // Reserve space on the ring
> > + * n = rte_ring_enqueue_zc_burst_start(r, 32, &zcd, NULL);
> > + *
> > + * // Pkt I/O core polls packets from the NIC
> > + * if (n == 32)
> > + *	nb_rx = rte_eth_rx_burst(portid, queueid, zcd->ptr1, 32);
> > + * else
> > + *	nb_rx = rte_eth_rx_burst(portid, queueid, zcd->ptr1, zcd->n1);
> 
> Hmm, that doesn't look exactly correct to me.
> It could be that n == 32, but we still need to do wrap-around.
> Shouldn't it be:
> 
> If (n != 0) {
> 	nb_rx = rte_eth_rx_burst(portid, queueid, zcd->ptr1, zcd->n1);
> 	if (nb_rx == zcd->n1 && nb_rx != n)
> 		nb_rx += rte_eth_rx_burst(portid, queueid, zcd->ptr2, n -
> nb_rx); }
Agree

> 
> > + *
> > + * // Provide packets to the packet processing cores
> > + * rte_ring_enqueue_zc_finish(r, nb_rx);
> > + *
> > + * Note that between _start_ and _finish_ none other thread can
> > + proceed
> > + * with enqueue/dequeue operation till _finish_ completes.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <rte_ring_peek_c11_mem.h>
> > +
> > +/**
> > + * Ring zero-copy information structure.
> > + *
> > + * This structure contains the pointers and length of the space
> > + * reserved on the ring storage.
> > + */
> > +struct rte_ring_zc_data {
> > +	/* Pointer to the first space in the ring */
> > +	void **ptr1;
> 
> Why not just 'void *ptr1;'?
> Same for ptr2.
Agree

> 
> > +	/* Pointer to the second space in the ring if there is wrap-around */
> > +	void **ptr2;
> > +	/* Number of elements in the first pointer. If this is equal to
> > +	 * the number of elements requested, then ptr2 is NULL.
> > +	 * Otherwise, subtracting n1 from number of elements requested
> > +	 * will give the number of elements available at ptr2.
> > +	 */
> > +	unsigned int n1;
> > +} __rte_cache_aligned;
> > +
> > +static __rte_always_inline void
> > +__rte_ring_get_elem_addr(struct rte_ring *r, uint32_t head,
> > +	uint32_t esize, uint32_t num, void **dst1, uint32_t *n1, void
> > +**dst2) {
> > +	uint32_t idx, scale, nr_idx;
> > +	uint32_t *ring = (uint32_t *)&r[1];
> > +
> > +	/* Normalize to uint32_t */
> > +	scale = esize / sizeof(uint32_t);
> > +	idx = head & r->mask;
> > +	nr_idx = idx * scale;
> > +
> > +	*dst1 = ring + nr_idx;
> > +	*n1 = num;
> > +
> > +	if (idx + num > r->size) {
> > +		*n1 = r->size - idx;
> > +		*dst2 = ring;
> > +	}
> 
> Seems like missing:
> else {*dst2 = NULL;}
I did not add it since dst2 should be accessed only if there is wrap-around. Will call it out in the struct above.

> 
> > +}
> > +
> > +/**
> > + * @internal This function moves prod head value.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_enqueue_zc_elem_start(struct rte_ring *r, unsigned int
> esize,
> > +		uint32_t n, enum rte_ring_queue_behavior behavior,
> > +		struct rte_ring_zc_data *zcd, unsigned int *free_space) {
> > +	uint32_t free, head, next;
> > +
> > +	switch (r->prod.sync_type) {
> > +	case RTE_RING_SYNC_ST:
> > +		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
> > +			behavior, &head, &next, &free);
> > +		__rte_ring_get_elem_addr(r, head, esize, n, (void **)&zcd-
> >ptr1,
> 
> If you change ptr1, ptr2 to be just 'void *', then probably no extra type-cast
> will be needed here.
Thanks for catching this, pointers are out of whack.

> 
> > +			&zcd->n1, (void **)&zcd->ptr2);
> > +		break;
> > +	case RTE_RING_SYNC_MT_HTS:
> > +		n = __rte_ring_hts_move_prod_head(r, n, behavior, &head,
> &free);
> > +		__rte_ring_get_elem_addr(r, head, esize, n, (void **)&zcd-
> >ptr1,
> > +			&zcd->n1, (void **)&zcd->ptr2);
> > +		break;
> > +	case RTE_RING_SYNC_MT:
> > +	case RTE_RING_SYNC_MT_RTS:
> > +	default:
> > +		/* unsupported mode, shouldn't be here */
> > +		RTE_ASSERT(0);
> > +		n = 0;
> > +		free = 0;
> > +	}
> 
> Would it make sense to move __rte_ring_get_elem_addr() here and do it
> only when n != 0?
> I.E:
> 
> if (n != 0)
> 	__rte_ring_get_elem_addr(...);
It adds an 'if' statement. We can add a return to the default case and skip the if statement.

> 
> Same comments for _dequeue_ analog.
> 
> > +
> > +	if (free_space != NULL)
> > +		*free_space = free - n;
> > +	return n;
> > +}
> > +


More information about the dev mailing list