[dpdk-dev] [PATCH 2/4] distributor: new packet distributor library

Neil Horman nhorman at tuxdriver.com
Tue May 20 20:18:44 CEST 2014


On Tue, May 20, 2014 at 11:00:55AM +0100, Bruce Richardson wrote:
> This adds the code for a new Intel DPDK library for packet distribution.
> The distributor is a component which is designed to pass packets
> one-at-a-time to workers, with dynamic load balancing. Using the RSS
> field in the mbuf as a tag, the distributor tracks what packet tag is
> being processed by what worker and then ensures that no two packets with
> the same tag are in-flight simultaneously. Once a tag is not in-flight,
> then the next packet with that tag will be sent to the next available
> core.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
><snip>

> +#define RTE_DISTRIB_GET_BUF (1)
> +#define RTE_DISTRIB_RETURN_BUF (2)
> +
Can you document the meaning of these bits please? The code makes it
somewhat confusing to differentiate them.  As I read the code, GET_BUF
is used as a flag to indicate that rte_distributor_get_pkt needs to
wait while a buffer is filled in by the processing thread, while
RETURN_BUF indicates that a worker is leaving and its buffer needs to
be (re)assigned to an alternate worker.  Is that correct?
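
If that reading is right, comments along these lines in the header
would settle it (the wording is my guess at the semantics, so correct
as needed):

	/* set by a worker to hand back oldpkt and request a new packet;
	 * cleared by the distributor once the new mbuf pointer has been
	 * written into the upper bits of bufptr64 */
	#define RTE_DISTRIB_GET_BUF (1)

	/* set by a departing worker to hand back oldpkt without asking
	 * for more work, so the distributor can reassign that worker's
	 * in-flight tag and backlog elsewhere */
	#define RTE_DISTRIB_RETURN_BUF (2)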

> +#define RTE_DISTRIB_BACKLOG_SIZE 8
> +#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
> +
> +#define RTE_DISTRIB_MAX_RETURNS 128
> +#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
> +
> +union rte_distributor_buffer {
> +	volatile int64_t bufptr64;
> +	char pad[CACHE_LINE_SIZE*3];
Do you need the pad if you mark the struct as cache aligned?  More on
this just below.
> +} __rte_cache_aligned;
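
For what it's worth, my understanding is that __rte_cache_aligned only
gives you alignment, not size, so without the pad the union would
occupy a single cache line and adjacent workers' buffers would sit on
neighbouring lines.  If the three-line spacing is deliberate (my guess
is keeping the adjacent-line prefetcher off other workers' data), a
comment plus a compile-time check would lock that in:

	/* sketch: make the intended three-cache-line footprint explicit */
	RTE_BUILD_BUG_ON(sizeof(union rte_distributor_buffer) !=
			CACHE_LINE_SIZE * 3);
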
> +
><snip>
> +
> +struct rte_mbuf *
> +rte_distributor_get_pkt(struct rte_distributor *d,
> +		unsigned worker_id, struct rte_mbuf *oldpkt,
> +		unsigned reserved __rte_unused)
> +{
> +	union rte_distributor_buffer *buf = &d->bufs[worker_id];
> +	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS) | \
> +			RTE_DISTRIB_GET_BUF;
> +	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
> +		rte_pause();
> +	buf->bufptr64 = req;
> +	while (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
> +		rte_pause();
You may want to document the fact that this is deadlock prone.  You
clearly state that only a single thread can run the processing
routine, but if a user selects a single worker thread to perform
double duty, the GET_BUF flag will never get cleared here, and no
other queues will get processed.
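
That is, something like the following (a hypothetical mis-use, with d,
id, pkts and num assumed to be in scope) spins forever:

	/* one lcore trying to act as both worker and distributor */
	struct rte_mbuf *pkt = rte_distributor_get_pkt(d, id, NULL, 0);

	/* never reached: no other thread runs rte_distributor_process(),
	 * so RTE_DISTRIB_GET_BUF above is never cleared */
	rte_distributor_process(d, pkts, num);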

> +	/* since bufptr64 is a signed value, this should be an arithmetic shift */
> +	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
> +	return (struct rte_mbuf *)((uintptr_t)ret);
> +}
> +
> +int
> +rte_distributor_return_pkt(struct rte_distributor *d,
> +		unsigned worker_id, struct rte_mbuf *oldpkt)
> +{
Maybe add some optional sanity checking, here and above, to ensure
that a packet returned through get_pkt doesn't also get returned here,
mangling the flags field?
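
Something like this at the top of return_pkt (a sketch; the debug
guard is a name I'm making up) would at least catch a double return:

	#ifdef RTE_LIBRTE_DISTRIBUTOR_DEBUG
		/* a previous request from this worker is still pending,
		 * so writing bufptr64 now would mangle the flag bits */
		if (buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK)
			return -EBUSY;
	#endif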

><snip>
> +
> +/* flush the distributor, so that there are no outstanding packets in flight or
> + * queued up. */
> +int
> +rte_distributor_flush(struct rte_distributor *d)
> +{
You need to document that this function can only be called by the same
thread that is running rte_distributor_process, lest you corrupt your
queue data.  Alternatively, it might be nicer to modify this
function's internals to set a flag in the distributor status bits,
making the process routine do the flush work when it gets set.  That
would allow this function to be called by any other thread, which
seems like a more natural interface.
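
A sketch of what I mean (flush_req is a made-up, volatile field in
struct rte_distributor):

	/* callable from any thread: post a request and wait for the
	 * process routine to act on it */
	int
	rte_distributor_flush(struct rte_distributor *d)
	{
		d->flush_req = 1;
		while (d->flush_req)
			rte_pause();
		return 0;
	}

	/* and in rte_distributor_process(), on the distributor lcore: */
	if (d->flush_req) {
		/* drain backlogs and in-flight tags as the current
		 * implementation does inline */
		d->flush_req = 0;
	}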

><snip>
> +}
> +
> +/* clears the internal returns array in the distributor */
> +void
> +rte_distributor_clear_returns(struct rte_distributor *d)
> +{
This can also only be called by the same thread that runs the process routine,
lest the start and count values get mis-assigned.

> +	d->returns.start = d->returns.count = 0;
> +#ifndef __OPTIMIZE__
> +	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
> +#endif
> +}
> +
> +/* creates a distributor instance */
> +struct rte_distributor *
> +rte_distributor_create(const char *name,
> +		unsigned socket_id,
> +		unsigned num_workers,
> +		struct rte_distributor_extra_args *args __rte_unused)
> +{
> +	struct rte_distributor *d;
> +	struct rte_distributor_list *distributor_list;
> +	char mz_name[RTE_MEMZONE_NAMESIZE];
> +	const struct rte_memzone *mz;
> +
> +	/* compilation-time checks */
> +	RTE_BUILD_BUG_ON((sizeof(*d) & CACHE_LINE_MASK) != 0);
> +	RTE_BUILD_BUG_ON((RTE_MAX_LCORE & 7) != 0);
> +
> +	if (name == NULL || num_workers >= RTE_MAX_LCORE) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +	rte_snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
> +	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
> +	if (mz == NULL) {
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +
> +	/* check that we have an initialised tail queue */
> +	if ((distributor_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_DISTRIBUTOR,
> +			rte_distributor_list)) == NULL) {
> +		rte_errno = E_RTE_NO_TAILQ;
> +		return NULL;
> +	}
> +
> +	d = mz->addr;
> +	rte_snprintf(d->name, sizeof(d->name), "%s", name);
> +	d->num_workers = num_workers;
> +	TAILQ_INSERT_TAIL(distributor_list, d, next);
You need locking around this list unless you intend to assert that
distributor creation and destruction must only be performed from a
single thread.  Also, where is the API method to tear down a
distributor instance?
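
For the locking, the same pattern the other libraries use on their
tailqs would do (a sketch, assuming the usual EAL tailq lock):

	/* serialize the insert against concurrent creators */
	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
	TAILQ_INSERT_TAIL(distributor_list, d, next);
	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);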

><snip>
> +#endif
> +
> +#include <rte_mbuf.h>
> +
> +#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
> +
> +struct rte_distributor;
> +
> +struct rte_distributor_extra_args { }; /**< reserved for future use*/
> +
You don't need to reserve a struct name for future use.  No one will use it
until you create it.

> +struct rte_mbuf *
> +rte_distributor_get_pkt(struct rte_distributor *d,
> +		unsigned worker_id, struct rte_mbuf *oldpkt, unsigned reserved);
> +
Don't need to reserve an extra argument here.  You're not ABI safe currently,
and if DPDK becomes ABI safe in the future, we will use a linker script to
provide versions with backward compatibility easily enough.
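
For reference, that approach is just a GNU ld version script along
these lines (illustrative only, both the version tag and the symbol
list):

	/* hypothetical rte_distributor version script */
	DPDK_1.7 {
		global:
			rte_distributor_create;
			rte_distributor_get_pkt;
			rte_distributor_return_pkt;
		local: *;
	};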

Neil


