[dpdk-dev] [PATCH] mbuf: add helpers to prefetch mbuf

Wiles, Keith keith.wiles at intel.com
Tue May 10 00:02:52 CEST 2016


>diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>index 529debb..e3ee0b3 100644
>--- a/lib/librte_mbuf/rte_mbuf.h
>+++ b/lib/librte_mbuf/rte_mbuf.h
>@@ -842,6 +842,44 @@ struct rte_mbuf {
> 	uint16_t timesync;
> } __rte_cache_aligned;
> 
>+/**
>+ * Prefetch the first part of the mbuf
>+ *
>+ * The first 64 bytes of the mbuf corresponds to fields that are used early
>+ * in the receive path. If the cache line of the architecture is higher than
>+ * 64B, the second part will also be prefetched.
>+ *
>+ * @param m
>+ *   The pointer to the mbuf.
>+ */
>+static inline void
>+rte_mbuf_prefetch_part0(struct rte_mbuf *m)
>+{
>+	rte_prefetch0(&m->cacheline0);
>+}
>+
>+/**
>+ * Prefetch the second part of the mbuf
>+ *
>+ * The next 64 bytes of the mbuf corresponds to fields that are used in the
>+ * transmit path. If the cache line of the architecture is higher than 64B,
>+ * this function does nothing as it is expected that the full mbuf is
>+ * already in cache.
>+ *
>+ * @param m
>+ *   The pointer to the mbuf.
>+ */
>+static inline void
>+rte_mbuf_prefetch_part1(struct rte_mbuf *m)
>+{
>+#if RTE_CACHE_LINE_SIZE == 64
>+	rte_prefetch0(&m->cacheline1);
>+#else
>+	RTE_SET_USED(m);
>+#endif
>+}

I am not super happy with the names here, but I understand that rte_mbuf_prefetch_cacheline0() is a bit long. I could live with longer names if that makes more sense and adds to readability.
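
For what it is worth, at an RX call site the proposed names read fine to me, e.g. something like this (just a sketch to show how the calls look; the prefetch_mbufs() helper is made up, not taken from any driver):

static inline void
prefetch_mbufs(struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		/* First cacheline for the RX fields, second one for the TX fields. */
		rte_mbuf_prefetch_part0(pkts[i]);
		rte_mbuf_prefetch_part1(pkts[i]);
	}
}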

Another idea is to have only one function for both:

enum { MBUF_CACHELINE0 = 0, MBUF_CACHELINE1, MBUF_CACHELINES };	// Optional enum if you want

static inline void
rte_mbuf_prefetch(struct rte_mbuf *m, unsigned cacheline)	// Make sure we add a comment about the constant value
{
	if (cacheline == MBUF_CACHELINE0) {
		rte_prefetch0(&m->cacheline0);
	} else if (cacheline == MBUF_CACHELINE1) {
		rte_prefetch0(&m->cacheline1);
	} else {
		rte_prefetch0(&m->cacheline0);
		rte_prefetch0(&m->cacheline1);
	}
}
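
At a call site that would look something like this (again, untested):

	rte_mbuf_prefetch(m, MBUF_CACHELINE0);	// Prefetch only cacheline 0
	rte_mbuf_prefetch(m, MBUF_CACHELINES);	// Prefetch both cachelines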

I believe that if you pass a constant value for the cacheline argument, the compiler should optimize out the extra branches. If not, then what about a macro instead:

#define rte_mbuf_prefetch(m, c) \
	do { \
		if ((c) == MBUF_CACHELINE0) \
			rte_prefetch0(&(m)->cacheline0); \
		else if ((c) == MBUF_CACHELINE1) \
			rte_prefetch0(&(m)->cacheline1); \
		else { \
			rte_prefetch0(&(m)->cacheline0); \
			rte_prefetch0(&(m)->cacheline1); \
		} \
	} while (0)

Call like this:
	rte_mbuf_prefetch(m, 0);	// For cacheline 0
	rte_mbuf_prefetch(m, 1);	// For cacheline 1
	rte_mbuf_prefetch(m, 2);	// For cachelines 0 and 1

We could have another routine:
	rte_mbuf_prefetch_data(m, 0);	// Prefetch the first cacheline of the packet data.
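
Something like this, maybe (a rough, untested sketch; the meaning of the second argument is just my suggestion):

static inline void
rte_mbuf_prefetch_data(struct rte_mbuf *m, unsigned cacheline)
{
	/* Prefetch the given cacheline of the packet data area. */
	rte_prefetch0(rte_pktmbuf_mtod_offset(m, void *,
			cacheline * RTE_CACHE_LINE_SIZE));
}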

Just a thought, and I did not test the above code, so I hope it works that way. I noticed something like this in the Linux spinlock code a few years ago.



>+
>+
> static inline uint16_t rte_pktmbuf_priv_size(struct rte_mempool *mp);
> 
> /**
>-- 
>2.8.0.rc3
>
>


Regards,
Keith





