[dpdk-dev] [PATCH v2 5/6] net/af_xdp: enable zero copy

Zhang, Qi Z qi.z.zhang at intel.com
Wed Mar 20 10:48:48 CET 2019

From: David Marchand [mailto:david.marchand at redhat.com] 
Sent: Wednesday, March 20, 2019 5:22 PM
To: Ye, Xiaolong <xiaolong.ye at intel.com>
Cc: dev <dev at dpdk.org>; Zhang, Qi Z <qi.z.zhang at intel.com>; Karlsson, Magnus <magnus.karlsson at intel.com>; Topel, Bjorn <bjorn.topel at intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 5/6] net/af_xdp: enable zero copy

On Tue, Mar 19, 2019 at 8:17 AM Xiaolong Ye <xiaolong.ye at intel.com> wrote:
Check whether the external mempool (from rx_queue_setup) is suitable for
af_xdp; if it is, it is registered to the af_xdp socket directly and
there is no packet data copy on Rx or Tx.

Signed-off-by: Xiaolong Ye <xiaolong.ye at intel.com>
 drivers/net/af_xdp/rte_eth_af_xdp.c | 128 ++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 37 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index fc60cb5c5..c22791e51 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -62,6 +62,7 @@ struct xsk_umem_info {
        struct xsk_umem *umem;
        struct rte_mempool *mb_pool;
        void *buffer;
+       uint8_t zc;

 struct pkt_rx_queue {
@@ -76,6 +77,7 @@ struct pkt_rx_queue {

        struct pkt_tx_queue *pair;
        uint16_t queue_idx;
+       uint8_t zc;

 struct pkt_tx_queue {
@@ -191,17 +193,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
                uint32_t len = xsk_ring_cons__rx_desc(rx, idx_rx++)->len;
                char *pkt = xsk_umem__get_data(rxq->umem->buffer, addr);

-               mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
-               if (mbuf) {
-                       memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+               if (rxq->zc) {
+                       mbuf = addr_to_mbuf(rxq->umem, addr);
                        rte_pktmbuf_pkt_len(mbuf) =
                                rte_pktmbuf_data_len(mbuf) = len;
-                       rx_bytes += len;
                        bufs[count++] = mbuf;
                } else {
-                       dropped++;
+                       mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
+                       if (mbuf) {
+                               memcpy(rte_pktmbuf_mtod(mbuf, void*), pkt, len);
+                               rte_pktmbuf_pkt_len(mbuf) =
+                                       rte_pktmbuf_data_len(mbuf) = len;
+                               rx_bytes += len;
+                               bufs[count++] = mbuf;
+                       } else {
+                               dropped++;
+                       }
+                       rte_pktmbuf_free(addr_to_mbuf(umem, addr));
+               }
-               rte_pktmbuf_free(addr_to_mbuf(umem, addr));

I did not understand how the zc parts work, but at least looking at the rx_burst function: once multi-queue is supported, is there any reason we would have zc enabled on one rxq and not the others?

[Qi:] The answer is no. We cannot anticipate which memory pool the application will use during rx queue setup. Also, in the case where multiple queues share the same memory pool, the umem still cannot be shared due to a race condition, so only one queue could be zc. To make all queues zc, we would have to assign each queue a different memory pool.
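Qi's constraint can be illustrated with a minimal sketch. The types below are mocks (the real driver uses rte_mempool and pkt_rx_queue, and set_zc_flags is a hypothetical helper, not a function in the patch): a queue can only get zc if no earlier queue already claimed the same mempool as its umem backing store.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types standing in for rte_mempool / pkt_rx_queue. */
struct mock_mempool { int id; };

struct mock_rxq {
        struct mock_mempool *mb_pool;
        unsigned char zc;       /* mirrors rxq->zc in the patch */
};

/* A queue qualifies for zero copy only if no other queue has already
 * registered the same mempool as its umem -- a umem cannot safely back
 * two rings, so a shared pool forces the copy path. */
static void set_zc_flags(struct mock_rxq *q, size_t nb_q)
{
        for (size_t i = 0; i < nb_q; i++) {
                q[i].zc = 1;
                for (size_t j = 0; j < i; j++) {
                        if (q[j].mb_pool == q[i].mb_pool) {
                                q[i].zc = 0;    /* shared pool: copy */
                                break;
                        }
                }
        }
}
```

With two queues on one pool, only the first gets zc, which matches the "only one queue could be zc" outcome described above.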

If the answer is that we would have either all rxqs with zc or none, we could have dedicated rx_burst functions and avoid this per-mbuf test on rxq->zc.
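The dedicated-burst-function idea could be sketched roughly as follows, again with mock types; in the real driver the selection would happen once at queue setup and the chosen pointer would be installed as the device's rx_pkt_burst, instead of branching on rxq->zc for every packet.

```c
#include <assert.h>
#include <stdint.h>

struct mock_rxq { uint8_t zc; };

typedef uint16_t (*rx_burst_t)(struct mock_rxq *rxq, void **bufs,
                               uint16_t nb_pkts);

/* Copy path: would allocate from the queue's mempool and memcpy each
 * frame out of the umem (body elided in this sketch). */
static uint16_t rx_burst_copy(struct mock_rxq *rxq, void **bufs,
                              uint16_t nb_pkts)
{
        (void)rxq; (void)bufs; (void)nb_pkts;
        return 0;
}

/* Zero-copy path: would hand the umem-backed mbuf straight to the
 * caller (body elided in this sketch). */
static uint16_t rx_burst_zc(struct mock_rxq *rxq, void **bufs,
                            uint16_t nb_pkts)
{
        (void)rxq; (void)bufs; (void)nb_pkts;
        return 0;
}

/* Decided once at setup, so the hot path never tests rxq->zc. */
static rx_burst_t pick_rx_burst(const struct mock_rxq *rxq)
{
        return rxq->zc ? rx_burst_zc : rx_burst_copy;
}
```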

For the tx part, I don't understand the relation between rx and tx.
Shouldn't the zc capability be global to the ethdev port?

You might also want to look at the "simple" tx burst functions, like in i40e, so that you only need to look at the first mbuf to check its originating pool.

[Qi:] If you mean DEV_TX_OFFLOAD_MBUF_FAST_FREE, yes, I think that's a good point.
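The i40e-style "simple" Tx check mentioned above boils down to inspecting only the first mbuf's originating pool: under the fast-free contract every mbuf in the burst comes from one pool with refcnt 1, so one comparison classifies the whole burst. A sketch with mock types (burst_is_zc is a hypothetical helper, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>

struct mock_mempool { int id; };
struct mock_mbuf { struct mock_mempool *pool; };

/* Fast-free style test: if the first mbuf comes from the umem's pool,
 * assume the whole burst does -- the offload contract guarantees all
 * mbufs in a burst share one pool. */
static int burst_is_zc(struct mock_mbuf **pkts, size_t n,
                       const struct mock_mempool *umem_pool)
{
        if (n == 0)
                return 0;
        return pkts[0]->pool == umem_pool;
}
```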

David Marchand
