[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Jul 30 11:00:40 CEST 2015


Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 30, 2015 9:12 AM
> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
> 
> Hi,
> 
> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> > Hi Martin
> >
> > Thank you very much for the good catch!
> >
> > The similar situation in i40e, as explained by Konstantin.
> > As header split hasn't been supported by DPDK till now. It would be better to put the header address in RX descriptor to 0.
> > But in the future, during header split enabling. We may need to pay extra attention to that. As at least x710 datasheet said
> specifically as below.
> > "The header address should be set by the software to an even number (word aligned address)". We may need to find a way to
> ensure that during mempool/mbuf allocation.
> 
> Indeed it would be good to force the priv_size to be aligned.
> 
> The priv_size could be aligned automatically in
> rte_pktmbuf_pool_create(). The only possible problem I could see
> is that it would break applications that access to the data buffer
> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
> best thing to do (I didn't find any applications like this in dpdk).


Might be just make rte_pktmbuf_pool_create() fail if input priv_size % MIN_ALIGN != 0?

> 
> For applications that directly use rte_mempool_create() instead of
> rte_pktmbuf_pool_create(), we could add a check using an assert in
> rte_pktmbuf_init() and some words in the documentation.
> 
> The question is: what should be the proper alignment? I would say
> at least 8 bytes, but maybe cache_aligned is an option too.

8 bytes seems enough to me.

Konstantin 

> 
> Regards,
> Olivier
> 
> 
> >
> > Regards,
> > Helin
> >
> >> -----Original Message-----
> >> From: Ananyev, Konstantin
> >> Sent: Wednesday, July 29, 2015 11:12 AM
> >> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
> >> Cc: dev at dpdk.org
> >> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf
> >> private area size is odd
> >>
> >> Hi Martin,
> >>
> >>> -----Original Message-----
> >>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
> >>> Sent: Wednesday, July 29, 2015 4:07 PM
> >>> To: Zhang, Helin; olivier.matz at 6wind.com
> >>> Cc: dev at dpdk.org
> >>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> >>> mbuf private area size is odd
> >>>
> >>> Hi Helin, Hi Olivier,
> >>>
> >>> we are seeing an issue with the ixgbe and i40e drivers which we could
> >>> track down to our setting of the private area size of the mbufs.
> >>> The issue can be easily reproduced with the l2fwd example application
> >>> when a small modification is done: just set the priv_size parameter in
> >>> the call to the rte_pktmbuf_pool_create function to an odd number like
> >>> 1. In our tests this causes every call to rte_eth_rx_burst to return
> >>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
> >>> packets are received on the interface and the hardware counters do not
> >>> report any received packets.
> >>
> >>  From Niantic datasheet:
> >>
> >> "7.1.6.1 Advanced Receive Descriptors — Read Format Table 7-15 lists the
> >> advanced receive descriptor programming by the software. The ...
> >> Packet Buffer Address (64)
> >> This is the physical address of the packet buffer. The lowest bit is A0 (LSB of the
> >> address).
> >> Header Buffer Address (64)
> >> The physical address of the header buffer with the lowest bit being Descriptor
> >> Done (DD).
> >> When a packet spans in multiple descriptors, the header buffer address is used
> >> only on the first descriptor. During the programming phase, software must set
> >> the DD bit to zero (see the description of the DD bit in this section). This means
> >> that header buffer addresses are always word aligned."
> >>
> >> Right now, in ixgbe PMD we always setup  Packet Buffer Address (PBA)and
> >> Header Buffer Address (HBA) to the same value:
> >> buf_physaddr + RTE_PKTMBUF_HEADROOM
> >> So when pirv_size==1, DD bit in RXD is always set to one by SW itself, and then
> >> SW considers that HW already done with it.
> >> In other words, right now for ixgbe you can't use RX buffer that is not aligned on
> >> word boundary.
> >>
> >> So the advice would be, right now - don't set priv_size to the odd value.
> >> As we don't support split header feature anyway, I think we can fix it just by
> >> always setting HBA in the RXD to zero.
> >> Could you try the fix for ixgbe below?
> >>
> >> Same story with FVL, I believe.
> >> Konstantin
> >>
> >>
> >>> Interestingly this does not happen if we force the scattered rx path.
> >>>
> >>> I assume the drivers have some expectations regarding the alignment of
> >>> the buf_addr in the mbuf and setting an odd private are size breaks
> >>> this alignment in the rte_pktmbuf_init function. If this is the case
> >>> then one possible fix might be to enforce an alignment on the private area size.
> >>>
> >>> Best regards,
> >>> Martin
> >>
> >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >> index a0c8847..94967c5 100644
> >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
> >> reset_mbuf)
> >>
> >>                  /* populate the descriptors */
> >>                  dma_addr =
> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
> >> -               rxdp[i].read.hdr_addr = dma_addr;
> >> +               rxdp[i].read.hdr_addr = 0;
> >>                  rxdp[i].read.pkt_addr = dma_addr;
> >>          }
> >>
> >> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
> >> **rx_pkts,
> >>                  rxe->mbuf = nmb;
> >>                  dma_addr =
> >>
> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
> >> -               rxdp->read.hdr_addr = dma_addr;
> >> +               rxdp->read.hdr_addr = 0;
> >>                  rxdp->read.pkt_addr = dma_addr;
> >>
> >>                  /*
> >> @@ -1741,7 +1741,7 @@ next_desc:
> >>                          rxe->mbuf = nmb;
> >>
> >>                          rxm->data_off = RTE_PKTMBUF_HEADROOM;
> >> -                       rxdp->read.hdr_addr = dma;
> >> +                       rxdp->read.hdr_addr = 0;
> >>                          rxdp->read.pkt_addr = dma;
> >>                  } else
> >>                          rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
> >> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
> >>                  dma_addr =
> >>
> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
> >>                  rxd = &rxq->rx_ring[i];
> >> -               rxd->read.hdr_addr = dma_addr;
> >> +               rxd->read.hdr_addr = 0;
> >>                  rxd->read.pkt_addr = dma_addr;
> >>                  rxe[i].mbuf = mbuf;
> >>          }
> >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> index 6c1647e..16a9c64 100644
> >> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>                          RTE_PKTMBUF_HEADROOM);
> >>          __m128i dma_addr0, dma_addr1;
> >>
> >> +       const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
> >> +
> >>          rxdp = rxq->rx_ring + rxq->rxrearm_start;
> >>
> >>          /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
> >> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>                  dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
> >>                  dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
> >>
> >> +               dma_addr0 =  _mm_and_si128(dma_addr0, hba_msk);
> >> +               dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
> >> +
> >>                  /* flush desc with pa dma_addr */
> >>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
> >>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
> >> bash-4.2$ cat patch1 diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c
> >> b/drivers/net/ixgbe/ixgbe_rxtx.c index a0c8847..94967c5 100644
> >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
> >> reset_mbuf)
> >>
> >>                  /* populate the descriptors */
> >>                  dma_addr =
> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
> >> -               rxdp[i].read.hdr_addr = dma_addr;
> >> +               rxdp[i].read.hdr_addr = 0;
> >>                  rxdp[i].read.pkt_addr = dma_addr;
> >>          }
> >>
> >> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
> >> **rx_pkts,
> >>                  rxe->mbuf = nmb;
> >>                  dma_addr =
> >>
> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
> >> -               rxdp->read.hdr_addr = dma_addr;
> >> +               rxdp->read.hdr_addr = 0;
> >>                  rxdp->read.pkt_addr = dma_addr;
> >>
> >>                  /*
> >> @@ -1741,7 +1741,7 @@ next_desc:
> >>                          rxe->mbuf = nmb;
> >>
> >>                          rxm->data_off = RTE_PKTMBUF_HEADROOM;
> >> -                       rxdp->read.hdr_addr = dma;
> >> +                       rxdp->read.hdr_addr = 0;
> >>                          rxdp->read.pkt_addr = dma;
> >>                  } else
> >>                          rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
> >> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
> >>                  dma_addr =
> >>
> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
> >>                  rxd = &rxq->rx_ring[i];
> >> -               rxd->read.hdr_addr = dma_addr;
> >> +               rxd->read.hdr_addr = 0;
> >>                  rxd->read.pkt_addr = dma_addr;
> >>                  rxe[i].mbuf = mbuf;
> >>          }
> >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> index 6c1647e..16a9c64 100644
> >> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>                          RTE_PKTMBUF_HEADROOM);
> >>          __m128i dma_addr0, dma_addr1;
> >>
> >> +       const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
> >> +
> >>          rxdp = rxq->rx_ring + rxq->rxrearm_start;
> >>
> >>          /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
> >> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>                  dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
> >>                  dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
> >>
> >> +               dma_addr0 =  _mm_and_si128(dma_addr0, hba_msk);
> >> +               dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
> >> +
> >>                  /* flush desc with pa dma_addr */
> >>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
> >>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);



More information about the dev mailing list