[dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd

Olivier MATZ olivier.matz at 6wind.com
Thu Jul 30 13:22:22 CEST 2015


Hi,

On 07/30/2015 11:43 AM, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Thursday, July 30, 2015 10:10 AM
>> To: Ananyev, Konstantin; Zhang, Helin; Martin Weiser
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
>>
>> Hi Konstantin,
>>
>> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
>>> Hi Olivier,
>>>
>>>> -----Original Message-----
>>>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>>>> Sent: Thursday, July 30, 2015 9:12 AM
>>>> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
>>>> Cc: dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
>>>>
>>>> Hi,
>>>>
>>>> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
>>>>> Hi Martin
>>>>>
>>>>> Thank you very much for the good catch!
>>>>>
>>>>> The similar situation in i40e, as explained by Konstantin.
>>>>> As header split hasn't been supported by DPDK till now. It would be better to put the header address in RX descriptor to 0.
>>>>> But in the future, during header split enabling. We may need to pay extra attention to that. As at least x710 datasheet said
>>>> specifically as below.
>>>>> "The header address should be set by the software to an even number (word aligned address)". We may need to find a way to
>>>> ensure that during mempool/mbuf allocation.
>>>>
>>>> Indeed it would be good to force the priv_size to be aligned.
>>>>
>>>> The priv_size could be aligned automatically in
>>>> rte_pktmbuf_pool_create(). The only possible problem I could see
>>>> is that it would break applications that access to the data buffer
>>>> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
>>>> best thing to do (I didn't find any applications like this in dpdk).
>>>
>>>
>>> Might be just make rte_pktmbuf_pool_create() fail if input priv_size % MIN_ALIGN != 0?
>>
>> Hmm maybe it would break more applications: an odd priv_size is
>> probably rare, but a priv_size that is not aligned to 8 bytes is
>> maybe more common.
>
> My thought was that rte_mempool_create() was just introduced in 2.1,
> so if we add extra requirement for the input parameter now -
> there would be no ABI breakage, and not many people started to use it already.
> For me just seems a bit easier and more straightforward then silent alignment -
> user would not have wrong assumptions here.
> Though if you think that a silent alignment would be more convenient
> for most users - I wouldn't insist.


Yes, I agree on the principle, but it depends whether this fix
is integrated for 2.1 or not.
I think it may already be a bit late for that, especially as it
is not a very critical bug.

Thomas, what do you think?


Olivier




> Konstantin
>
>> It's maybe safer to align the size transparently?
>>
>>
>> Regards,
>> Olivier
>>
>>
>>
>>>
>>>>
>>>> For applications that directly use rte_mempool_create() instead of
>>>> rte_pktmbuf_pool_create(), we could add a check using an assert in
>>>> rte_pktmbuf_init() and some words in the documentation.
>>>>
>>>> The question is: what should be the proper alignment? I would say
>>>> at least 8 bytes, but maybe cache_aligned is an option too.
>>>
>>> 8 bytes seems enough to me.
>>>
>>> Konstantin
>>>
>>>>
>>>> Regards,
>>>> Olivier
>>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Helin
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ananyev, Konstantin
>>>>>> Sent: Wednesday, July 29, 2015 11:12 AM
>>>>>> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com
>>>>>> Cc: dev at dpdk.org
>>>>>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf
>>>>>> private area size is odd
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser
>>>>>>> Sent: Wednesday, July 29, 2015 4:07 PM
>>>>>>> To: Zhang, Helin; olivier.matz at 6wind.com
>>>>>>> Cc: dev at dpdk.org
>>>>>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
>>>>>>> mbuf private area size is odd
>>>>>>>
>>>>>>> Hi Helin, Hi Olivier,
>>>>>>>
>>>>>>> we are seeing an issue with the ixgbe and i40e drivers which we could
>>>>>>> track down to our setting of the private area size of the mbufs.
>>>>>>> The issue can be easily reproduced with the l2fwd example application
>>>>>>> when a small modification is done: just set the priv_size parameter in
>>>>>>> the call to the rte_pktmbuf_pool_create function to an odd number like
>>>>>>> 1. In our tests this causes every call to rte_eth_rx_burst to return
>>>>>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
>>>>>>> packets are received on the interface and the hardware counters do not
>>>>>>> report any received packets.
>>>>>>
>>>>>>    From Niantic datasheet:
>>>>>>
>>>>>> "7.1.6.1 Advanced Receive Descriptors — Read Format Table 7-15 lists the
>>>>>> advanced receive descriptor programming by the software. The ...
>>>>>> Packet Buffer Address (64)
>>>>>> This is the physical address of the packet buffer. The lowest bit is A0 (LSB of the
>>>>>> address).
>>>>>> Header Buffer Address (64)
>>>>>> The physical address of the header buffer with the lowest bit being Descriptor
>>>>>> Done (DD).
>>>>>> When a packet spans in multiple descriptors, the header buffer address is used
>>>>>> only on the first descriptor. During the programming phase, software must set
>>>>>> the DD bit to zero (see the description of the DD bit in this section). This means
>>>>>> that header buffer addresses are always word aligned."
>>>>>>
>>>>>> Right now, in ixgbe PMD we always setup  Packet Buffer Address (PBA)and
>>>>>> Header Buffer Address (HBA) to the same value:
>>>>>> buf_physaddr + RTE_PKTMBUF_HEADROOM
>>>>>> So when pirv_size==1, DD bit in RXD is always set to one by SW itself, and then
>>>>>> SW considers that HW already done with it.
>>>>>> In other words, right now for ixgbe you can't use RX buffer that is not aligned on
>>>>>> word boundary.
>>>>>>
>>>>>> So the advice would be, right now - don't set priv_size to the odd value.
>>>>>> As we don't support split header feature anyway, I think we can fix it just by
>>>>>> always setting HBA in the RXD to zero.
>>>>>> Could you try the fix for ixgbe below?
>>>>>>
>>>>>> Same story with FVL, I believe.
>>>>>> Konstantin
>>>>>>
>>>>>>
>>>>>>> Interestingly this does not happen if we force the scattered rx path.
>>>>>>>
>>>>>>> I assume the drivers have some expectations regarding the alignment of
>>>>>>> the buf_addr in the mbuf and setting an odd private are size breaks
>>>>>>> this alignment in the rte_pktmbuf_init function. If this is the case
>>>>>>> then one possible fix might be to enforce an alignment on the private area size.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>
>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> index a0c8847..94967c5 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
>>>>>> reset_mbuf)
>>>>>>
>>>>>>                    /* populate the descriptors */
>>>>>>                    dma_addr =
>>>>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
>>>>>> -               rxdp[i].read.hdr_addr = dma_addr;
>>>>>> +               rxdp[i].read.hdr_addr = 0;
>>>>>>                    rxdp[i].read.pkt_addr = dma_addr;
>>>>>>            }
>>>>>>
>>>>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
>>>>>> **rx_pkts,
>>>>>>                    rxe->mbuf = nmb;
>>>>>>                    dma_addr =
>>>>>>
>>>>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
>>>>>> -               rxdp->read.hdr_addr = dma_addr;
>>>>>> +               rxdp->read.hdr_addr = 0;
>>>>>>                    rxdp->read.pkt_addr = dma_addr;
>>>>>>
>>>>>>                    /*
>>>>>> @@ -1741,7 +1741,7 @@ next_desc:
>>>>>>                            rxe->mbuf = nmb;
>>>>>>
>>>>>>                            rxm->data_off = RTE_PKTMBUF_HEADROOM;
>>>>>> -                       rxdp->read.hdr_addr = dma;
>>>>>> +                       rxdp->read.hdr_addr = 0;
>>>>>>                            rxdp->read.pkt_addr = dma;
>>>>>>                    } else
>>>>>>                            rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
>>>>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
>>>>>>                    dma_addr =
>>>>>>
>>>>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
>>>>>>                    rxd = &rxq->rx_ring[i];
>>>>>> -               rxd->read.hdr_addr = dma_addr;
>>>>>> +               rxd->read.hdr_addr = 0;
>>>>>>                    rxd->read.pkt_addr = dma_addr;
>>>>>>                    rxe[i].mbuf = mbuf;
>>>>>>            }
>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> index 6c1647e..16a9c64 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>>>>>                            RTE_PKTMBUF_HEADROOM);
>>>>>>            __m128i dma_addr0, dma_addr1;
>>>>>>
>>>>>> +       const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
>>>>>> +
>>>>>>            rxdp = rxq->rx_ring + rxq->rxrearm_start;
>>>>>>
>>>>>>            /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
>>>>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>>>>>                    dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
>>>>>>                    dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
>>>>>>
>>>>>> +               dma_addr0 =  _mm_and_si128(dma_addr0, hba_msk);
>>>>>> +               dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
>>>>>> +
>>>>>>                    /* flush desc with pa dma_addr */
>>>>>>                    _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
>>>>>>                    _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
>>>>>> bash-4.2$ cat patch1 diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> b/drivers/net/ixgbe/ixgbe_rxtx.c index a0c8847..94967c5 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
>>>>>> reset_mbuf)
>>>>>>
>>>>>>                    /* populate the descriptors */
>>>>>>                    dma_addr =
>>>>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
>>>>>> -               rxdp[i].read.hdr_addr = dma_addr;
>>>>>> +               rxdp[i].read.hdr_addr = 0;
>>>>>>                    rxdp[i].read.pkt_addr = dma_addr;
>>>>>>            }
>>>>>>
>>>>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
>>>>>> **rx_pkts,
>>>>>>                    rxe->mbuf = nmb;
>>>>>>                    dma_addr =
>>>>>>
>>>>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
>>>>>> -               rxdp->read.hdr_addr = dma_addr;
>>>>>> +               rxdp->read.hdr_addr = 0;
>>>>>>                    rxdp->read.pkt_addr = dma_addr;
>>>>>>
>>>>>>                    /*
>>>>>> @@ -1741,7 +1741,7 @@ next_desc:
>>>>>>                            rxe->mbuf = nmb;
>>>>>>
>>>>>>                            rxm->data_off = RTE_PKTMBUF_HEADROOM;
>>>>>> -                       rxdp->read.hdr_addr = dma;
>>>>>> +                       rxdp->read.hdr_addr = 0;
>>>>>>                            rxdp->read.pkt_addr = dma;
>>>>>>                    } else
>>>>>>                            rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
>>>>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
>>>>>>                    dma_addr =
>>>>>>
>>>>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
>>>>>>                    rxd = &rxq->rx_ring[i];
>>>>>> -               rxd->read.hdr_addr = dma_addr;
>>>>>> +               rxd->read.hdr_addr = 0;
>>>>>>                    rxd->read.pkt_addr = dma_addr;
>>>>>>                    rxe[i].mbuf = mbuf;
>>>>>>            }
>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> index 6c1647e..16a9c64 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>>>>>                            RTE_PKTMBUF_HEADROOM);
>>>>>>            __m128i dma_addr0, dma_addr1;
>>>>>>
>>>>>> +       const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
>>>>>> +
>>>>>>            rxdp = rxq->rx_ring + rxq->rxrearm_start;
>>>>>>
>>>>>>            /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
>>>>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>>>>>                    dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
>>>>>>                    dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
>>>>>>
>>>>>> +               dma_addr0 =  _mm_and_si128(dma_addr0, hba_msk);
>>>>>> +               dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
>>>>>> +
>>>>>>                    /* flush desc with pa dma_addr */
>>>>>>                    _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
>>>>>>                    _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
>>>
>



More information about the dev mailing list