[dpdk-dev] [PATCH] ixgbe: prefetch packet headers in vector PMD receive function

Zoltan Kiss zoltan.kiss at linaro.org
Mon Sep 7 16:15:25 CEST 2015



On 07/09/15 13:57, Richardson, Bruce wrote:
>
>
>> -----Original Message-----
>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>> Sent: Monday, September 7, 2015 1:26 PM
>> To: dev at dpdk.org
>> Cc: Ananyev, Konstantin; Richardson, Bruce
>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive
>> function
>>
>> Hi,
>>
>> I just realized I've missed the "[PATCH]" tag from the subject. Did anyone
>> have time to review this?
>>
>
> Hi Zoltan,
>
> the big thing that concerns me with this is the addition of new instructions for
> each packet in the fast path. Ideally, this prefetching would be better handled
> in the application itself, as for some apps, e.g. those using pipelining, the
> core doing the RX from the NIC may not touch the packet data at all, and the
> prefetches will instead cause a performance slowdown.
>
> Is it possible to get the same performance increase - or something close to it -
> by making changes in OVS?

OVS already prefetches the next packet's header while it's processing the 
previous one, but apparently that's not early enough, at least in my test 
scenario, where I'm forwarding UDP packets with the least possible 
overhead. I guess in tests where OVS does more complex processing it 
would be fine. I'll try to move the prefetch earlier in the OVS codebase, 
but I'm not sure if it'll help.
Also, I've checked the PMD receive functions, and it's quite mixed 
whether they prefetch the header or not. The other three ixgbe receive 
functions do, for example, as do the following drivers:

bnx2x
e1000
fm10k (scattered)
i40e
igb
virtio

While these drivers don't:

cxgbe
enic
fm10k (non-scattered)
mlx4

I think it would be better to add rte_packet_prefetch() everywhere, 
because applications can then turn it off with 
CONFIG_RTE_PMD_PACKET_PREFETCH.

>
> Regards,
> /Bruce
>
>> Regards,
>>
>> Zoltan
>>
>> On 01/09/15 20:17, Zoltan Kiss wrote:
>>> The lack of this prefetch causes a significant performance drop in
>>> OVS-DPDK: 13.3 Mpps instead of 14 when forwarding 64 byte packets.
>>> Even though OVS prefetches the next packet's header before it starts
>>> processing the current one, it doesn't get there fast enough. This
>>> aligns with the behaviour of other receive functions.
>>>
>>> Signed-off-by: Zoltan Kiss <zoltan.kiss at linaro.org>
>>> ---
>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>> index cf25a53..51299fa 100644
>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>> @@ -502,6 +502,15 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
>>>                   _mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
>>>                                   pkt_mb1);
>>>
>>> +               rte_packet_prefetch((char*)(rx_pkts[pos]->buf_addr) +
>>> +                                   RTE_PKTMBUF_HEADROOM);
>>> +               rte_packet_prefetch((char*)(rx_pkts[pos + 1]->buf_addr) +
>>> +                                   RTE_PKTMBUF_HEADROOM);
>>> +               rte_packet_prefetch((char*)(rx_pkts[pos + 2]->buf_addr) +
>>> +                                   RTE_PKTMBUF_HEADROOM);
>>> +               rte_packet_prefetch((char*)(rx_pkts[pos + 3]->buf_addr) +
>>> +                                   RTE_PKTMBUF_HEADROOM);
>>> +
>>>                   /* C.4 calc avaialbe number of desc */
>>>                   var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
>>>                   nb_pkts_recd += var;
>>>

