[dpdk-dev] [PATCH v4 1/9] examples/ipsec-secgw: avoid to request unused TX offloads

Akhil Goyal akhil.goyal at nxp.com
Mon Dec 24 12:24:46 CET 2018



On 12/24/2018 4:52 PM, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: Akhil Goyal [mailto:akhil.goyal at nxp.com]
>> Sent: Monday, December 24, 2018 10:54 AM
>> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; dev at dpdk.org
>> Cc: Nicolau, Radu <radu.nicolau at intel.com>; Horton, Remy <remy.horton at intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v4 1/9] examples/ipsec-secgw: avoid to request unused TX offloads
>>
>>
>>
>> On 12/24/2018 3:49 PM, Ananyev, Konstantin wrote:
>>>>>> On 12/14/2018 10:10 PM, Konstantin Ananyev wrote:
>>>>>>> ipsec-secgw always enables TX offloads
>>>>>>> (DEV_TX_OFFLOAD_MULTI_SEGS, DEV_TX_OFFLOAD_SECURITY),
>>>>>>> even when they are not requested by the config.
>>>>>>> That causes many PMD to choose full-featured TX function,
>>>>>>> which in many cases is much slower then one without offloads.
>>>>>>> That patch adds checks to enabled extra HW offloads, only when
>>>>>>> they were requested.
>>>>>>> Plus it enables DEV_TX_OFFLOAD_IPV4_CKSUM,
>>>>>>> only when other HW TX ofloads are going to be enabled.
>>>>>>> Otherwise SW version of ip cksum calculation is used.
>>>>>>> That allows to use vector TX function, when inline-ipsec is not
>>>>>>> requested.
>>>>>>>
>>>>>>> Signed-off-by: Remy Horton <remy.horton at intel.com>
>>>>>>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
>>>>>>> Acked-by: Radu Nicolau <radu.nicolau at intel.com>
>>>>>>> ---
>>>>>>>      examples/ipsec-secgw/ipsec-secgw.c | 44 +++++++++++++++--------
>>>>>>>      examples/ipsec-secgw/ipsec.h       |  6 ++++
>>>>>>>      examples/ipsec-secgw/sa.c          | 56 ++++++++++++++++++++++++++++++
>>>>>>>      3 files changed, 91 insertions(+), 15 deletions(-)
>>>>>>>
>>>>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
>>>>>>> index 1bc0b5b50..cfc2b05e5 100644
>>>>>>> --- a/examples/ipsec-secgw/ipsec-secgw.c
>>>>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.c
>>>>>>> @@ -208,8 +208,6 @@ static struct rte_eth_conf port_conf = {
>>>>>>>      	},
>>>>>>>      	.txmode = {
>>>>>>>      		.mq_mode = ETH_MQ_TX_NONE,
>>>>>>> -		.offloads = (DEV_TX_OFFLOAD_IPV4_CKSUM |
>>>>>>> -			     DEV_TX_OFFLOAD_MULTI_SEGS),
>>>>>> I believe this is disabling checksum offload for all cases and then
>>>>>> enabling only for inline crypto and inline proto.
>>>>> Yes.
>>>>>
>>>>>> This is breaking lookaside proto and lookaside none cases. Please
>>>>>> correct me if I am wrong.
>>>>> Why breaking?
>>>> reduction in performance is kind of breaking the code.
>>> I didn’t observe any performance drop with that patch.
>>> In fact there was a tiny improvement (see below).
>>> Did you see any regression with this patch on your HW?
>> NXP hardware are low -end to mid end devices and we are always
>> bottleneck by core cycles.
>> So we would like to have as much offloads to HW as possible.
> Ok, then I suppose we need to introduce new cmd-line options,
> Something like: --txoffloads=<tx_offload_mask> --rx_offloads=<rx_offload_mask>
> to keep everyone happy.
> Are you ok with that?
I think it should be taken from the PMD capabilities. cmd line for every 
parameter will make it very complex.
> Konstantin
>
>>>>> For cases when HW cksum offload is disabled, IPv4 cksum calculation
>>>>> will be done in SW, see below:
>>>>> prepare_tx_pkt(...)
>>>>> {
>>>>>       ...
>>>>>        +
>>>>>        +		/* calculate IPv4 cksum in SW */
>>>>>        +		if ((pkt->ol_flags & PKT_TX_IP_CKSUM) == 0)
>>>>>        +			ip->ip_sum = rte_ipv4_cksum((struct ipv4_hdr *)ip);
>>>>>
>>>>>
>>>>> We tested lookaside-none case quite extensively - all works well,
>>>>> in fact on Intel NICs it became even a bit faster because of that change
>>>>> (though not much).
>>>> yes, it may work well on one hardware, but may not perform good in other
>>>> hardware where cores are limited.
>>> Could you elaborate a bit more what do you mean by 'cores are limited' here?
>> we have single core devices as well on which we run ipsec-secgw.
>>> Do you mean that for some low end cpus calculating IPv4 cksum in SW is too expensive?
>> yes, limited by core cycles and not by HW
>>> Note that prepare_tx_pkts() and friends read/write L2/L3 packet headers anyway -
>>> so IPv4 header will be in L1 cache already.
>> Agreed, but still it will consume some cycles which are more than that
>> of HW.
>>>>> Disabling HW offloads when they are not really required has 2 benefits:
>>>>>     1) allows app to be run on NICs without HW offloads support.
>>>>>     2) allows dev_configure() for TX path to select simple/vector TX functions
>>>>>         which for many NICs are significantly faster.
>>>>>
>>>>> Konstantin
>>>>>
>>>>>> So a NACK for this if my understanding is correct.
>>>>>>



More information about the dev mailing list