[dpdk-dev] [PATCH v4 1/9] examples/ipsec-secgw: avoid to request unused TX offloads

Ananyev, Konstantin konstantin.ananyev at intel.com
Mon Dec 24 12:22:11 CET 2018



> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal at nxp.com]
> Sent: Monday, December 24, 2018 10:54 AM
> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; dev at dpdk.org
> Cc: Nicolau, Radu <radu.nicolau at intel.com>; Horton, Remy <remy.horton at intel.com>
> Subject: Re: [dpdk-dev] [PATCH v4 1/9] examples/ipsec-secgw: avoid to request unused TX offloads
> 
> 
> 
> On 12/24/2018 3:49 PM, Ananyev, Konstantin wrote:
> >
> >>>> On 12/14/2018 10:10 PM, Konstantin Ananyev wrote:
> >>>>> ipsec-secgw always enables TX offloads
> >>>>> (DEV_TX_OFFLOAD_MULTI_SEGS, DEV_TX_OFFLOAD_SECURITY),
> >>>>> even when they are not requested by the config.
> >>>>> That causes many PMD to choose full-featured TX function,
> >>>>> which in many cases is much slower then one without offloads.
> >>>>> That patch adds checks to enabled extra HW offloads, only when
> >>>>> they were requested.
> >>>>> Plus it enables DEV_TX_OFFLOAD_IPV4_CKSUM,
> >>>>> only when other HW TX ofloads are going to be enabled.
> >>>>> Otherwise SW version of ip cksum calculation is used.
> >>>>> That allows to use vector TX function, when inline-ipsec is not
> >>>>> requested.
> >>>>>
> >>>>> Signed-off-by: Remy Horton <remy.horton at intel.com>
> >>>>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
> >>>>> Acked-by: Radu Nicolau <radu.nicolau at intel.com>
> >>>>> ---
> >>>>>     examples/ipsec-secgw/ipsec-secgw.c | 44 +++++++++++++++--------
> >>>>>     examples/ipsec-secgw/ipsec.h       |  6 ++++
> >>>>>     examples/ipsec-secgw/sa.c          | 56 ++++++++++++++++++++++++++++++
> >>>>>     3 files changed, 91 insertions(+), 15 deletions(-)
> >>>>>
> >>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
> >>>>> index 1bc0b5b50..cfc2b05e5 100644
> >>>>> --- a/examples/ipsec-secgw/ipsec-secgw.c
> >>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.c
> >>>>> @@ -208,8 +208,6 @@ static struct rte_eth_conf port_conf = {
> >>>>>     	},
> >>>>>     	.txmode = {
> >>>>>     		.mq_mode = ETH_MQ_TX_NONE,
> >>>>> -		.offloads = (DEV_TX_OFFLOAD_IPV4_CKSUM |
> >>>>> -			     DEV_TX_OFFLOAD_MULTI_SEGS),
> >>>> I believe this is disabling checksum offload for all cases and then
> >>>> enabling only for inline crypto and inline proto.
> >>> Yes.
> >>>
> >>>> This is breaking lookaside proto and lookaside none cases. Please
> >>>> correct me if I am wrong.
> >>> Why breaking?
> >> reduction in performance is kind of breaking the code.
> > I didn’t observe any performance drop with that patch.
> > In fact there was a tiny improvement (see below).
> > Did you see any regression with this patch on your HW?
> NXP hardware are low -end to mid end devices and we are always
> bottleneck by core cycles.
> So we would like to have as much offloads to HW as possible.

Ok, then I suppose we need to introduce new cmd-line options,
Something like: --txoffloads=<tx_offload_mask> --rx_offloads=<rx_offload_mask>
to keep everyone happy.
Are you ok with that?
Konstantin 

> >
> >>> For cases when HW cksum offload is disabled, IPv4 cksum calculation
> >>> will be done in SW, see below:
> >>> prepare_tx_pkt(...)
> >>> {
> >>>      ...
> >>>       +
> >>>       +		/* calculate IPv4 cksum in SW */
> >>>       +		if ((pkt->ol_flags & PKT_TX_IP_CKSUM) == 0)
> >>>       +			ip->ip_sum = rte_ipv4_cksum((struct ipv4_hdr *)ip);
> >>>
> >>>
> >>> We tested lookaside-none case quite extensively - all works well,
> >>> in fact on Intel NICs it became even a bit faster because of that change
> >>> (though not much).
> >> yes, it may work well on one hardware, but may not perform good in other
> >> hardware where cores are limited.
> > Could you elaborate a bit more what do you mean by 'cores are limited' here?
> we have single core devices as well on which we run ipsec-secgw.
> > Do you mean that for some low end cpus calculating IPv4 cksum in SW is too expensive?
> yes, limited by core cycles and not by HW
> > Note that prepare_tx_pkts() and friends read/write L2/L3 packet headers anyway -
> > so IPv4 header will be in L1 cache already.
> Agreed, but still it will consume some cycles which are more than that
> of HW.
> >
> >>> Disabling HW offloads when they are not really required has 2 benefits:
> >>>    1) allows app to be run on NICs without HW offloads support.
> >>>    2) allows dev_configure() for TX path to select simple/vector TX functions
> >>>        which for many NICs are significantly faster.
> >>>
> >>> Konstantin
> >>>
> >>>> So a NACK for this if my understanding is correct.
> >>>>



More information about the dev mailing list