[dpdk-dev] [PATCH] net/mlx5: add Rx checksum offload flag return bad

Jiawei Zhu 17826875952 at 163.com
Sun Mar 28 15:39:14 CEST 2021

Previous message (by thread): [dpdk-dev] [PATCH] net/mlx5: add Rx checksum offload flag return bad
Next message (by thread): [dpdk-dev] How about add rte flow get capablity APIs

Hi, Slava

Thanks for your detailed explanation! You are right, I didn't look
carefully!

With best regards,
Jiawei

On 2021/3/25 7:55 PM, Slava Ovsiienko wrote:
> Hi, Jiawei
> 
>> -----Original Message-----
>> From: Jiawei Zhu <17826875952 at 163.com>
>> Sent: Wednesday, March 24, 2021 18:22
>> To: Slava Ovsiienko <viacheslavo at nvidia.com>; dev at dpdk.org
>> Cc: zhujiawei12 at huawei.com; Matan Azrad <matan at nvidia.com>; Shahaf
>> Shuler <shahafs at nvidia.com>
>> Subject: Re: [PATCH] net/mlx5: add Rx checksum offload flag return bad
>>
>> Hi, Slava
>>
>> Thanks for your explanation. The multiplications and divisions are in the
>> TRANSPOSE macro, not in rte_be_to_cpu_16.
> 
> [SO]
> Yes, TRANSPOSE is the macro with the mul and div operators. But the compiler
> translates these into simple shifts (because the operands are powers of 2).
> The only place where TRANSPOSE is used is the rxq_cq_to_ol_flags() routine.
> I've compiled this routine and provided the assembly listing; please see it
> in my previous reply. It shows what TRANSPOSE was compiled to on x86,
> and we see only shifts:
> 
> 43 0047 48C1EA02 	 shrq $2,%rdx
> 44 004b 48C1E802 	 shrq $2,%rax
> 
> No mul/div at all, exactly as we expected.
> 
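
A minimal sketch of the point above, assuming a TRANSPOSE-style macro that moves a single-bit flag from one power-of-2 position to another by dividing or multiplying by the ratio of the two masks. The flag names and bit positions below are hypothetical; the values 512 and 1024 and the shift by 2 are taken from the listing. Because both masks are compile-time powers of 2, the unsigned division by 4 can be emitted as a right shift by 2.

```c
#include <assert.h>
#include <stdint.h>

/* Move the single-bit flag `from` found in `val` to bit position `to`.
 * Sketch of a TRANSPOSE-style macro; both masks must be powers of 2. */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Hypothetical names; values chosen to match the listing (512, 1024, >> 2). */
#define SRC_L3_HDR_VALID (1u << 9)   /* 512  */
#define SRC_L4_HDR_VALID (1u << 10)  /* 1024 */
#define DST_IP_CKSUM_GOOD (1u << 7)  /* 128 = 512 / 4  */
#define DST_L4_CKSUM_GOOD (1u << 8)  /* 256 = 1024 / 4 */

/* Each division is by the constant 4, i.e. a shift right by 2. */
static uint32_t cq_to_ol_flags(uint16_t flags)
{
	return TRANSPOSE(flags, SRC_L3_HDR_VALID, DST_IP_CKSUM_GOOD) |
	       TRANSPOSE(flags, SRC_L4_HDR_VALID, DST_L4_CKSUM_GOOD);
}
```

Compiling such a function with optimization enabled and inspecting the assembly is the way to confirm what a given compiler actually emits.
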
>> So I think using if-else directly could improve the performance.
> 
> [SO]
> The if/else construction is usually compiled to conditional jumps. Because we are
> analyzing the flags of received packets, the CPU's branch prediction might not work
> well across the various ingress traffic patterns, and we would get a performance penalty.
> Hence, the best practice seems to be to avoid conditional jumps altogether.
> The existing code follows this approach: as the assembly listing shows, there
> are no conditional jumps.
> 
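
To make the comparison concrete, here is a hedged sketch of the two styles side by side. The function and flag names are hypothetical, and the bit values are the ones visible in the listing. Both functions return the same result for every input; the difference is only in how they may be compiled. An optimizing compiler is free to turn the if/else version into branch-free code as well, so the generated assembly should be checked rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names; values match the masks seen in the listing. */
#define SRC_L3_OK (1u << 9)   /* 512  */
#define SRC_L4_OK (1u << 10)  /* 1024 */
#define DST_IP_OK (1u << 7)   /* 128  */
#define DST_L4_OK (1u << 8)   /* 256  */

/* if-based version: may compile to conditional jumps whose outcome
 * depends on the received traffic. */
static uint32_t flags_branchy(uint16_t f)
{
	uint32_t ol = 0;

	if (f & SRC_L3_OK)
		ol |= DST_IP_OK;
	if (f & SRC_L4_OK)
		ol |= DST_L4_OK;
	return ol;
}

/* Mask-and-shift version: no data-dependent control flow in the source. */
static uint32_t flags_branchless(uint16_t f)
{
	return ((f & SRC_L3_OK) >> 2) | ((f & SRC_L4_OK) >> 2);
}
```
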
> With best regards,
> Slava
> 
> PS. I removed the extraneous details from the listing; this is just the resulting code
> of rxq_cq_to_ol_flags(). I removed static and made the function non-inline to see the
> isolated piece of code:
> 
> rxq_cq_to_ol_flags:
>    movzwl 28(%rdi),%edx   // endianness conversion optimized out entirely
>    movl %edx,%eax
>    andw $512,%dx
>    andw $1024,%ax
>    movzwl %dx,%edx
>    movzwl %ax,%eax
>    shrq $2,%rdx
>    shrq $2,%rax
>    orl %edx,%eax
>    ret
> 
> PPS. As we can see, the shift values are the same for both flags, so there may be room to optimize
> (we could use a single AND mask and a single shift).
>
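
The optimization suggested in the PPS could look roughly like the sketch below. This is an illustration only, not the driver's code: the names are hypothetical, and it relies on both source flags needing the same shift distance (2), as the listing shows. When that holds, the two masks can be OR-ed into one constant and the shift applied once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values taken from the masks in the listing. */
#define SRC_L3_OK (1u << 9)   /* 512  */
#define SRC_L4_OK (1u << 10)  /* 1024 */
#define FLAG_SHIFT 2          /* same distance for both flags */

/* Two masks and two shifts, as in the listing. */
static uint32_t flags_two_shifts(uint16_t f)
{
	return ((f & SRC_L3_OK) >> FLAG_SHIFT) |
	       ((f & SRC_L4_OK) >> FLAG_SHIFT);
}

/* One combined mask and one shift; valid only because the
 * shift distance is identical for both flags. */
static uint32_t flags_one_shift(uint16_t f)
{
	return (f & (SRC_L3_OK | SRC_L4_OK)) >> FLAG_SHIFT;
}
```
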
