[dpdk-dev] [PATCH] net/mlx5: add Rx checksum offload flag return bad
Jiawei Zhu
17826875952 at 163.com
Sun Mar 28 15:39:14 CEST 2021
Hi, Slava
Thanks for your detailed explanation! You are right, I didn't look
carefully!
With best regards,
Jiawei
On 2021/3/25 7:55 PM, Slava Ovsiienko wrote:
> Hi, Jiawei
>
>> -----Original Message-----
>> From: Jiawei Zhu <17826875952 at 163.com>
>> Sent: Wednesday, March 24, 2021 18:22
>> To: Slava Ovsiienko <viacheslavo at nvidia.com>; dev at dpdk.org
>> Cc: zhujiawei12 at huawei.com; Matan Azrad <matan at nvidia.com>; Shahaf
>> Shuler <shahafs at nvidia.com>
>> Subject: Re: [PATCH] net/mlx5: add Rx checksum offload flag return bad
>>
>> Hi, Slava
>>
>> Thanks for your explanation. The multiplications and divisions are in the
>> TRANSPOSE macro, not in rte_be_to_cpu_16.
>
> [SO]
> Yes, TRANSPOSE is the macro with mul and div operators. But the compiler
> translates these into simple shifts (because the operands are powers of 2).
> The only place where TRANSPOSE is used is the rxq_cq_to_ol_flags() routine.
> I've compiled this one and provided the assembly listing - please see it
> in my previous reply. It illustrates what TRANSPOSE was compiled to and
> presents the x86 code - we see only shifts:
>
> 43 0047 48C1EA02 shrq $2,%rdx
> 44 004b 48C1E802 shrq $2,%rax
>
> No mul/div at all, exactly as we expected.
>
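To make the point above concrete, here is a minimal sketch of a mask-and-scale macro of the kind being discussed. The macro body is written from memory of the mlx5 sources and should be treated as an assumption. The constant names are hypothetical stand-ins; their values are inferred from the listing (masks 512 and 1024, shift by 2). Because the divisor is a compile-time power of two and the operand is unsigned, a compiler is free to emit a shift instead of a division.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Mask `val` with the single bit `from`, then move that bit to position `to`. */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Hypothetical names; values inferred from the listing (and $512 / $1024, shr $2). */
#define CQE_L3_OK   UINT32_C(512)
#define CQE_L4_OK   UINT32_C(1024)
#define PKT_L3_GOOD UINT32_C(128)
#define PKT_L4_GOOD UINT32_C(256)

/* The divisor is 4 = 2^2, so the division is equivalent to a right shift by 2. */
static_assert(CQE_L3_OK / PKT_L3_GOOD == 4, "L3 divisor must be 4");
static_assert(CQE_L4_OK / PKT_L4_GOOD == 4, "L4 divisor must be 4");

/* Sketch of the flag conversion; the endianness conversion is left out here. */
uint32_t cq_to_ol_flags(uint16_t flags)
{
	return TRANSPOSE((uint32_t)flags, CQE_L3_OK, PKT_L3_GOOD) |
	       TRANSPOSE((uint32_t)flags, CQE_L4_OK, PKT_L4_GOOD);
}
```

Compiling a function like this with optimization enabled and inspecting the assembly is the way to confirm what a particular compiler does with the division.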
>> So I think using if-else directly could improve the performance.
>
> [SO]
> The if/else construction is usually compiled to conditional jumps. Branch
> prediction in the CPU might not work well over the various ingress traffic patterns
> (we are analyzing the flags of the received packets), and we'll get a performance penalty.
> Hence, it seems the best practice is not to have conditional jumps at all.
> The existing code follows this approach, as we can see from the assembly listing - there
> are no conditional jumps.
>
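The two styles compared above can be sketched side by side. All names here are hypothetical and the bit values are inferred from the listing. Note that an optimizing compiler may turn the if-based version into branch-free code as well, so this only illustrates the source-level difference and says nothing about measured performance.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical names; bit values inferred from the listing. */
enum {
	CQE_L3_OK   = 512,
	CQE_L4_OK   = 1024,
	PKT_L3_GOOD = 128,
	PKT_L4_GOOD = 256,
};

/* Both source bits sit exactly two positions above their target bits. */
static_assert((CQE_L3_OK >> 2) == PKT_L3_GOOD, "L3 bit moves down by 2");
static_assert((CQE_L4_OK >> 2) == PKT_L4_GOOD, "L4 bit moves down by 2");

/* if-based variant: may compile to conditional jumps. */
uint32_t ol_flags_branchy(uint16_t flags)
{
	uint32_t ol = 0;

	if (flags & CQE_L3_OK)
		ol |= PKT_L3_GOOD;
	if (flags & CQE_L4_OK)
		ol |= PKT_L4_GOOD;
	return ol;
}

/* Mask-and-shift variant: pure arithmetic, no data-dependent control flow. */
uint32_t ol_flags_branchless(uint16_t flags)
{
	return ((uint32_t)(flags & CQE_L3_OK) >> 2) |
	       ((uint32_t)(flags & CQE_L4_OK) >> 2);
}
```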
> With best regards,
> Slava
>
> PS. I just removed the cluttering details from the listing - this is merely the resulting code
> of rxq_cq_to_ol_flags(). I removed static and made the function non-inline to see the
> isolated piece of code:
>
> rxq_cq_to_ol_flags:
> movzwl 28(%rdi),%edx // endianness conversion optimized out entirely
> movl %edx,%eax
> andw $512,%dx
> andw $1024,%ax
> movzwl %dx,%edx
> movzwl %ax,%eax
> shrq $2,%rdx
> shrq $2,%rax
> orl %edx,%eax
> ret
>
> PPS. As we can see, the shift values are the same for both flags, so there might be some room to optimize
> (we could have only one shift and only one masking with AND)
>
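The optimization suggested in the PPS can be sketched as follows. This is an illustration and not a proposed patch: the names are hypothetical and the values are inferred from the listing. It relies on both flags needing the same shift distance, which the static assertion checks at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical names; values inferred from the listing. */
#define CQE_L3_OK   512u
#define CQE_L4_OK   1024u
#define PKT_L3_GOOD 128u
#define PKT_L4_GOOD 256u
#define CSUM_SHIFT  2

/* The combined form is only valid when both bits move by the same distance. */
static_assert((CQE_L3_OK >> CSUM_SHIFT) == PKT_L3_GOOD &&
	      (CQE_L4_OK >> CSUM_SHIFT) == PKT_L4_GOOD,
	      "both flags must share one shift distance");

/* Shape of the compiled listing: two masks, two shifts, one OR. */
uint32_t ol_flags_two_shifts(uint16_t flags)
{
	return ((flags & CQE_L3_OK) >> CSUM_SHIFT) |
	       ((flags & CQE_L4_OK) >> CSUM_SHIFT);
}

/* Combined form: one mask with AND, one shift. */
uint32_t ol_flags_one_shift(uint16_t flags)
{
	return (flags & (CQE_L3_OK | CQE_L4_OK)) >> CSUM_SHIFT;
}
```

If the target flag values ever stopped lining up with the source bits, the assertion would fail to compile and the two-shift form would be needed again.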