[dpdk-dev] [PATCH v1 3/3] net/i40e: auto-vectorization to speed up Tx free

Jerin Jacob jerinjacobk at gmail.com
Fri Mar 6 08:44:48 CET 2020


On Fri, Mar 6, 2020 at 10:35 AM Gavin Hu <gavin.hu at arm.com> wrote:
>
> Tx mbuf free is a hotspot for i40e on aarch64. As there are no
> inter-loop dependencies, it is safe to enable auto-vectorization
> to speed it up.
>
> This patch showed 2~3% performance lift on ThunderX2 and no degradation
> on Arm N1SDP. The test case is single core RFC2544 zero-loss test.
>
> Signed-off-by: Gavin Hu <gavin.hu at arm.com>
> Reviewed-by: Steve Capper <steve.capper at arm.com>
> ---
>  drivers/net/i40e/i40e_rxtx_vec_common.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> index 0e6ffa007..fc0fa45d4 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> @@ -98,6 +98,11 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
>         if (likely(m != NULL)) {
>                 free[0] = m;
>                 nb_free = 1;
> +#if defined(__clang__)
> +#pragma clang loop vectorize(assume_safety)
> +#elif defined(__GNUC__)
> +#pragma GCC ivdep
> +#endif

IMO, it is better to abstract these compiler features (the pragmas above
and __restrict__) as macros in rte_common.h or similar. That would make it
easier to support other compilers (ICC or Windows) and keep such changes
in one place.
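
As a rough sketch of what such an abstraction could look like (the names
EXAMPLE_LOOP_IVDEP and EXAMPLE_RESTRICT are hypothetical and are not existing
DPDK macros, and example_add() is only an illustrative loop, not the i40e
code), the per-compiler pragmas could be wrapped with the C99 _Pragma
operator so that call sites stay free of #if blocks:

```c
#include <stddef.h>

/*
 * Hypothetical macro names, for illustration only; they are not part of
 * rte_common.h. Each one expands to a compiler-specific hint, or to
 * nothing when the compiler is not recognized.
 */
#if defined(__clang__)
/* Clang: assume the following loop has no loop-carried dependencies. */
#define EXAMPLE_LOOP_IVDEP _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
/* GCC: ignore assumed vector dependencies in the following loop. */
#define EXAMPLE_LOOP_IVDEP _Pragma("GCC ivdep")
#else
/* Unknown compiler: no hint, the loop is compiled as usual. */
#define EXAMPLE_LOOP_IVDEP
#endif

#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_RESTRICT __restrict__
#elif defined(_MSC_VER)
#define EXAMPLE_RESTRICT __restrict
#else
#define EXAMPLE_RESTRICT
#endif

/*
 * Illustrative loop only: adds src[i] to dst[i] for n elements.
 * The caller must guarantee that dst and src do not overlap, which is
 * what the restrict qualifier and the ivdep hint both rely on.
 */
void
example_add(int *EXAMPLE_RESTRICT dst, const int *EXAMPLE_RESTRICT src,
	    size_t n)
{
	size_t i;

	EXAMPLE_LOOP_IVDEP
	for (i = 0; i < n; i++)
		dst[i] += src[i];
}
```

With something along these lines, the hunk above would reduce to a single
macro line before the for loop, and support for another compiler would only
need a new branch in the common header.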



>                 for (i = 1; i < n; i++) {
>                         m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
>                         if (likely(m != NULL)) {
> --
> 2.17.1
>

