[dpdk-dev] [PATCH v2 2/2] net/enic: add AVX2 based vectorized Rx handler

Ferruh Yigit ferruh.yigit at intel.com
Tue Oct 2 18:08:21 CEST 2018


On 9/28/2018 8:25 PM, John Daley wrote:
> From: Hyong Youb Kim <hyonkim at cisco.com>
> 
> Add the vectorized version of the no-scatter Rx handler. It aims to
> process 8 descriptors per loop using AVX2 SIMD instructions. This
> handler is in its own file enic_rxtx_vec_avx2.c, and makefile and
> meson.build are modified to compile it when the compiler supports
> AVX2. Under ideal conditions, the vectorized handler reduces
> cycles/packet by more than 30%, when compared against the no-scatter
> Rx handler. Most implementation ideas come from i40e's AVX2 based
> handler, so credit goes to its authors.
> 
> At this point, the new handler is meant for field trials, and is not
> selected by default. So add a new devarg enable-avx2-rx to allow the
> user to request the use of the new handler. When enable-avx2-rx=1, the
> driver will consider using the new handler.
> 
> Also update the guide doc and introduce the vectorized handler.
> 
> Signed-off-by: Hyong Youb Kim <hyonkim at cisco.com>
> Reviewed-by: John Daley <johndale at cisco.com>

<...>

> +Vectorized Rx Handler
> +---------------------
> +
> +ENIC PMD includes a version of the receive handler that is vectorized using
> +AVX2 SIMD instructions. It is meant for bulk, throughput oriented workloads
> +where reducing cycles/packet in PMD is a priority. In order to use the
> +vectorized handler, take the following steps.
> +
> +- Use a recent version of gcc, icc, or clang and build 64-bit DPDK. If
> +  the compiler is known to support AVX2, DPDK build system
> +  automatically compiles the vectorized handler. Otherwise, the
> +  handler is not available.
> +
> +- Set ``devargs`` parameter ``enable-avx2-rx=1`` to explicitly request that
> +  PMD consider the vectorized handler when selecting the receive handler.

Can be good to show a usage sample, as done in disable-overlay=1, like
-w 12:00.0,enable-avx2-rx=1

> +
> +  As the current implementation is intended for field trials, by default, the
> +  vectorized handler is not considerd (``enable-avx2-rx=0``).
> +
> +- Run on a UCS M4 or later server with CPUs that support AVX2.
> +
> +PMD selects the vectorized handler when the handler is compiled into
> +the driver, the user requests its use via ``enable-avx2-rx=1``, CPU
> +supports AVX2, and scatter Rx is not used. To verify that the

Code doesn't check if user requested DEV_RX_OFFLOAD_SCATTER, if so perhaps
shouldn't enable vector path.

> +vectorized handler is selected, enable debug logging
> +(``--log-level=pmd,debug``) and check the following message.
> +
> +.. code-block:: console
> +
> +    enic_use_vector_rx_handler use the non-scatter avx2 Rx handler
> +
>  .. _enic_limitations:
>  
>  Limitations
> diff --git a/drivers/net/enic/Makefile b/drivers/net/enic/Makefile
> index 7c6c29cc0..3ec6f9159 100644
> --- a/drivers/net/enic/Makefile
> +++ b/drivers/net/enic/Makefile
> @@ -39,4 +39,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_intr.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_rq.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_rss.c
>  
> +ifeq ($(findstring RTE_MACHINE_CPUFLAG_AVX2,$(CFLAGS)),RTE_MACHINE_CPUFLAG_AVX2)
> +# The current implementation assumes 64-bit pointers
> +ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
> +	SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_rxtx_vec_avx2.c
> +endif
> +endif

This is not exactly enough.

CFLAGS has MACHINE_CPUFLAG based on the parameters pass the compiler. This flag
is what compiler supports.

For example if DPDK build for "default" architecture AVX2 won't be supported and
your vector path won't be build at all. Please check i40e Makefile.

> +
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
> index 775cd5d55..665f5668a 100644
> --- a/drivers/net/enic/enic.h
> +++ b/drivers/net/enic/enic.h
> @@ -106,6 +106,7 @@ struct enic {
>  	struct vnic_dev_bar bar0;
>  	struct vnic_dev *vdev;
>  
> +	uint64_t mbuf_initializer;

A comment can be good.

<...>

> @@ -535,6 +552,21 @@ int enic_enable(struct enic *enic)
>  	int err;
>  	struct rte_eth_dev *eth_dev = enic->rte_dev;
>  	uint64_t simple_tx_offloads;
> +	uintptr_t p;
> +	struct rte_mbuf mb_def = { .buf_addr = 0 };
> +
> +	/*
> +	 * mbuf_initializer contains const-after-init fields of
> +	 * receive mbufs (i.e. 64 bits of fields from rearm_data).
> +	 * It is currently used by the vectorized handler.
> +	 */
> +	mb_def.nb_segs = 1;
> +	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
> +	mb_def.port = enic->port_id;
> +	rte_mbuf_refcnt_set(&mb_def, 1);
> +	rte_compiler_barrier();
> +	p = (uintptr_t)&mb_def.rearm_data;
> +	enic->mbuf_initializer = *(uint64_t *)p;

What do you think wrapping this block with "enic->enable_avx2_rx" check, mostly
to show the usage intention?

<...>

> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2)) {
> +		PMD_INIT_LOG(DEBUG, " use the non-scatter avx2"
> +			     " Rx handler");

No need to split the log line.


More information about the dev mailing list