[dpdk-dev] [PATCH v2 0/3] enable AVX512 for iavf

Bruce Richardson bruce.richardson at intel.com
Thu Sep 17 11:13:03 CEST 2020


On Thu, Sep 17, 2020 at 09:37:29AM +0200, Morten Brørup wrote:
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > Sent: Thursday, September 17, 2020 3:40 AM
> > 
> > AVX512 instructions are supported by more and more platforms. These
> > instructions can be used in the data path to enhance the per-core
> > performance of packet processing.
> > Compared with the existing implementation, this patch set introduces
> > some AVX512 instructions into the iavf data path, and we get a better
> > per-core throughput.
> > 
> > v2:
> > Update meson.build.
> > Replace the deprecated 'buf_physaddr' with 'buf_iova'.
> > 
> > Wenzhuo Lu (3):
> >   net/iavf: enable AVX512 for legacy RX
> >   net/iavf: enable AVX512 for flexible RX
> >   net/iavf: enable AVX512 for TX
> > 
> >  doc/guides/rel_notes/release_20_11.rst  |    3 +
> >  drivers/net/iavf/iavf_ethdev.c          |    3 +-
> >  drivers/net/iavf/iavf_rxtx.c            |   69 +-
> >  drivers/net/iavf/iavf_rxtx.h            |   18 +
> >  drivers/net/iavf/iavf_rxtx_vec_avx512.c | 1720 +++++++++++++++++++++++++++++++
> >  drivers/net/iavf/meson.build            |   17 +
> >  6 files changed, 1818 insertions(+), 12 deletions(-)
> >  create mode 100644 drivers/net/iavf/iavf_rxtx_vec_avx512.c
> > 
> > --
> > 1.9.3
> > 
> 
> I am not sure I understand the full context here, so please bear with me if I'm completely off...
> 
> With this patch set, it looks like the driver manipulates the mempool cache directly, bypassing the libraries encapsulating it.
> 
> Isn't that going deeper into a library than expected? What if the implementation of the mempool library changes radically?
> 
> And if there are performance gains to be achieved by using vector instructions for manipulating the mempool, perhaps your vector optimizations should go into the mempool library instead?
> 

Looking specifically at the descriptor re-arm code, the benefit of working
off the mempool cache directly comes from saving loads by merging the code
blocks, rather than from the vectorization itself, though the vectorization
doesn't hurt. The original code, with its separate mempool function call,
worked roughly as follows:

1. mempool code loads mbuf pointers from cache
2. mempool code writes mbuf pointers to the SW ring for the NIC
3. driver code loads the mempool pointers from the SW ring
4. driver code then does the rest of the descriptor re-arm.
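
In scalar C, that split flow looks roughly like the sketch below. The names
(rearm_descriptors_old, sw_ring, rxdp) are hypothetical and the HW
descriptor is reduced to a single 64-bit address word; the real driver
works on fixed-size bursts and uses vector intrinsics:

/* Simplified sketch of the original, split re-arm flow (hypothetical
 * names; not the actual iavf code). */
#include <rte_mbuf.h>
#include <rte_mempool.h>

static void
rearm_descriptors_old(struct rte_mempool *mp, struct rte_mbuf **sw_ring,
                      volatile uint64_t *rxdp, unsigned int n)
{
        unsigned int i;

        /* Steps 1+2: the mempool code loads mbuf pointers from its cache
         * and stores them into the driver's SW ring. */
        if (rte_mempool_get_bulk(mp, (void **)sw_ring, n) < 0)
                return;

        for (i = 0; i < n; i++) {
                /* Step 3: the driver re-loads the pointer it just stored,
                 * a load that depends on the preceding store. */
                struct rte_mbuf *mb = sw_ring[i];

                /* Step 4: write the buffer address into the HW descriptor. */
                rxdp[i] = mb->buf_iova + RTE_PKTMBUF_HEADROOM;
        }
}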

The benefit comes from eliminating step 3, the loads in the driver, which
are dependent upon the previous stores. By having the driver itself read
from the mempool cache (the code still uses mempool functions for every
other part, since everything beyond the cache depends on the
ring/stack/bucket implementation), we can issue the stores and, while they
are completing, reuse the already-loaded data to do the descriptor re-arm.
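
For comparison, a similarly simplified sketch of the merged approach, with
the same hypothetical names and plain C standing in for the AVX512
intrinsics the patch actually uses:

/* Merged re-arm: read mbuf pointers straight from the mempool cache.
 * Simplified, hypothetical sketch; the real code is vectorized and falls
 * back to the regular mempool get path when the cache cannot cover the
 * burst. */
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

static void
rearm_descriptors_merged(struct rte_mempool *mp, struct rte_mbuf **sw_ring,
                         volatile uint64_t *rxdp, unsigned int n)
{
        struct rte_mempool_cache *cache =
                rte_mempool_default_cache(mp, rte_lcore_id());
        unsigned int i;

        if (cache == NULL || cache->len < n)
                return; /* real code takes the regular mempool path here */

        for (i = 0; i < n; i++) {
                /* One load, from the top of the per-lcore cache (a LIFO)... */
                struct rte_mbuf *mb =
                        (struct rte_mbuf *)cache->objs[--cache->len];

                /* ...then two independent stores: the SW ring entry and the
                 * HW descriptor.  The descriptor write no longer has to wait
                 * for a dependent re-load of the SW ring. */
                sw_ring[i] = mb;
                rxdp[i] = mb->buf_iova + RTE_PKTMBUF_HEADROOM;
        }
}

The direct reads of cache->objs and cache->len are the only point where
this sketch bypasses the mempool get API; everything beyond the cache
still goes through the normal mempool handler, as noted above.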

Hope this clarifies things.

/Bruce


