[dpdk-dev] [PATCH v2 00/17] ACL: New AVX2 classify method and several other enhancements.

Neil Horman nhorman at tuxdriver.com
Wed Jan 14 19:39:28 CET 2015


On Mon, Jan 12, 2015 at 07:16:04PM +0000, Konstantin Ananyev wrote:
> v2 changes:
> - When built with compilers that don't support AVX2 instructions,
> make rte_acl_classify_avx2() do nothing and return an error.
> - Remove unneeded 'ifdef __AVX2__' in acl_run_avx2.*.
> - Reorder the patches in the set to keep RTE_LIBRTE_ACL_STANDALONE=y
> always buildable.
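
For illustration, the first v2 item (a classify entry point that does nothing and returns an error when the compiler cannot emit AVX2) could be sketched roughly as below. This is not code from the patch set: the macro SKETCH_CC_HAS_AVX2, the function name, its signature and the choice of -ENOTSUP are all assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch. SKETCH_CC_HAS_AVX2 stands in for whatever flag the
 * build system would define when the compiler supports AVX2; it is not a
 * real DPDK macro.
 */
#ifdef SKETCH_CC_HAS_AVX2

int sketch_classify_avx2(const uint8_t **data, uint32_t *results, uint32_t num)
{
    uint32_t i;

    (void)data;
    /* Placeholder for the real vectorised trie traversal. */
    for (i = 0; i < num; i++)
        results[i] = 0;
    return 0;
}

#else

/* Compiler cannot build AVX2 code: touch nothing and report an error. */
int sketch_classify_avx2(const uint8_t **data, uint32_t *results, uint32_t num)
{
    (void)data;
    (void)results;
    (void)num;
    return -ENOTSUP; /* assumed error code */
}

#endif
```

The point of the stub is that callers still link against the same symbol on every toolchain and only need to check the return value.
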
> 
> This patch series contains several fixes and enhancements for the ACL library.
> See the complete list below.
> The two main externally visible changes are:
> - Introduce a new classify method: RTE_ACL_CLASSIFY_AVX2.
> It uses AVX2 instructions and 256-bit-wide data types
> to perform internal trie traversal,
> which helps to increase classify() throughput.
> This method is selected as the default on CPUs that support AVX2.
> - Introduce a new field in the build config structure: max_size.
> It specifies the maximum size that the internal RT structure for a given
> context can reach.
> Its purpose is to let the user decide on the space/performance trade-off
> (faster classify() vs. less space for RT internal structures)
> for each given set of rules.
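
The two externally visible changes above can be pictured with a small standalone sketch. Everything in it is hypothetical: the sketch_* names, the SSE4.1 fallback tier, the "0 means no limit" convention and the idea of trying candidate layouts fastest-first are assumptions chosen for illustration, not the library's actual API or build algorithm.

```c
#include <stddef.h>

/* Hypothetical classify methods, ordered from slowest to fastest. */
enum sketch_alg {
    SKETCH_ALG_SCALAR,
    SKETCH_ALG_SSE,
    SKETCH_ALG_AVX2
};

/* Hypothetical CPU feature flags, filled in by the caller. */
struct sketch_cpu {
    int has_sse41;
    int has_avx2;
};

/* Pick the widest vector method the CPU supports as the default. */
enum sketch_alg sketch_default_alg(const struct sketch_cpu *cpu)
{
    if (cpu->has_avx2)
        return SKETCH_ALG_AVX2;
    if (cpu->has_sse41)
        return SKETCH_ALG_SSE;
    return SKETCH_ALG_SCALAR;
}

/* Hypothetical build config: only the size limit matters here. */
struct sketch_config {
    size_t max_size; /* assumed convention: 0 means "no limit" */
};

/*
 * sizes[] holds the memory footprint of candidate run-time layouts,
 * ordered fastest (usually largest) first. Return the index of the
 * fastest layout that fits within max_size, or -1 if none fits.
 */
int sketch_pick_layout(const struct sketch_config *cfg,
                       const size_t *sizes, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (cfg->max_size == 0 || sizes[i] <= cfg->max_size)
            return i;
    }
    return -1;
}
```

With candidate sizes of 4096, 1024 and 256 bytes and max_size set to 2000, the second layout is chosen: the fastest one is skipped because it would exceed the limit, which is the space/performance trade-off the cover letter describes.
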
> 
> Konstantin Ananyev (17):
>   fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y
>   app/test: few small fixes for test_acl.c
>   librte_acl: make data_indexes long enough to survive idle transitions.
>   librte_acl: remove build phase heuristic with negative performance
>     effect.
>   librte_acl: fix a bug at build phase that can cause matches being
>     overwritten.
>   librte_acl: introduce DFA nodes compression (group64) for identical
>     entries.
>   librte_acl: build/gen phase - simplify the way match nodes are
>     allocated.
>   librte_acl: make scalar RT code to be more similar to vector one.
>   librte_acl: a bit of RT code deduplication.
>   EAL: introduce rte_ymm and relatives in rte_common_vect.h.
>   librte_acl: add AVX2 as new rte_acl_classify() method
>   test-acl: add ability to manually select RT method.
>   librte_acl: Remove search_sse_2 and relatives.
>   librte_acl: move lo/hi dwords shuffle out from calc_addr
>   librte_acl: make calc_addr a define to deduplicate the code.
>   librte_acl: introduce max_size into rte_acl_config.
>   librte_acl: remove unused macros.
> 
>  app/test-acl/main.c                             | 126 +++--
>  app/test/test_acl.c                             |   8 +-
>  examples/l3fwd-acl/main.c                       |   3 +-
>  examples/l3fwd/main.c                           |   2 +-
>  lib/librte_acl/Makefile                         |  18 +
>  lib/librte_acl/acl.h                            |  58 ++-
>  lib/librte_acl/acl_bld.c                        | 392 +++++++---------
>  lib/librte_acl/acl_gen.c                        | 268 +++++++----
>  lib/librte_acl/acl_run.h                        |   7 +-
>  lib/librte_acl/acl_run_avx2.c                   |  54 +++
>  lib/librte_acl/acl_run_avx2.h                   | 284 ++++++++++++
>  lib/librte_acl/acl_run_scalar.c                 |  65 ++-
>  lib/librte_acl/acl_run_sse.c                    | 585 +-----------------------
>  lib/librte_acl/acl_run_sse.h                    | 357 +++++++++++++++
>  lib/librte_acl/acl_vect.h                       | 132 +++---
>  lib/librte_acl/rte_acl.c                        |  47 +-
>  lib/librte_acl/rte_acl.h                        |   4 +
>  lib/librte_acl/rte_acl_osdep_alone.h            |  47 +-
>  lib/librte_eal/common/include/rte_common_vect.h |  39 +-
>  lib/librte_lpm/rte_lpm.h                        |   2 +-
>  20 files changed, 1444 insertions(+), 1054 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_avx2.c
>  create mode 100644 lib/librte_acl/acl_run_avx2.h
>  create mode 100644 lib/librte_acl/acl_run_sse.h
> 
> -- 
> 1.8.5.3
> 
> 
Series
Acked-by: Neil Horman <nhorman at tuxdriver.com>

Nice work
Neil


