[dpdk-dev] [PATCH v13 0/5] use WFE for aarch64

David Marchand david.marchand at redhat.com
Fri Jan 17 12:15:56 CET 2020


On Thu, Nov 7, 2019 at 10:35 PM David Marchand
<david.marchand at redhat.com> wrote:
>
> DPDK has multiple use cases where the core repeatedly polls a location in
> memory. This polling results in many cache and memory transactions.
>
> The Arm architecture provides the WFE (Wait For Event) instruction, which
> allows the CPU core to enter a low-power state until it is woken up by an
> update to the memory location being polled, thus reducing the cache and
> memory transactions.
>
> x86 has the PAUSE hint instruction to reduce such overhead.
>
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'.
>
> For non-Arm platforms, these APIs are just wrappers around a do-while loop
> with rte_pause, so there is no performance difference.
>
> For Arm platforms, use of WFE can be configured using the
> CONFIG_RTE_ARM_USE_WFE option. It is disabled by default.
>
> Currently, use of WFE is supported only on aarch64 platforms. armv7
> platforms do support the WFE instruction, but they require explicit wake-up
> events (SEV) and perform worse.
>
> Testing shows that performance varies across platforms, with some showing
> degradation.
>
> Whether to enable CONFIG_RTE_ARM_USE_WFE should be decided based on the
> measured performance on the target platforms.
>
> V13:
> - updated the release notes
> - reworked the arm implementation to avoid exporting inlines
> - added an assert in the generic implementation
>
> V12:
> - remove the 'rte_' prefix from the arm specific functions (David Marchand)
> - use the __atomic_load_ex_xx functions in arm specific implementations of
>   APIs (David Marchand)
> - remove the experimental warnings (David Marchand)
> - tweak the working scope of the macros (David Marchand)
>
> V11:
> - add rte_ prefix to the __atomic_load_ex_x functions (Ananyev Konstantin)
> - define the above rte_atomic_load_ex_x functions even if not
>   RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED for future non-wfe usages (Ananyev
>   Konstantin)
> - use the above functions for arm specific rte_wait_until_equal_x functions
>   (Ananyev Konstantin)
> - simplify the generic implementation by folding the "if" into the "while"
>   condition (Ananyev Konstantin)
>
> V10:
> - move arm specific stuff to arch/arm/rte_pause_64.h (Ananyev Konstantin)
>
> V9:
> - fix a broken weblink (David Marchand)
> - define rte_wfe() and rte_sev() (Ananyev Konstantin)
> - explicitly define three function APIs instead of macros (Ananyev Konstantin)
> - incorporate common rte_wfe and rte_sev into the generic rte_spinlock (David
>   Marchand)
> - define arch neutral RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED (Ananyev Konstantin)
> - define rte_load_ex_16/32/64 functions to use load-exclusive instruction for
>   aarch64, which is required for wake up of WFE
> - drop the rte_spinlock patch from this series, as it calls this
>   experimental API and is widely included by many components, each of
>   which would then require ALLOW_EXPERIMENTAL_API in its Makefile and
>   meson.build; leave it until the experimental tag is removed.
>
> V8:
> - simplify dmb definition to use io barriers (David Marchand)
> - define wfe() and sev() macros and use them inside normal C code (Ananyev
>   Konstantin)
> - pass memorder as a parameter instead of encoding it in the function name,
>   giving fewer functions, similar to C11 atomic intrinsics (Ananyev Konstantin)
> - remove mandating RTE_FORCE_INTRINSICS in arm spinlock implementation (David
>   Marchand)
> - undef __WAIT_UNTIL_EQUAL after use (David Marchand)
> - add experimental tag and warning (David Marchand)
> - add the limitation of using WFE instruction in the commit log (David
>   Marchand)
> - tweak the use of RTE_FORCE_INTRINSICS (still mandatory for aarch64) and
>   RTE_ARM_USE_WFE for spinlock (David Marchand)
> - drop the rte_ring patch from this series, as rte_ring.h calls this API
>   and is widely included by many components, each of which would then
>   require ALLOW_EXPERIMENTAL_API in its Makefile and meson.build; leave it
>   until the experimental tag is removed.
>
> V7:
> - fix the checkpatch LONG_LINE_COMMENT issue
>
> V6:
> - squash the RTE_ARM_USE_WFE configuration entry patch into the new API patch
> - move the new configuration to the end of EAL
> - add doxygen comments to reflect the relaxed and acquire semantics
> - correct the meson configuration
>
> V5:
> - add doxygen comments for the new APIs
> - spinlock: exit early without WFE if the lock is not taken by others
> - add two patches on top for opdl and thunderx
>
> V4:
> - rename the config to CONFIG_RTE_ARM_USE_WFE to indicate it applies to arm only
> - introduce a macro for the assembly skeleton to reduce code duplication
> - add one patch for nxp fslmc to address a compiling error
>
> V3:
> - Convert RFCs to patches
>
> V2:
> - Use inline functions instead of macros
> - Add load and compare at the beginning of the APIs
> - Fix some style errors in inline asm
>
> V1:
> - Add the new APIs and use them for ring and locks
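
The wait-until-equal pattern described in the cover letter can be sketched in C as shown below. This is a hedged illustration, not the merged DPDK code. The names `wait_until_equal_32`, `load_exclusive_32`, `cpu_relax` and `demo_handoff` are hypothetical stand-ins for `rte_wait_until_equal_32()` and `rte_pause()`. The `SKETCH_USE_WFE` macro is a hypothetical stand-in for the CONFIG_RTE_ARM_USE_WFE build option. The sketch assumes the GCC/Clang `__atomic` builtins.

The generic path spins on an atomic load with a pause hint between reads. The aarch64 path uses LDXR/LDAXR to arm the exclusive monitor, so that WFE wakes when another core writes the polled location. Both paths accept only relaxed or acquire ordering, passed as a parameter in the style of the C11 atomic intrinsics.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_pause(): a spin-wait hint to the CPU. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();                 /* x86 PAUSE */
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory"); /* arm64 YIELD hint */
#endif
}

#if defined(__aarch64__) && defined(SKETCH_USE_WFE) /* hypothetical macro */
/* Load-exclusive arms the exclusive monitor; a store to *addr by another
 * core clears it and generates the event that wakes a pending WFE. */
static inline uint32_t load_exclusive_32(volatile uint32_t *addr, int memorder)
{
	uint32_t v;
	if (memorder == __ATOMIC_RELAXED)
		__asm__ volatile("ldxr %w[v], [%x[a]]"
				 : [v] "=&r"(v) : [a] "r"(addr) : "memory");
	else
		__asm__ volatile("ldaxr %w[v], [%x[a]]"
				 : [v] "=&r"(v) : [a] "r"(addr) : "memory");
	return v;
}

static inline void
wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
	if (load_exclusive_32(addr, memorder) == expected)
		return;
	/* SEVL sets the local event register so the first WFE returns at once. */
	__asm__ volatile("sevl" ::: "memory");
	do {
		/* Low-power wait until an event, e.g. the monitor being cleared. */
		__asm__ volatile("wfe" ::: "memory");
	} while (load_exclusive_32(addr, memorder) != expected);
}
#else
/* Generic path: poll with an atomic load and a pause hint between reads. */
static inline void
wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
	while (__atomic_load_n(addr, memorder) != expected)
		cpu_relax();
}
#endif

/* Usage sketch (hypothetical): a thread publishes a flag, the caller waits. */
static uint32_t ready;

static void *producer(void *arg)
{
	(void)arg;
	__atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
	return NULL;
}

static int demo_handoff(void)
{
	pthread_t t;
	if (pthread_create(&t, NULL, producer, NULL) != 0)
		return -1;
	wait_until_equal_32(&ready, 1, __ATOMIC_ACQUIRE);
	pthread_join(t, NULL);
	return (int)ready;
}
```

Calling `demo_handoff()` returns 1 once the acquire load in the waiting thread observes the producer's release store. On the WFE path, the waiting core sleeps between checks instead of re-reading the cache line in a tight loop. That is why the cover letter leaves the option off by default and asks for per-platform measurement.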

Series applied.
Thanks.


-- 
David Marchand

