[dpdk-dev] [PATCH] event/octeontx2: remove WFE from dualslot dequeue
Gavin Hu
Gavin.Hu at arm.com
Sat Feb 15 06:56:52 CET 2020
Hi Pavan,
> -----Original Message-----
> From: pbhagavatula at marvell.com <pbhagavatula at marvell.com>
> Sent: Friday, February 14, 2020 2:45 PM
> To: jerinj at marvell.com; Pavan Nikhilesh <pbhagavatula at marvell.com>
> Cc: Gavin Hu <Gavin.Hu at arm.com>; dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] event/octeontx2: remove WFE from dualslot
> dequeue
>
> From: Pavan Nikhilesh <pbhagavatula at marvell.com>
>
> Each workslot is always bound to a specific lcore, so there is no multi-core
> contention to cause cache thrashing; as a result, it is safe to remove the
> WFE. Also, in dual workslot dequeue, work will most likely be available on
> the paired workslot, making WFE impractical.
Does the SSO still signal EVENTI to wake the core from WFE, with the core
now simply ignoring it? Can that signalling be disabled now that the WFE is
removed?
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula at marvell.com>
> ---
>
> Also, this in turn reduces branch misses.
>
> Before:
> 0
> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,
> min_latency=0/
> 0 dummy:u
> 0 llc-miss
> 0 tlb-miss
> 853 branch-miss
> 0 remote-access
> 0 l1d-miss
>
> After:
> 0
> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,
> min_latency=0/
> 0 dummy:u
> 0 llc-miss
> 0 tlb-miss
> 250 branch-miss
> 0 remote-access
> 0 l1d-miss
>
> WFE Data:
>
> 0x4C40 - WFI_WFE_WAIT_CYCLES - Number of cycles waiting at a WFI or
> WFE instruction.
>
> - WFE Cycles before the patch for Dual workslot
> #perf stat -C 20 -e r4C40 sleep 1
> Performance counter stats for 'CPU(s) 20':
>
> 264 r4C40
> 1.002494168 seconds time elapsed
>
> - WFE Cycles for single workslot
> #perf stat -C 20 -e r4C40 sleep 1
> Performance counter stats for 'CPU(s) 20':
>
> 908,778,351 r4C40
> 1.002598253 seconds time elapsed
>
> drivers/event/octeontx2/otx2_worker_dual.h | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/event/octeontx2/otx2_worker_dual.h
> b/drivers/event/octeontx2/otx2_worker_dual.h
> index 5134e3d52..c88420eb4 100644
> --- a/drivers/event/octeontx2/otx2_worker_dual.h
> +++ b/drivers/event/octeontx2/otx2_worker_dual.h
> @@ -29,11 +29,7 @@ otx2_ssogws_dual_get_work(struct
> otx2_ssogws_state *ws,
> rte_prefetch_non_temporal(lookup_mem);
> #ifdef RTE_ARCH_ARM64
> asm volatile(
> - " ldr %[tag], [%[tag_loc]] \n"
> - " ldr %[wqp], [%[wqp_loc]] \n"
> - " tbz %[tag], 63, done%= \n"
> - " sevl \n"
> - "rty%=: wfe \n"
> + "rty%=: \n"
> " ldr %[tag], [%[tag_loc]] \n"
> " ldr %[wqp], [%[wqp_loc]] \n"
> " tbnz %[tag], 63, rty%= \n"
> --
> 2.17.1
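
For readers less familiar with the assembly, the loop left after this patch
is roughly a plain busy-poll: load the tag and the work-queue pointer, and
retry while bit 63 of the tag is set. A minimal C sketch of that logic follows.
The struct, macro and function names are hypothetical stand-ins and not the
driver's own; the real code reads device registers through inline asm.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two locations the asm loads from
 * (tag_loc and wqp_loc); not the driver's actual types. */
struct fake_workslot {
	_Atomic uint64_t tag;
	_Atomic uint64_t wqp;
};

/* Bit 63 of the tag is the bit tested by "tbnz %[tag], 63, rty". */
#define FAKE_TAG_PENDING (UINT64_C(1) << 63)

/* Busy-poll until bit 63 of the tag reads as clear, with no WFE in
 * between. Mirrors the three instructions kept by the patch. */
static inline void
fake_poll_get_work(struct fake_workslot *ws, uint64_t *tag, uint64_t *wqp)
{
	do {
		*tag = atomic_load_explicit(&ws->tag, memory_order_relaxed);
		*wqp = atomic_load_explicit(&ws->wqp, memory_order_relaxed);
	} while (*tag & FAKE_TAG_PENDING);
}
```

The removed variant differed only in that it parked the core in WFE between
reads once the first read found bit 63 set.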