[PATCH] examples/l3fwd: optimize packet prefetch
huangdengdui
huangdengdui at huawei.com
Thu Jan 9 12:31:00 CET 2025
On 2025/1/8 21:42, Konstantin Ananyev wrote:
>
>
>>
>> The prefetch window depends on the hardware platform, so the current prefetch
>> policy may not suit all platforms. In most cases, the number of packets
>> received by an Rx burst is small (64 is used in most performance reports),
>> and in L3fwd it cannot exceed 512. Therefore, prefetching all packets before
>> processing them can achieve better performance.
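To make the comparison concrete, here is a minimal, self-contained C sketch of the two policies. The first is a sliding window, which prefetches a fixed number of packets ahead of the one being processed. The second is the patch's approach, which prefetches every packet first. This is not l3fwd code. `struct pkt`, `prefetch_pkt()`, `process_pkt()` and the event log are hypothetical. GCC/Clang `__builtin_prefetch(p, 0, 3)` stands in for `rte_prefetch0()`. The window also advances one packet at a time rather than in FWDSTEP groups. The log records each prefetch ('P') and each processed packet ('X'), so you can inspect the ordering of the two policies.

```c
/* Hypothetical packet: only the address of its data matters here. */
struct pkt {
	unsigned char data[64];
};

/* Event log: kind 'P' = prefetch issued, 'X' = packet processed. */
#define MAX_EVENTS 2048
char ev_kind[MAX_EVENTS];
int ev_idx[MAX_EVENTS];
int n_events;

static void log_event(char kind, int idx)
{
	if (n_events < MAX_EVENTS) {
		ev_kind[n_events] = kind;
		ev_idx[n_events] = idx;
		n_events++;
	}
}

/* Stand-in for rte_prefetch0(): read access, high temporal locality. */
static void prefetch_pkt(struct pkt *p, int idx)
{
	__builtin_prefetch(p->data, 0, 3);
	log_event('P', idx);
}

/* Placeholder for the per-packet LPM lookup and header rewrite. */
static void process_pkt(struct pkt *p, int idx)
{
	(void)p;
	log_event('X', idx);
}

/* Sliding window: keep `window` packets prefetched ahead of processing. */
void burst_prefetch_window(struct pkt *pkts, int nb_rx, int window)
{
	int i;

	for (i = 0; i < window && i < nb_rx; i++)
		prefetch_pkt(&pkts[i], i);
	for (i = 0; i < nb_rx; i++) {
		if (i + window < nb_rx)
			prefetch_pkt(&pkts[i + window], i + window);
		process_pkt(&pkts[i], i);
	}
}

/* Policy proposed by the patch: prefetch the whole burst, then process. */
void burst_prefetch_all(struct pkt *pkts, int nb_rx)
{
	int i;

	for (i = 0; i < nb_rx; i++)
		prefetch_pkt(&pkts[i], i);
	for (i = 0; i < nb_rx; i++)
		process_pkt(&pkts[i], i);
}
```

Note that `burst_prefetch_window(pkts, n, w)` with `w >= n` produces the same sequence as `burst_prefetch_all(pkts, n)`. Prefetch-all is the window policy with the window set to the burst size. One plausible reason the benefit is platform-dependent is that prefetch-all issues up to `nb_rx` requests at once. How much that helps then depends on how many outstanding misses the core can track and how much of the burst stays in cache.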
>
> As you mentioned, 'prefetch' behavior differs a lot from one HW platform to another.
> So it could easily be that the changes you are suggesting will boost performance
> on one platform and degrade it on another.
> In fact, right now l3fwd 'prefetch' usage is a bit of a mess:
> - l3fwd_lpm_neon.h uses FWDSTEP as the prefetch window.
> - l3fwd_fib.c uses FIB_PREFETCH_OFFSET for that purpose.
> - The rest of the code either uses PREFETCH_OFFSET or doesn't use 'prefetch' at all.
>
> Probably what we need here is a unified approach: a prefetch_window_size,
> configurable at run time, that all code paths obey.
Agreed, I'll add a parameter to configure the prefetch window.
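Here is a minimal sketch of how such a parameter could look. A hypothetical `--prefetch-window N` option is parsed into a global `prefetch_window_size` (the name suggested above). The value is validated against the 512-packet burst limit mentioned in the commit message. Neither the option nor the variable exists in l3fwd today; both are assumptions for illustration. With a single knob, setting it to FWDSTEP approximates the current NEON behavior. Setting it to the burst size or larger reproduces this patch's prefetch-all policy.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed limit: the commit message says the l3fwd burst cannot exceed 512. */
#define MAX_PREFETCH_WINDOW 512

/* Hypothetical run-time setting shared by all lookup code paths. */
unsigned int prefetch_window_size = 4;

/*
 * Parse the argument of a hypothetical "--prefetch-window N" option.
 * Returns 0 and stores N on success; returns -1 and leaves *out untouched
 * if the argument is not a decimal integer in [1, MAX_PREFETCH_WINDOW].
 */
int parse_prefetch_window(const char *arg, unsigned int *out)
{
	char *end;
	unsigned long v;

	if (arg == NULL || *arg == '\0')
		return -1;
	errno = 0;
	v = strtoul(arg, &end, 10);
	/* strtoul() wraps "-1" to a huge value, so the upper bound rejects it. */
	if (errno != 0 || *end != '\0' || v == 0 || v > MAX_PREFETCH_WINDOW)
		return -1;
	*out = (unsigned int)v;
	return 0;
}

/*
 * Number of packets to prefetch before processing starts. A window at
 * least as large as the burst degenerates into "prefetch everything".
 */
unsigned int prefetch_prologue_len(unsigned int nb_rx, unsigned int window)
{
	return window < nb_rx ? window : nb_rx;
}
```

Each code path (LPM SSE/NEON, FIB, EM) would then call a shared helper like `prefetch_prologue_len()` instead of hard-coding FWDSTEP, FIB_PREFETCH_OFFSET or PREFETCH_OFFSET.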
>
>> Signed-off-by: Dengdui Huang <huangdengdui at huawei.com>
>> ---
>> examples/l3fwd/l3fwd_lpm_neon.h | 42 ++++-----------------------------
>> 1 file changed, 5 insertions(+), 37 deletions(-)
>>
>> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
>> index 3c1f827424..0b51782b8c 100644
>> --- a/examples/l3fwd/l3fwd_lpm_neon.h
>> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
>> @@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>> const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
>> const int32_t m = nb_rx % FWDSTEP;
>>
>> - if (k) {
>> - for (i = 0; i < FWDSTEP; i++) {
>> - rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
>> - void *));
>> - }
>> - for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
>> - for (i = 0; i < FWDSTEP; i++) {
>> - rte_prefetch0(rte_pktmbuf_mtod(
>> - pkts_burst[j + i + FWDSTEP],
>> - void *));
>> - }
>> + /* The number of packets is small. Prefetch all packets. */
>> + for (i = 0; i < nb_rx; i++)
>> + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
>>
>> + if (k) {
>> + for (j = 0; j != k; j += FWDSTEP) {
>> processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>> processx4_step2(qconf, dip, ipv4_flag, portid,
>> &pkts_burst[j], &dst_port[j]);
>> if (do_step3)
>> processx4_step3(&pkts_burst[j], &dst_port[j]);
>> }
>> -
>> - processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>> - processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j],
>> - &dst_port[j]);
>> - if (do_step3)
>> - processx4_step3(&pkts_burst[j], &dst_port[j]);
>> -
>> - j += FWDSTEP;
>> }
>>
>> if (m) {
>> - /* Prefetch last up to 3 packets one by one */
>> - switch (m) {
>> - case 3:
>> - rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> - void *));
>> - j++;
>> - /* fallthrough */
>> - case 2:
>> - rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> - void *));
>> - j++;
>> - /* fallthrough */
>> - case 1:
>> - rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> - void *));
>> - j++;
>> - }
>> - j -= m;
>> /* Classify last up to 3 packets one by one */
>> switch (m) {
>> case 3:
>> --
>> 2.33.0
>
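The reworked loop also drops some index bookkeeping. The tail no longer has its own prefetch pass that advances `j`. The group loop therefore exits with `j == k`, exactly where the last `m` packets start, so the old `j -= m` correction goes away. The sketch below mirrors that structure outside DPDK. `FWDSTEP` is assumed to be 4, matching the four-packet `processx4_*` helpers. `align_floor()` stands in for `RTE_ALIGN_FLOOR()` (they are equivalent for power-of-two steps). `struct burst_split` and `split_burst()` are hypothetical names. The sketch also assumes `j` starts at 0, which the hunk does not show.

```c
/* Assumed value: the processx4_* helpers handle four packets per step. */
#define FWDSTEP 4

/* Equivalent to RTE_ALIGN_FLOOR(n, step) when step is a power of two. */
static int align_floor(int n, int step)
{
	return n & ~(step - 1);
}

/* Hypothetical summary of how a burst is split up. */
struct burst_split {
	int groups;     /* number of FWDSTEP-sized groups */
	int tail_start; /* index of the first leftover packet */
	int tail_len;   /* leftover packets, 0..FWDSTEP-1 */
};

struct burst_split split_burst(int nb_rx)
{
	struct burst_split s = { 0, 0, 0 };
	const int k = align_floor(nb_rx, FWDSTEP);
	const int m = nb_rx % FWDSTEP;
	int j = 0;

	/* The patch's single prefetch pass over pkts[0..nb_rx-1] runs here. */
	if (k) {
		for (j = 0; j != k; j += FWDSTEP)
			s.groups++; /* processx4_step1/2/3 on pkts[j..j+3] */
	}
	/* j == k here, so the tail starts right after the last group. */
	s.tail_start = j;
	s.tail_len = m;
	return s;
}
```

For example, a 30-packet burst splits into 7 groups followed by 2 leftover packets starting at index 28.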