[PATCH] examples/l3fwd: optimize packet prefetch

huangdengdui huangdengdui at huawei.com
Thu Jan 9 12:31:00 CET 2025


On 2025/1/8 21:42, Konstantin Ananyev wrote:
> 
> 
>>
>> The prefetch window depends on the hardware platform. The current prefetch
>> policy may not suit all platforms. In most cases, the number of
>> packets received by an Rx burst is small (64 is used in most performance reports).
>> In l3fwd, the burst size cannot exceed 512. Therefore, prefetching all
>> packets before processing can achieve better performance.
> 
> As you mentioned, 'prefetch' behavior differs a lot from one HW platform to another.
> So it could easily be that the changes you are suggesting will cause a performance
> boost on one platform and a degradation on another.
> In fact, right now l3fwd 'prefetch' usage is a bit of a mess:
> - l3fwd_lpm_neon.h uses FWDSTEP as a prefetch window
> - l3fwd_fib.c uses FIB_PREFETCH_OFFSET for that purpose
> - the rest of the code either uses PREFETCH_OFFSET or doesn't use 'prefetch' at all
>  
> Probably what we need here is some unified approach:
> a prefetch_window_size, configurable at run-time, that all code paths will obey.

Agreed, I'll add a parameter to configure the prefetch window.
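
A minimal sketch of how such a window could behave, under stated assumptions:
`process_burst()`, `struct pkt`, `prefetch0()` and the `window` argument are
hypothetical names for illustration, not existing l3fwd code, and
`__builtin_prefetch()` (GCC/Clang) stands in for rte_prefetch0() so the sketch
builds without DPDK. A window of 0 disables prefetching, and a window at least
as large as the burst prefetches every packet up front, as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf; only the data area matters here. */
struct pkt {
	uint8_t data[64];
};

/* Counts issued prefetches so the behavior of the sketch can be checked. */
static uint32_t prefetch_calls;

/* Stand-in for rte_prefetch0(); assumes a GCC/Clang compatible compiler. */
static inline void
prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
	prefetch_calls++;
}

/*
 * Process a burst while keeping up to 'window' packets prefetched ahead.
 * window == 0      : no prefetch at all
 * window >= nb_rx  : whole burst prefetched before processing starts
 * Returns the sum of the first byte of each packet as placeholder "work".
 */
static uint32_t
process_burst(struct pkt **pkts, uint32_t nb_rx, uint32_t window)
{
	uint32_t i, ahead, sum = 0;

	assert(pkts != NULL || nb_rx == 0);

	/* Clamp the window to the burst size. */
	ahead = window < nb_rx ? window : nb_rx;

	/* Warm up: prefetch the first 'ahead' packets. */
	for (i = 0; i < ahead; i++)
		prefetch0(pkts[i]->data);

	for (i = 0; i < nb_rx; i++) {
		/* Keep the window full while packets remain. */
		if (ahead != 0 && i + ahead < nb_rx)
			prefetch0(pkts[i + ahead]->data);

		/* Placeholder for the real lookup and forwarding work. */
		sum += pkts[i]->data[0];
	}

	return sum;
}
```

With this shape, each packet is prefetched at most once whatever the window
size, so each code path would only need to read one shared setting instead of
FWDSTEP, FIB_PREFETCH_OFFSET or PREFETCH_OFFSET. Whether a given window value
helps would still have to be measured per platform.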

> 
>> Signed-off-by: Dengdui Huang <huangdengdui at huawei.com>
>> ---
>>  examples/l3fwd/l3fwd_lpm_neon.h | 42 ++++-----------------------------
>>  1 file changed, 5 insertions(+), 37 deletions(-)
>>
>> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
>> index 3c1f827424..0b51782b8c 100644
>> --- a/examples/l3fwd/l3fwd_lpm_neon.h
>> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
>> @@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>>  	const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
>>  	const int32_t m = nb_rx % FWDSTEP;
>>
>> -	if (k) {
>> -		for (i = 0; i < FWDSTEP; i++) {
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
>> -							void *));
>> -		}
>> -		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
>> -			for (i = 0; i < FWDSTEP; i++) {
>> -				rte_prefetch0(rte_pktmbuf_mtod(
>> -						pkts_burst[j + i + FWDSTEP],
>> -						void *));
>> -			}
>> +	/* The number of packets is small. Prefetch all packets. */
>> +	for (i = 0; i < nb_rx; i++)
>> +		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
>>
>> +	if (k) {
>> +		for (j = 0; j != k; j += FWDSTEP) {
>>  			processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>>  			processx4_step2(qconf, dip, ipv4_flag, portid,
>>  					&pkts_burst[j], &dst_port[j]);
>>  			if (do_step3)
>>  				processx4_step3(&pkts_burst[j], &dst_port[j]);
>>  		}
>> -
>> -		processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>> -		processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j],
>> -				&dst_port[j]);
>> -		if (do_step3)
>> -			processx4_step3(&pkts_burst[j], &dst_port[j]);
>> -
>> -		j += FWDSTEP;
>>  	}
>>
>>  	if (m) {
>> -		/* Prefetch last up to 3 packets one by one */
>> -		switch (m) {
>> -		case 3:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -			/* fallthrough */
>> -		case 2:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -			/* fallthrough */
>> -		case 1:
>> -			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -							void *));
>> -			j++;
>> -		}
>> -		j -= m;
>>  		/* Classify last up to 3 packets one by one */
>>  		switch (m) {
>>  		case 3:
>> --
>> 2.33.0
>
