dpdk Tx falling short
Ivan Malov
ivan.malov at arknetworks.am
Tue Jul 8 16:18:58 CEST 2025
Hi Ed,
On Tue, 8 Jul 2025, Lombardo, Ed wrote:
> Hi Stephen,
> When I replace rte_eth_tx_burst() with an mbuf free bulk I do not see the tx ring fill up, which I think is valuable information. Also, perf analysis of the tx thread shows common_ring_mp_enqueue and rte_atomic32_cmpset, which I did not expect to see since I created all the Tx rings as SP and SC (the worker and ack rings as well, essentially all 16 rings).
>
> Perf report snippet:
> + 57.25% DPDK_TX_1 test [.] common_ring_mp_enqueue
> + 25.51% DPDK_TX_1 test [.] rte_atomic32_cmpset
> + 9.13% DPDK_TX_1 test [.] i40e_xmit_pkts
> + 6.50% DPDK_TX_1 test [.] rte_pause
> 0.21% DPDK_TX_1 test [.] rte_mempool_ops_enqueue_bulk.isra.0
> 0.20% DPDK_TX_1 test [.] dpdk_tx_thread
>
> The traffic load is a constant 10 Gbps of 84-byte packets with no idle periods. The burst size of 512 is the desired number of mbufs; however, the tx thread will transmit whatever it can get from the Tx ring.
>
> I think that resolving why the perf analysis shows an MP ring when it was created as SP / SC should resolve this issue.
The 'common_ring_mp_enqueue' symbol is the enqueue method of the mempool variant
'ring', that is, a mempool based on an RTE Ring internally. When you say that the
ring has been created as SP / SC, you seemingly refer to the regular RTE rings
created by your application logic, not to the internal ring of the mempool. Am I
missing something?
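
To illustrate the distinction, here is a minimal sketch (ring and pool names,
sizes and cache values are made-up placeholders, not taken from your code).
The SP/SC flags on an application ring affect only that ring; the behaviour of
the mempool's internal ring is selected by the mempool ops name:

  #include <stdio.h>
  #include <rte_ring.h>
  #include <rte_mbuf.h>
  #include <rte_mempool.h>
  #include <rte_lcore.h>

  static struct rte_ring *tx_ring;
  static struct rte_mempool *mp;

  static void
  pools_init(void)
  {
          /* Application ring: the SP/SC flags apply only to this ring. */
          tx_ring = rte_ring_create("tx_ring_0", 4096, rte_socket_id(),
                          RING_F_SP_ENQ | RING_F_SC_DEQ);

          /* Mbuf pool: the default ops are "ring_mp_mc". To get an SP/SC
           * internal ring, the pool has to be created with the "ring_sp_sc"
           * ops, which is only safe if exactly one thread allocates and
           * exactly one thread frees mbufs from this pool. */
          mp = rte_pktmbuf_pool_create_by_ops("mbuf_pool", 8191, 256, 0,
                          RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id(),
                          "ring_sp_sc");

          /* Check what an existing pool actually reports as its ops: */
          printf("mempool ops: %s\n",
                 rte_mempool_get_ops(mp->ops_index)->name);
  }

If perf shows common_ring_mp_enqueue, the pool in question is using the default
"ring_mp_mc" ops, regardless of how the application rings were created.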
Thank you.
>
> Thanks,
> ed
>
> -----Original Message-----
> From: Stephen Hemminger <stephen at networkplumber.org>
> Sent: Tuesday, July 8, 2025 9:47 AM
> To: Lombardo, Ed <Ed.Lombardo at netscout.com>
> Cc: Ivan Malov <ivan.malov at arknetworks.am>; users <users at dpdk.org>
> Subject: Re: dpdk Tx falling short
>
> On Tue, 8 Jul 2025 04:10:05 +0000
> "Lombardo, Ed" <Ed.Lombardo at netscout.com> wrote:
>
>> Hi Stephen,
>> I ensured that every pipeline stage that enqueues or dequeues mbufs uses the burst versions; perf showed the repercussions of doing single-mbuf dequeues and enqueues.
>> The receive stage uses rte_eth_rx_burst() and the Tx stage uses rte_eth_tx_burst(). The burst size used in the tx_thread for dequeuing from the ring is 512 mbufs.
>
> You might try buffering like rte_eth_tx_buffer() does.
> You would need an additional mechanism to ensure the buffer gets flushed when you detect an idle period.
>
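
For reference, a minimal sketch of the rte_eth_tx_buffer() approach suggested
in the quoted message above, assuming port 0 / queue 0 and a 32-packet buffer
(all placeholder values, not taken from the original thread):

  #include <rte_ethdev.h>
  #include <rte_mbuf.h>
  #include <rte_malloc.h>

  #define TX_PORT  0   /* placeholder port id */
  #define TX_QUEUE 0   /* placeholder queue id */
  #define BUF_PKTS 32  /* packets held before an automatic flush */

  static struct rte_eth_dev_tx_buffer *tx_buffer;

  static void
  tx_buffer_setup(void)
  {
          tx_buffer = rte_zmalloc_socket("tx_buffer",
                          RTE_ETH_TX_BUFFER_SIZE(BUF_PKTS), 0,
                          rte_eth_dev_socket_id(TX_PORT));
          rte_eth_tx_buffer_init(tx_buffer, BUF_PKTS);
  }

  /* Call for each mbuf dequeued from the application Tx ring; a full
   * burst is handed to rte_eth_tx_burst() automatically. */
  static void
  tx_enqueue(struct rte_mbuf *m)
  {
          rte_eth_tx_buffer(TX_PORT, TX_QUEUE, tx_buffer, m);
  }

  /* Call when the Tx ring dequeue comes back empty (idle period). */
  static void
  tx_idle_flush(void)
  {
          rte_eth_tx_buffer_flush(TX_PORT, TX_QUEUE, tx_buffer);
  }

By default unsent packets are dropped and freed; rte_eth_tx_buffer_set_err_callback()
can be used to change that behaviour.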