<div dir="ltr">I get 125 Mpps from single port using 12 lcores:<div>numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12 --rxd=512<br></div><div><br></div><div>With 63 cores i get 35 Mpps:</div><div>numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=63 --rxq=63 --txq=63 --rxd=512<br></div><div><br></div><div>I'm using this guide as a reference - <a href="https://fast.dpdk.org/doc/perf/DPDK_20_11_Mellanox_NIC_performance_report.pdf">https://fast.dpdk.org/doc/perf/DPDK_20_11_Mellanox_NIC_performance_report.pdf</a></div><div>This reference suggests examples of how to get the best performance but all of them use maximum 12 lcores. </div><div>125 Mpps with 12 lcores is nearly the maximum I can get from single 100GB port (148Mpps theoretical maximum for 64byte packet). I just want to understand - why I get good performance with 12 lcores and bad performance with 63 cores?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">пт, 18 февр. 2022 г. в 16:30, Asaf Penso <<a href="mailto:asafp@nvidia.com">asafp@nvidia.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
On Fri, Feb 18, 2022 at 16:30, Asaf Penso <asafp@nvidia.com> wrote:
<div dir="auto">Hello Dmitry,</div>
<div dir="auto"><br>
</div>
<div dir="auto">Could you please paste the testpmd commands per each experiment?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Also, have you looked into <a href="http://dpdk.org" target="_blank">dpdk.org</a> performance report to see how to tune for best results?</div>
>
> Regards,
> Asaf Penso
> ________________________________
<div id="gmail-m_-5122324262711783622divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Дмитрий Степанов <<a href="mailto:stepanov.dmit@gmail.com" target="_blank">stepanov.dmit@gmail.com</a>><br>
<b>Sent:</b> Friday, February 18, 2022 9:32:59 AM<br>
<b>To:</b> <a href="mailto:users@dpdk.org" target="_blank">users@dpdk.org</a> <<a href="mailto:users@dpdk.org" target="_blank">users@dpdk.org</a>><br>
<b>Subject:</b> Mellanox performance degradation with more than 12 lcores</font>
>
> Hi folks!
>
> I'm using a Mellanox ConnectX-6 Dx EN adapter (100 GbE; dual-port QSFP56; PCIe 4.0/3.0 x16) with DPDK 21.11 on a server with an AMD EPYC 7702 64-core processor (a two-socket NUMA system). Hyper-threading is turned off.
> I'm testing the maximum receive throughput I can get from a single port using the testpmd utility shipped with DPDK. My generator produces random UDP packets with zero payload length.
>
> I get the maximum performance using 8-12 lcores (120-125 Mpps overall on the receive path of a single port):
>
> numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12 --rxd=512
>
> With more than 12 lcores, overall receive performance drops: with 16-32 lcores I get 100-110 Mpps, with 33 lcores there is a significant fall to 84 Mpps, and with 63 lcores I get only 35 Mpps.
>
> Are there any limitations on the total number of receive queues (and hence lcores) that can serve a single port on this NIC?
>
> Thanks,
> Dmitriy Stepanov