<div dir="ltr">Thanks for the clarification! <div>I was able to get 148Mpps with 12 lcores after some BIOS tunings. </div><div>Looks like due to these HW limitations I have to use ring buffer as you suggested to support more than 32 lcores! </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">пт, 18 февр. 2022 г. в 16:40, Dmitry Kozlyuk <<a href="mailto:dkozlyuk@nvidia.com">dkozlyuk@nvidia.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
>
> > With more than 12 lcores, overall receive performance drops.
> > With 16-32 lcores I get 100-110 Mpps,
>
> It is more about the number of queues than the number of cores:
> 12 queues is the threshold at which Multi-Packet Receive Queue (MPRQ)
> is automatically enabled in the mlx5 PMD.
> Try increasing --rxd and check out the mprq_en device argument.
> Please see the mlx5 PMD user guide for details about MPRQ.
> You should be able to get the full 148 Mpps with your HW.
>
> > and I see a significant performance drop with 33 lcores - 84 Mpps.
> > With 63 cores I get only 35 Mpps overall receive performance.
> >
> > Are there any limitations on the total number of receive queues (total
> > lcores) that can handle a single port on a given NIC?
>
> This is a hardware limitation.
> The limit on the number of queues you can create is very high (16M),
> but performance can scale perfectly only up to 32 queues
> at high packet rates (as opposed to bit rates).
> Using more queues can even degrade it, just as you observe.
> One way to overcome this (not specific to mlx5)
> is to use a ring buffer for incoming packets,
> from which any number of processing cores can take packets.
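
For anyone who finds this thread later: mprq_en is an mlx5 devarg and --rxd is a testpmd option, so with testpmd they can be combined roughly like this (the PCI address, core list, and queue/descriptor counts are only placeholders for illustration, not values I am recommending):

dpdk-testpmd -l 0-12 -a 0000:01:00.0,mprq_en=1 -- \
    --rxq=12 --txq=12 --rxd=2048 --forward-mode=rxonly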
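
And for the ring-buffer approach I am going to try: a minimal sketch of the RX-to-workers hand-off with rte_ring, assuming a few RX lcores enqueue into one shared MP/MC ring and any number of remaining lcores dequeue from it. The names, ring/burst sizes, port number, and the processing stub are my own placeholders, not anything from the mlx5 docs:

/* Minimal sketch: RX lcores enqueue mbufs into a shared rte_ring,
 * worker lcores dequeue and process them. Names and sizes are placeholders. */
#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST_SIZE 32
#define RING_SIZE  16384            /* must be a power of two */

static struct rte_ring *rx_ring;    /* shared MP/MC ring (flags = 0) */

/* Runs on each RX lcore (launched with rte_eal_remote_launch());
 * 'arg' carries the RX queue id this lcore polls. */
static int
rx_lcore_main(void *arg)
{
    uint16_t queue_id = *(uint16_t *)arg;
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(0 /* port */, queue_id,
                                          bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;
        /* Hand the burst over to the workers; drop what does not fit. */
        unsigned int nb_enq = rte_ring_enqueue_burst(rx_ring,
                                        (void **)bufs, nb_rx, NULL);
        while (nb_enq < nb_rx)
            rte_pktmbuf_free(bufs[nb_enq++]);
    }
    return 0;
}

/* Runs on every remaining lcore; any number of them can share the ring. */
static int
worker_lcore_main(void *arg __rte_unused)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        unsigned int nb = rte_ring_dequeue_burst(rx_ring,
                                        (void **)bufs, BURST_SIZE, NULL);
        for (unsigned int i = 0; i < nb; i++) {
            /* ... real packet processing would go here ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}

/* In init, after rte_eal_init() and the usual port/queue setup: */
static int
setup_ring(void)
{
    rx_ring = rte_ring_create("rx_to_workers", RING_SIZE,
                              rte_socket_id(), 0 /* MP enqueue, MC dequeue */);
    return rx_ring != NULL ? 0 : -1;
}

I plan to keep only a handful of RX lcores (within the 32-queue sweet spot) feeding the ring and scale the worker count independently.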