<div dir="ltr">Thanks for the clarification! <div>I was able to get 148Mpps with 12 lcores after some BIOS tunings. </div><div>Looks like due to these HW limitations I have to use ring buffer as you suggested to support more than 32 lcores! </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">пт, 18 февр. 2022 г. в 16:40, Dmitry Kozlyuk <<a href="mailto:dkozlyuk@nvidia.com">dkozlyuk@nvidia.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
>
> > With more than 12 lcores, overall receive performance drops.
> > With 16-32 lcores I get 100-110 Mpps,
>
> It is more about the number of queues than the number of cores:
> 12 queues is the threshold at which Multi-Packet Receive Queue (MPRQ)
> is automatically enabled in the mlx5 PMD.
> Try increasing --rxd and check out the mprq_en device argument.
> Please see the mlx5 PMD user guide for details about MPRQ.
> You should be able to get the full 148 Mpps with your HW.
>
> > and I see a significant performance drop with 33 lcores - 84 Mpps.
> > With 63 cores I get only 35 Mpps overall receive performance.
> >
> > Are there any limitations on the total number of receive queues (total
> > lcores) that can handle a single port on a given NIC?
>
> This is a hardware limitation.
> The limit on the number of queues you can create is very high (16M),
> but performance can scale perfectly only up to 32 queues
> at high packet rates (as opposed to bit rates).
> Using more queues can even degrade it, just as you observe.
> One way to overcome this (not specific to mlx5)
> is to use a ring buffer for incoming packets,
> from which any number of processing cores can take packets.
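
For anyone who finds this thread later: mprq_en is an mlx5 devarg and --rxd is a testpmd option, so with testpmd they can be combined roughly like this (the PCI address, core list, and queue/descriptor counts are only placeholders for illustration, not values I am recommending):

dpdk-testpmd -l 0-12 -a 0000:01:00.0,mprq_en=1 -- \
    --rxq=12 --txq=12 --rxd=2048 --forward-mode=rxonly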
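
And for the ring-buffer approach I am going to try: a minimal sketch of the RX-to-workers hand-off with rte_ring, assuming a few RX lcores enqueue into one shared MP/MC ring and any number of remaining lcores dequeue from it. The names, ring/burst sizes, port number, and the processing stub are my own placeholders, not anything from the mlx5 docs:

/* Minimal sketch: RX lcores enqueue mbufs into a shared rte_ring,
 * worker lcores dequeue and process them. Names and sizes are placeholders. */
#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST_SIZE 32
#define RING_SIZE  16384            /* must be a power of two */

static struct rte_ring *rx_ring;    /* shared MP/MC ring (flags = 0) */

/* Runs on each RX lcore (launched with rte_eal_remote_launch());
 * 'arg' carries the RX queue id this lcore polls. */
static int
rx_lcore_main(void *arg)
{
    uint16_t queue_id = *(uint16_t *)arg;
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(0 /* port */, queue_id,
                                          bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;
        /* Hand the burst over to the workers; drop what does not fit. */
        unsigned int nb_enq = rte_ring_enqueue_burst(rx_ring,
                                        (void **)bufs, nb_rx, NULL);
        while (nb_enq < nb_rx)
            rte_pktmbuf_free(bufs[nb_enq++]);
    }
    return 0;
}

/* Runs on every remaining lcore; any number of them can share the ring. */
static int
worker_lcore_main(void *arg __rte_unused)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        unsigned int nb = rte_ring_dequeue_burst(rx_ring,
                                        (void **)bufs, BURST_SIZE, NULL);
        for (unsigned int i = 0; i < nb; i++) {
            /* ... real packet processing would go here ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}

/* In init, after rte_eal_init() and the usual port/queue setup: */
static int
setup_ring(void)
{
    rx_ring = rte_ring_create("rx_to_workers", RING_SIZE,
                              rte_socket_id(), 0 /* MP enqueue, MC dequeue */);
    return rx_ring != NULL ? 0 : -1;
}

I plan to keep only a handful of RX lcores (within the 32-queue sweet spot) feeding the ring and scale the worker count independently.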