[dpdk-dev] [PATCH v2] net/mlx5: improve vMPRQ descriptors allocation locality

Thomas Monjalon thomas at monjalon.net
Fri Nov 13 18:58:36 CET 2020


> > The replenish scheme used in the vectorized Rx burst carries a performance
> > penalty for both MPRQ and SPRQ.
> > Mbuf elements are filled at the end of the mbufs array and replenished at
> > the beginning. That leads to more cache misses and a drop in performance.
> > The more Rx descriptors are used, the worse the situation becomes.
> > 
> > Change the allocation scheme for the vectorized MPRQ Rx burst:
> > allocate new mbufs only when the consumed mbufs are almost depleted (always
> > keep a one-burst gap between the allocated and consumed indices). Keeping
> > only a small number of mbufs allocated improves cache locality and
> > performance considerably.
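
A minimal sketch of the "one burst gap" idea described above, not the actual
mlx5 code. The struct, the function names and the burst size of 8 are all
hypothetical, and a counter increment stands in for the real bulk mbuf
allocation. It only illustrates that the number of allocated-but-unconsumed
mbufs stays below two bursts, independent of the number of Rx descriptors.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BURST 8u /* hypothetical burst size, not the mlx5 value */

/* Hypothetical queue state: free-running counters, not the driver's struct. */
struct rxq_sketch {
	uint32_t alloc_idx;    /* mbufs allocated so far */
	uint32_t consumed_idx; /* mbufs handed to the application so far */
	uint32_t peak;         /* largest allocated-but-unconsumed count seen */
};

/* Allocate one more burst only when less than a burst of mbufs is left. */
static void sketch_replenish(struct rxq_sketch *q)
{
	uint32_t outstanding = q->alloc_idx - q->consumed_idx;

	if (outstanding >= SKETCH_BURST)
		return; /* still one burst ahead, allocate nothing */
	q->alloc_idx += SKETCH_BURST; /* stands in for a bulk mbuf allocation */
	outstanding += SKETCH_BURST;
	if (outstanding > q->peak)
		q->peak = outstanding;
}

/* Consume n packets (n <= SKETCH_BURST); returns the number consumed. */
static uint32_t sketch_rx_burst(struct rxq_sketch *q, uint32_t n)
{
	assert(n <= SKETCH_BURST);
	sketch_replenish(q);
	/* After replenishing, at least one full burst is always available. */
	assert(q->alloc_idx - q->consumed_idx >= SKETCH_BURST);
	q->consumed_idx += n;
	return n;
}

/* Run `bursts` receive calls of varying size and report the peak working set. */
static uint32_t sketch_peak_after(uint32_t bursts)
{
	struct rxq_sketch q = {0, 0, 0};
	uint32_t i;

	for (i = 0; i < bursts; i++)
		sketch_rx_burst(&q, 1 + i % SKETCH_BURST);
	return q.peak;
}
```

Calling sketch_peak_after() with any number of bursts returns a value below
2 * SKETCH_BURST, which is the locality property the commit message relies on.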
> > 
> > Unfortunately, this approach cannot be applied to the SPRQ Rx burst routine.
> > In the MPRQ Rx burst we simply copy packets from external MPRQ buffers or
> > attach these buffers to mbufs.
> > In the SPRQ Rx burst we let the NIC fill the mbufs for us.
> > Hence keeping a small number of allocated mbufs would limit the NIC's ability
> > to fill as many buffers as possible, which offsets the advantage of better
> > cache locality.
> > 
> > Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")
> > 
> > Signed-off-by: Alexander Kozyrev <akozyrev at nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>

Applied in next-net-mlx, thanks.



