[dpdk-dev] [RFC] net/mlx5: improve out of box performance

Yongseok Koh yskoh at mellanox.com
Fri Mar 1 02:15:07 CET 2019


Hi,

This could not be done last time, but it will be attempted again
for v19.05.

Thanks,
Yongseok

> On Jun 7, 2018, at 5:38 PM, Yongseok Koh <yskoh at mellanox.com> wrote:
> 
> In mlx5 PMD, there are multiple Tx burst functions,
>   mlx5_tx_burst()
>   mlx5_tx_burst_mpw()
>   mlx5_tx_burst_mpw_inline()
>   mlx5_tx_burst_empw()
>   mlx5_tx_burst_raw_vec()
>   mlx5_tx_burst_vec()
> 
> To provide a better user experience and the best out-of-box performance,
> these will need to be consolidated into a single non-vector function. As
> mlx5_tx_burst_vec() calls mlx5_tx_burst_raw_vec(), there will be no change
> to the vector functions.
> 
> The reason there are multiple Tx burst functions is that newer devices
> have enhanced features that improve throughput by further saving PCIe
> bandwidth. For each new feature (e.g. Tx packet inlining), a new Tx burst
> function was added incrementally to support a new type of Tx descriptor.
> However, the problem with selecting a Tx burst function statically is
> that, although newer devices support all descriptor types including the
> legacy ones, a newer function doesn't fall back to the older modes.
> 
> Another issue is that it is very hard to introduce a new feature on the Tx
> path. For example, mlx5 supports TSO, but currently only in the basic
> mlx5_tx_burst(). We could have added TSO support to the other Tx burst
> functions, but duplicating the same code in multiple locations is painful
> and a bad idea from a maintenance perspective. As a result, a user who
> needs TSO cannot also get Mellanox's best-in-class performance from the
> mlx5 PMD.
> 
> The consolidated Tx burst function will be all-inclusive: it will support
> all types of Tx descriptors (WQEs) and HW offloads. The WQE type for each
> transmitted packet will be determined dynamically, and the decision whether
> to inline a packet will be made by sensing a PCIe bottleneck.
> 
> Selection between the consolidated function and the existing vector
> function will still be done at configuration time, but the CPU
> architecture will also be taken into account.
> 
> Signed-off-by: Yongseok Koh <yskoh at mellanox.com>
