[PATCH dpdk] net/mlx5: add option to reduce Tx datapath compilation time

Bruce Richardson bruce.richardson at intel.com
Wed Apr 22 09:27:39 CEST 2026


On Tue, Apr 21, 2026 at 11:23:55PM +0200, Robin Jarry wrote:
> The mlx5 Tx datapath compiles 42 variants of the burst function, each
> a specialization of mlx5_tx_burst_tmpl() with a different combination
> of offload flags. The compiler must instantiate and optimize the entire
> 3800+ line template for every variant, which dominates the build time of
> the entire code base.
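The sketch below shows why each variant is costly to compile. It assumes a simplified design: a generic, always-inlined burst routine takes an offload bitmask that is a compile-time constant at every call site, and a macro stamps out one out-of-line function per flag combination. The flag names (TXOFF_*), struct pkt, tx_burst_tmpl() and TX_BURST_VARIANT() are hypothetical stand-ins, not the mlx5 driver's actual definitions. always_inline is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offload bits for illustration only; the driver has its own set. */
#define TXOFF_CSUM   (1u << 0)
#define TXOFF_VLAN   (1u << 1)
#define TXOFF_INLINE (1u << 2)

/* Hypothetical packet descriptor carrying just enough state to show per-flag work. */
struct pkt {
	int needs_csum;
	int has_vlan;
	uint32_t len;
};

/*
 * Generic "template": olx is a compile-time constant at every call site, so
 * after forced inlining the compiler deletes each branch whose flag is unset.
 * Every specialization is a separate copy that the optimizer must process.
 */
static inline __attribute__((always_inline)) uint16_t
tx_burst_tmpl(const struct pkt *pkts, uint16_t n, uint32_t *work,
	      const unsigned int olx)
{
	uint16_t i;

	assert(work != NULL);
	for (i = 0; i < n; i++) {
		if ((olx & TXOFF_CSUM) && pkts[i].needs_csum)
			(*work)++;            /* stand-in for checksum setup */
		if ((olx & TXOFF_VLAN) && pkts[i].has_vlan)
			(*work)++;            /* stand-in for VLAN insertion */
		if (olx & TXOFF_INLINE)
			*work += pkts[i].len; /* stand-in for inlining data */
	}
	return i;
}

/* Emit one out-of-line burst function per flag combination. */
#define TX_BURST_VARIANT(name, olx)                                   \
	uint16_t tx_burst_##name(const struct pkt *p, uint16_t n,     \
				 uint32_t *w)                         \
	{                                                             \
		return tx_burst_tmpl(p, n, w, (olx));                 \
	}

TX_BURST_VARIANT(none, 0)
TX_BURST_VARIANT(csum, TXOFF_CSUM)
TX_BURST_VARIANT(full, TXOFF_CSUM | TXOFF_VLAN | TXOFF_INLINE)
```

With two sample packets {needs_csum=1, has_vlan=1, len=100} and {0, 1, 60}, tx_burst_csum() adds 1 to the work counter and tx_burst_full() adds 163. Each additional TX_BURST_VARIANT() line is another full copy of the loop body for the optimizer to process.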
> 
> When MLX5_MINIMAL_TX is defined, only 11 variants are compiled instead
> of 42. Two new "full without inline" superset variants (full_noi and
> full_noi_empw) are introduced to satisfy the selection algorithm
> constraint that the INLINE bit must match exactly between request and
> variant. The remaining 9 variants are existing ones that already cover
> all reachable combinations of the EMPW, MPW, INLINE and TXPP flags.
> 
> The selection function is unchanged. At runtime, it picks the best
> matching variant from whatever is available. With the minimal set, each
> selected variant may include a few unnecessary offload checks compared
> to the precisely-tailored original, which has negligible impact on
> performance since modern branch predictors handle static never-taken
> branches well.
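The selection rule described above can be sketched as follows, under stated assumptions. A requested offload mask is served by any variant whose mask is a superset of it. The INLINE bit must match exactly, and, as an extra assumption for this sketch only, so must EMPW; that is what makes both full_noi and full_noi_empw necessary here. Ties are broken by the fewest unneeded offload bits. The bit values, tx_select(), the scoring rule and both tables are hypothetical; this is not the driver's real selection function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the driver defines its own. */
#define TXOFF_MULTI  (1u << 0)
#define TXOFF_TSO    (1u << 1)
#define TXOFF_CSUM   (1u << 2)
#define TXOFF_VLAN   (1u << 3)
#define TXOFF_INLINE (1u << 4)
#define TXOFF_EMPW   (1u << 5)

/* Sketch assumption: these bits must match exactly; the rest may be a superset. */
#define TXOFF_EXACT (TXOFF_INLINE | TXOFF_EMPW)
#define TXOFF_NOI   (TXOFF_MULTI | TXOFF_TSO | TXOFF_CSUM | TXOFF_VLAN)

struct tx_variant {
	const char *name;
	unsigned int olx;
};

/* Count set bits without relying on compiler builtins. */
static unsigned int popcount32(unsigned int v)
{
	unsigned int n = 0;

	for (; v != 0; v &= v - 1)
		n++;
	return n;
}

/*
 * Return the index of the variant that supports every requested offload,
 * matches the exact-match bits, and carries the fewest unneeded offloads;
 * return -1 if no variant qualifies.
 */
static int tx_select(const struct tx_variant *tbl, size_t n, unsigned int req)
{
	int best = -1;
	unsigned int best_extra = ~0u;
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++) {
		unsigned int olx = tbl[i].olx;
		unsigned int extra;

		if ((olx & req) != req)
			continue; /* a requested offload is missing */
		if ((olx & TXOFF_EXACT) != (req & TXOFF_EXACT))
			continue; /* INLINE or EMPW does not match */
		extra = popcount32(olx & ~req);
		if (extra < best_extra) {
			best = (int)i;
			best_extra = extra;
		}
	}
	return best;
}

/* Hypothetical minimal table: full-featured supersets with and without INLINE. */
static const struct tx_variant tx_minimal[] = {
	{ "full",          TXOFF_NOI | TXOFF_INLINE },
	{ "full_empw",     TXOFF_NOI | TXOFF_INLINE | TXOFF_EMPW },
	{ "full_noi",      TXOFF_NOI },
	{ "full_noi_empw", TXOFF_NOI | TXOFF_EMPW },
};

/* Hypothetical larger table that also has a tailored checksum-only variant. */
static const struct tx_variant tx_complete[] = {
	{ "full",     TXOFF_NOI | TXOFF_INLINE },
	{ "full_noi", TXOFF_NOI },
	{ "csum_noi", TXOFF_CSUM },
};
```

With the minimal table, a checksum-only request lands on full_noi, which carries three unneeded offload checks, whereas the larger table picks the tailored csum_noi. Those unneeded checks are the never-taken branches the commit message expects the branch predictor to absorb.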
> 
> Compilation times (MM:SS) measured on an Intel Core Ultra 7 165U with GCC
> 16.0.1:
> 
> FILE              BUILD          BEFORE   AFTER   DELTA
> ================= ============== ======== ======= ===============
> mlx5_tx_mpw.c     debug          00:31    00:22   -00:09 (-29.0%)
> mlx5_tx_txpp.c                   00:39    00:25   -00:14 (-35.9%)
> mlx5_tx_empw.c                   01:11    00:19   -00:52 (-73.2%)
> mlx5_tx_nompw.c                  01:13    00:16   -00:57 (-78.1%)
> ----------------- -------------- -------- ------- ---------------
> mlx5_tx_mpw.c     debug+asan     03:15    02:45   -00:30 (-15.4%)
> mlx5_tx_txpp.c                  *06:28*   03:13   -03:15 (-50.3%)
> mlx5_tx_empw.c                  *12:07*   01:55   -10:12 (-84.2%)
> mlx5_tx_nompw.c                 *12:54*   01:45   -11:09 (-86.4%)
> ----------------- -------------- -------- ------- ---------------
> mlx5_tx_mpw.c     release        00:12    00:09   -00:03 (-25.0%)
> mlx5_tx_txpp.c                   00:31    00:24   -00:07 (-22.6%)
> mlx5_tx_empw.c                   00:32    00:18   -00:14 (-43.8%)
> mlx5_tx_nompw.c                  00:34    00:16   -00:18 (-52.9%)
> ----------------- -------------- -------- ------- ---------------
> mlx5_tx_mpw.c     release+asan   00:25    00:23   -00:02 (-8.0%)
> mlx5_tx_empw.c                   01:24    00:42   -00:42 (-50.0%)
> mlx5_tx_txpp.c                   01:32    00:59   -00:33 (-35.9%)
> mlx5_tx_nompw.c                  01:38    00:37   -01:01 (-62.2%)
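The DELTA column is (AFTER - BEFORE) expressed as a percentage of BEFORE. The sketch below recomputes it: it converts MM:SS to seconds and returns the percentage in tenths, rounded half away from zero using integer arithmetic. mmss_to_sec() and delta_pct_tenths() are illustrative helpers, not part of DPDK. The emphasis asterisks around some BEFORE values must be stripped first, since the parser rejects them.

```c
#include <assert.h>
#include <stdio.h>

/* Convert "MM:SS" to seconds; return -1 if the string is malformed. */
static int mmss_to_sec(const char *s)
{
	int min, sec;

	if (sscanf(s, "%d:%d", &min, &sec) != 2 || min < 0 || sec < 0 || sec > 59)
		return -1;
	return min * 60 + sec;
}

/*
 * DELTA as a percentage of BEFORE, in tenths of a percent, rounded half
 * away from zero (e.g. -732 means -73.2%).
 */
static int delta_pct_tenths(int before_s, int after_s)
{
	int d = after_s - before_s;

	assert(before_s > 0);
	if (d < 0)
		return (d * 1000 - before_s / 2) / before_s;
	return (d * 1000 + before_s / 2) / before_s;
}
```

For mlx5_tx_empw.c in the debug build, 01:11 is 71 s and 00:19 is 19 s, so the delta is -52 s and delta_pct_tenths(71, 19) returns -732, matching the -73.2% in the table.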
> 
> To enable, pass -DMLX5_MINIMAL_TX via c_args:
> 
>   meson setup build -Dc_args='-DMLX5_MINIMAL_TX'
> 
> Signed-off-by: Robin Jarry <rjarry at redhat.com>

Out of interest, do you have any numbers for the performance delta between
release builds with and without the new flag? I'm wondering whether the
flag could be made the default to speed up builds in general. [Not that it
would affect me much: I generally use the -Denable_drivers option when
configuring my builds, to select only the specific drivers I want.]

/Bruce