[PATCH dpdk] net/mlx5: add option to reduce Tx datapath compilation time
Bruce Richardson
bruce.richardson at intel.com
Wed Apr 22 09:27:39 CEST 2026
On Tue, Apr 21, 2026 at 11:23:55PM +0200, Robin Jarry wrote:
> The mlx5 Tx datapath compiles 42 variants of the burst function, each
> a specialization of mlx5_tx_burst_tmpl() with a different combination
> of offload flags. The compiler must instantiate and optimize the entire
> 3800+ line template for every variant, which dominates the build time
> of the whole code base.
>
> When MLX5_MINIMAL_TX is defined, only 11 variants are compiled instead
> of 42. Two new "full without inline" superset variants (full_noi and
> full_noi_empw) are introduced to satisfy the selection algorithm
> constraint that the INLINE bit must match exactly between request and
> variant. The remaining 9 variants are existing ones that already cover
> all reachable combinations of the EMPW, MPW, INLINE and TXPP flags.
>
> The selection function is unchanged. At runtime, it picks the best
> matching variant from whatever is available. With the minimal set, each
> selected variant may include a few unnecessary offload checks compared
> to the precisely-tailored original, which has negligible impact on
> performance since modern branch predictors handle static never-taken
> branches well.
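
For readers following along, here is a minimal sketch of the kind of
selection rule described above: pick the variant with the fewest extra
offloads that still covers the request, and require the INLINE bit to
match exactly. All names (OFF_*, tx_variant, select_variant) are made up
for illustration and are not the driver's identifiers. The real selection
function also considers EMPW, MPW and TXPP, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical offload bits, for illustration only. */
enum {
	OFF_CSUM   = 1u << 0,
	OFF_VLAN   = 1u << 1,
	OFF_TSO    = 1u << 2,
	OFF_INLINE = 1u << 3,
};

struct tx_variant {
	const char *name;
	unsigned int flags;
};

/* A reduced table: two tailored variants plus two "full" supersets. */
static const struct tx_variant variants[] = {
	{ "none",     0 },
	{ "csum",     OFF_CSUM },
	{ "full_noi", OFF_CSUM | OFF_VLAN | OFF_TSO },
	{ "full",     OFF_CSUM | OFF_VLAN | OFF_TSO | OFF_INLINE },
};

/* Count set bits without relying on compiler builtins. */
static int count_bits(unsigned int v)
{
	int n = 0;

	while (v != 0) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Return the variant that covers every requested flag with the fewest
 * unrequested ones, or NULL if nothing fits. The INLINE bit must be
 * identical in the request and in the variant.
 */
static const struct tx_variant *select_variant(unsigned int req)
{
	const struct tx_variant *best = NULL;
	int best_extra = 0;
	size_t i;

	for (i = 0; i < sizeof(variants) / sizeof(variants[0]); i++) {
		unsigned int f = variants[i].flags;
		int extra;

		if ((f & req) != req)
			continue; /* variant lacks a requested offload */
		if ((f ^ req) & OFF_INLINE)
			continue; /* INLINE must match exactly */
		extra = count_bits(f & ~req);
		if (best == NULL || extra < best_extra) {
			best = &variants[i];
			best_extra = extra;
		}
	}
	return best;
}
```

With this table, a request for OFF_VLAN alone has no tailored variant and
falls back to "full_noi", while OFF_VLAN | OFF_INLINE can only be served
by "full". That is why a no-inline superset is needed once the tailored
variants are removed.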
>
> Compilation times (MM:SS) measured on Intel Core Ultra 7 165U with GCC
> 16.0.1:
>
> FILE               BUILD           BEFORE    AFTER    DELTA
> =================  ==============  ========  =======  ===============
> mlx5_tx_mpw.c      debug           00:31     00:22    -00:09 (-29.0%)
> mlx5_tx_txpp.c                     00:39     00:25    -00:14 (-35.9%)
> mlx5_tx_empw.c                     01:11     00:19    -00:52 (-73.2%)
> mlx5_tx_nompw.c                    01:13     00:16    -00:57 (-78.1%)
> -----------------  --------------  --------  -------  ---------------
> mlx5_tx_mpw.c      debug+asan      03:15     02:45    -00:30 (-15.4%)
> mlx5_tx_txpp.c                     *06:28*   03:13    -03:15 (-50.3%)
> mlx5_tx_empw.c                     *12:07*   01:55    -10:12 (-84.2%)
> mlx5_tx_nompw.c                    *12:54*   01:45    -11:09 (-86.4%)
> -----------------  --------------  --------  -------  ---------------
> mlx5_tx_mpw.c      release         00:12     00:09    -00:03 (-25.0%)
> mlx5_tx_txpp.c                     00:31     00:24    -00:07 (-22.6%)
> mlx5_tx_empw.c                     00:32     00:18    -00:14 (-43.8%)
> mlx5_tx_nompw.c                    00:34     00:16    -00:18 (-52.9%)
> -----------------  --------------  --------  -------  ---------------
> mlx5_tx_mpw.c      release+asan    00:25     00:23    -00:02 (-8.0%)
> mlx5_tx_empw.c                     01:24     00:42    -00:42 (-50.0%)
> mlx5_tx_txpp.c                     01:32     00:59    -00:33 (-35.9%)
> mlx5_tx_nompw.c                    01:38     00:37    -01:01 (-62.2%)
>
> To enable, pass -DMLX5_MINIMAL_TX via c_args:
>
> meson setup build -Dc_args='-DMLX5_MINIMAL_TX'
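
To make the effect of the define concrete, here is a rough sketch of how
a compile-time switch like this can shrink a variant table. Only the
MLX5_MINIMAL_TX macro name comes from the patch description. The table,
the entry names and the flag values are hypothetical, and the actual
patch may well gate its variants differently.

```c
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define FLAG_CSUM   0x1u
#define FLAG_VLAN   0x2u
#define FLAG_INLINE 0x4u

struct burst_entry {
	const char *name;
	unsigned int flags;
};

static const struct burst_entry burst_table[] = {
	/* Superset variants are always built. */
	{ "full",     FLAG_CSUM | FLAG_VLAN | FLAG_INLINE },
	{ "full_noi", FLAG_CSUM | FLAG_VLAN },
#ifndef MLX5_MINIMAL_TX
	/* Precisely tailored variants, skipped in the minimal build. */
	{ "csum",     FLAG_CSUM },
	{ "vlan",     FLAG_VLAN },
#endif
};

/* Number of variants the compiler actually had to instantiate. */
static size_t burst_table_len(void)
{
	return sizeof(burst_table) / sizeof(burst_table[0]);
}
```

Built without the define, this table has four entries. With
-DMLX5_MINIMAL_TX only the two superset entries remain, so fewer template
specializations would need to be instantiated and optimized.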
>
> Signed-off-by: Robin Jarry <rjarry at redhat.com>
Out of interest, do you have any numbers for the performance delta between
release builds with and without the new flag? I'm wondering whether the
flag could become the default, to speed up builds generally. [Not that it
would affect me much: I generally use the -Denable_drivers option when
configuring my builds, to select only the specific drivers I want.]
/Bruce