[PATCH grout] l2: optimize bridge FDB source MAC learning

Morten Brørup mb at smartsharesystems.com
Thu May 21 23:09:45 CEST 2026


> From: Robin Jarry [mailto:rjarry at redhat.com]
> Sent: Thursday, 21 May 2026 22.12
> 
> Hi Morten,
> 
> Morten Brørup, May 21, 2026 at 19:59:
> > In the bridge_input node, read the clock once outside the loop, cache
> the
> > value, and pass the cached value as a parameter to fdb_learn() inside
> the
> > loop.
> > This reduces the number of times the node reads the clock, which is a
> > costly operation, from once per packet to once per burst of packets.
> 
> I wonder if it would make sense to have a per-lcore timestamp variable
> accessible via a static inline function that would be refreshed every
> housekeeping turns of main loop, e.g.:
> 
> diff --git a/modules/infra/datapath/datapath.h
> b/modules/infra/datapath/datapath.h
> index 2c1e58e5f8ec..e0784daa3949 100644
> --- a/modules/infra/datapath/datapath.h
> +++ b/modules/infra/datapath/datapath.h
> @@ -3,4 +3,14 @@
> 
>  #pragma once
> 
> +#include <rte_per_lcore.h>
> +
> +#include <time.h>
> +
>  void *gr_datapath_loop(void *priv);
> +
> +RTE_DECLARE_PER_LCORE(clock_t, datapath_timestamp);
> +
> +static inline clock_t datapath_timestamp(void) {
> +	return RTE_PER_LCORE(datapath_timestamp);
> +}
> diff --git a/modules/infra/datapath/main_loop.c
> b/modules/infra/datapath/main_loop.c
> index f462cfbd42dd..2be0885319ba 100644
> --- a/modules/infra/datapath/main_loop.c
> +++ b/modules/infra/datapath/main_loop.c
> @@ -10,6 +10,8 @@
>  #include "vec.h"
>  #include "worker.h"
> 
> +#include <gr_clock.h>
> +
>  #include <rte_common.h>
>  #include <rte_eal.h>
>  #include <rte_errno.h>
> @@ -182,6 +184,8 @@ err:
> 
>  static struct rte_rcu_qsbr *rcu;
> 
> +RTE_DEFINE_PER_LCORE(clock_t, datapath_timestamp);
> +
>  void *gr_datapath_loop(void *priv) {
>  	struct stats_context ctx = {
>  		.stats = NULL,
> @@ -257,6 +261,7 @@ reconfig:
>  	loop = 0;
>  	sleep = 0;
>  	timestamp = rte_rdtsc();
> +	RTE_PER_LCORE(datapath_timestamp) = gr_clock_us();
>  	for (;;) {
>  		rte_graph_walk(graph);
> 
> @@ -292,6 +297,7 @@ reconfig:
>  			ctx.w_stats->total_cycles += cycles;
>  			ctx.w_stats->loop_cycles += cycles;
>  			ctx.w_stats->n_loops += HOUSEKEEPING_INTERVAL;
> +			RTE_PER_LCORE(datapath_timestamp) = gr_clock_us();
>  		}
>  	}
> 
> 
> What do you think?

We started off with something simple as that.
But races started showing up, where e.g. a packet could seemingly spend negative time in a shaper if the enqueueing thread had snapshotted its clock after the dequeueing thread had snapshotted its clock - although the enqueueing thread had obviously put the packet in the queue before the dequeueing thread took it out of the queue.

So we kept the RTE_PER_LCORE() timestamps in various formats, and introduced a "void per_lcore_clock_update(void)" function to update the snapshots of the calling thread.
Now, the trick is: Any burst processing function relying on timestamps must call this update function first. Thinking about the need for this requires the same way of thinking as for locks; you need to always consider ordering and races.

If I added a "clock" module exposing these features for public use (not only internal use), where in the tree should I put the files?
There's going to be a C file with the gr_per_lcore_clock_update() function implementation and the RTE_DEFINE_PER_LCORE(gr_clock_xyz) timestamps in various formats, and an accompanying public header file with the function declaration and the RTE_DECLARE_PER_LCORE(gr_clock_xyz) timestamps in various formats.

Then I can update the bridge FDB patch to use this central clock instead.
And for now, I think updating the clock is only required where you put it in your example above.

I'm not yet familiar with Grout naming conventions.
The "void gr_per_lcore_clock_update(void)" function updates the RTE_PER_LCORE() timestamp variables for the calling thread.
Any guidance for how to name the function?

> 
> > Only update the bridge FDB entry's last_seen timestamp if at least
> 1/4
> > second has passed since its last update.
> > This reduces the pressure on the CPU's store unit.
> >
> > Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> > ---
> >  modules/l2/control/fdb.c           | 19 +++++++++++++------
> >  modules/l2/control/l2.h            |  3 ++-
> >  modules/l2/datapath/bridge_input.c |  6 +++++-
> >  3 files changed, 20 insertions(+), 8 deletions(-)
> 
> PS(off topic): Since we don't have many email contributions, I didn't
> bother setting up automated CI. That reminds me of the discussions we
> had at the summit about synchronizing the mailing lists and GitHub pull
> requests. I will see if I can push my ideas [1] a little further.
> 
> [1] https://lists.ozlabs.org/pipermail/patchwork/2025-March/007489.html

Well... maybe I should just learn to use the Github workflow instead.
It's probably not a completely obscure and useless skill. ;-)



More information about the grout mailing list