[PATCH grout] add high-performance clock
Morten Brørup
mb at smartsharesystems.com
Thu Jun 11 11:26:54 CEST 2026
> From: Robin Jarry [mailto:rjarry at redhat.com]
> Sent: Wednesday, 10 June 2026 17.17
>
> Hi Morten,
>
> Morten Brørup, Jun 09, 2026 at 21:06:
> > The current clock is based on clock_gettime(CLOCK_MONOTONIC_RAW), which is
> > significantly slower than rte_rdtsc(), even though the kernel exposes it
> > as a vDSO.
> >
> > CLOCK_MONOTONIC_RAW is typically based on and in sync with the TSC, so use
> > the faster rte_rdtsc() to read the clock when this is the case.
> >
> > Also, introduce a per-thread snapshot of the clock for use in the
> > dataplane, where reading the snapshot is sufficiently accurate, and much
> > faster than reading the clock.
> >
> > Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> > ---
> > Note: Optimizations relying on the clock snapshot will be submitted later.
> > ---
> > api/gr_clock.h | 38 ++++++++++-
> > main/clock.c | 103 +++++++++++++++++++++++++++++
> > main/clock.h | 58 ++++++++++++++++
> > main/meson.build | 1 +
> > modules/infra/datapath/main_loop.c | 3 +
> > 5 files changed, 201 insertions(+), 2 deletions(-)
> > create mode 100644 main/clock.c
> > create mode 100644 main/clock.h
> >
> > diff --git a/api/gr_clock.h b/api/gr_clock.h
> > index d2d98fba..c70a03a1 100644
> > --- a/api/gr_clock.h
> > +++ b/api/gr_clock.h
> > @@ -14,20 +14,54 @@
> > // in calculations where race conditions may cause negative differences.
> > typedef int64_t gr_clock_ns_t;
> >
> > +#define GR_NS_PER_S (gr_clock_ns_t)INT64_C(1000000000)
> > +
> > +#ifdef __GROUT_MAIN__
> > +#include <rte_cycles.h>
> > +
> > +// Ref: main/clock.h
> > +extern uint64_t clock_tsc_hz;
> > +#endif
> > +
> > // Get powered-on (non-suspended, non-hibernated) time since last boot,
> > // using a common clock across all processes.
> > static inline struct timespec gr_clock_raw(void) {
> > struct timespec tp = {0};
> > +#ifdef __GROUT_MAIN__
> > + if (clock_tsc_hz != 0) {
> > + const uint64_t tsc = rte_rdtsc();
> > + tp.tv_sec = (tsc / clock_tsc_hz);
> > + tp.tv_nsec = (tsc % clock_tsc_hz) * GR_NS_PER_S / clock_tsc_hz;
> > + } else {
> > + clock_gettime(CLOCK_MONOTONIC_RAW, &tp);
> > + }
> > + __rte_assume(tp.tv_sec >= 0);
> > + __rte_assume(tp.tv_nsec >= 0);
> > + return tp;
> > +#else
> > clock_gettime(CLOCK_MONOTONIC_RAW, &tp);
> > return tp;
> > +#endif
> > }
> >
> > -#define GR_NS_PER_S (gr_clock_ns_t)1000000000LL
> > -
> > // Get powered-on (non-suspended, non-hibernated) time since last boot [nanoseconds],
> > // using a common clock across all processes.
> > // Does not return negative values.
> > static inline gr_clock_ns_t gr_clock_ns(void) {
> > +#ifdef __GROUT_MAIN__
> > + gr_clock_ns_t ret;
> > + if (clock_tsc_hz != 0) {
> > + const uint64_t tsc = rte_rdtsc();
> > + ret = (gr_clock_ns_t)((tsc / clock_tsc_hz) * GR_NS_PER_S
> > + + (tsc % clock_tsc_hz) * GR_NS_PER_S / clock_tsc_hz);
> > + } else {
> > + struct timespec tp = gr_clock_raw();
> > + ret = (gr_clock_ns_t)(tp.tv_sec * GR_NS_PER_S + tp.tv_nsec);
> > + }
> > + __rte_assume(ret >= 0);
> > + return ret;
> > +#else
> > struct timespec tp = gr_clock_raw();
> > return tp.tv_sec * GR_NS_PER_S + tp.tv_nsec;
> > +#endif
>
> This means gr_clock_ns() values returned in API messages by the grout
> daemon cannot be compared with values returned in grout clients (e.g.
> grcli).
>
> It will break commands that display time differences such as:
>
> $ grcli fdb show
>
> https://github.com/DPDK/grout/blob/v0.16.0/modules/l2/cli/fdb.c#L141
>
> $ grcli conntrack show
>
> https://github.com/DPDK/grout/blob/v0.16.0/modules/policy/cli/conntrack.c#L58
>
Good point.
I will try to integrate the clock more deeply into the API, so that Grout clients can determine the clock the same way as Grout itself.
I strongly prefer that Grout's dataplane is able to base its high-res clock on rte_rdtsc(), because it executes more than an order of magnitude faster than clock_gettime(CLOCK_MONOTONIC_RAW).
I have seen clock_gettime(CLOCK_MONOTONIC_RAW) execution times of 400-700 CPU cycles.
For comparison, rte_rdtsc() takes only ca. 40 CPU cycles, including the conversion to nanoseconds.
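
To illustrate the conversion step, here is a minimal sketch of the seconds/remainder split used in the patch's gr_clock_ns(). The names tsc_to_ns() and NS_PER_S are hypothetical and not part of the patch, and the sketch takes the TSC value as a parameter instead of calling rte_rdtsc(), so it builds without DPDK. A naive tsc * NS_PER_S would overflow 64 bits after about 1.8e10 cycles, roughly six seconds at 3 GHz, which is why the whole seconds are divided out first.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_S INT64_C(1000000000)

// Convert a raw TSC count to nanoseconds (hypothetical helper, not in the patch).
// Dividing out the whole seconds first keeps the intermediate product
// rem * NS_PER_S below 2^64, provided tsc_hz is below about 18.4 GHz.
static inline int64_t tsc_to_ns(uint64_t tsc, uint64_t tsc_hz) {
	assert(tsc_hz != 0); // a zero frequency means "TSC not usable"
	const uint64_t sec = tsc / tsc_hz; // whole seconds
	const uint64_t rem = tsc % tsc_hz; // leftover cycles, always < tsc_hz
	return (int64_t)(sec * NS_PER_S + rem * NS_PER_S / tsc_hz);
}
```

The cost is two 64-bit divisions by the same divisor (a compiler can usually fold the quotient and remainder into one) plus one more division for the fractional part, all without leaving user space.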
> > diff --git a/main/clock.h b/main/clock.h
> > new file mode 100644
> > index 00000000..c268f5e6
> > --- /dev/null
> > +++ b/main/clock.h
> > @@ -0,0 +1,58 @@
> > +// SPDX-License-Identifier: BSD-3-Clause
> > +// Copyright (c) 2026 SmartShare Systems
> > +
> > +#pragma once
> > +
> > +#include <gr_clock.h>
> > +
> > +#include <rte_common.h>
> > +#include <rte_cycles.h>
> > +#include <rte_per_lcore.h>
> > +
> > +// TSC frequency in Hz.
> > +//
> > +// If non-zero, the TSC is in sync with the common clock.
> > +// If zero, the TSC is out of sync with the common clock.
> > +extern uint64_t clock_tsc_hz;
> > +
> > +// Get common (monotonically increasing) clock from snapshot [nanoseconds].
> > +//
> > +// Resembles CLOCK_MONOTONIC_RAW:
> > +// - Pauses (does not increase) while the system is suspended or hibernated.
> > +// - Accurate for short intervals, where NTP adjustments would distort the measurement.
> > +// - Not accurate for long intervals. It drifts with hardware.
> > +// - - Drifts up to 4.3 seconds/day = 26 minutes/year. (Typical PC XTAL with 50 PPM accuracy.)
> > +//
> > +// Wraps around after hundreds of years.
> > +// Does not return negative values.
> > +//
> > +// Call clock_update() to update the clock snapshots for the current thread.
> > +static __rte_always_inline gr_clock_ns_t clock_ns(void) {
> > + RTE_DECLARE_PER_LCORE(uint64_t, clock_ns);
>
> This is weird to have a RTE_DECLARE_PER_LCORE inside an inline function.
> This should probably move out of the function block
Ack.
> and to declare it with the proper type directly to avoid casting:
>
> RTE_DECLARE_PER_LCORE(gr_clock_ns_t, clock_ns);
>
> Also, I would advocate to use this value in gr_clock_ns().
Ack.
>
> > +
> > + const gr_clock_ns_t ret = (gr_clock_ns_t)RTE_PER_LCORE(clock_ns);
> > + __rte_assume(ret >= 0);
> > + return ret;
> > +}
> > +
> > +// Get common (monotonically increasing) clock from snapshot [seconds].
> > +//
> > +// Resembles CLOCK_MONOTONIC_RAW:
> > +// - Pauses (does not increase) while the system is suspended or hibernated.
> > +// - Not accurate for long intervals. It drifts with hardware.
> > +// - - Drifts up to 4.3 seconds/day = 26 minutes/year. (Typical PC XTAL with 50 PPM accuracy.)
> > +//
> > +// Wraps around after hundreds of years.
> > +// Does not return negative values.
> > +//
> > +// Call clock_update() to update the clock snapshots for the current thread.
> > +static __rte_always_inline int32_t clock_s(void) {
> > + RTE_DECLARE_PER_LCORE(uint32_t, clock_s);
> > +
> > + const int32_t ret = (int32_t)RTE_PER_LCORE(clock_s);
> > + __rte_assume(ret >= 0);
> > + return ret;
> > +}
> > +
> > +// Update the clock snapshots for the current thread.
> > +void clock_update(void);
> > diff --git a/main/meson.build b/main/meson.build
> > index a57d8600..f0823ff3 100644
> > --- a/main/meson.build
> > +++ b/main/meson.build
> > @@ -3,6 +3,7 @@
> >
> > src += files(
> > 'api.c',
> > + 'clock.c',
> > 'control_queue.c',
> > 'dpdk.c',
> > 'event.c',
> > diff --git a/modules/infra/datapath/main_loop.c b/modules/infra/datapath/main_loop.c
> > index f462cfbd..4127631d 100644
> > --- a/modules/infra/datapath/main_loop.c
> > +++ b/modules/infra/datapath/main_loop.c
> > @@ -1,6 +1,8 @@
> > // SPDX-License-Identifier: BSD-3-Clause
> > // Copyright (c) 2023 Robin Jarry
> > +// Copyright (c) 2026 SmartShare Systems
> >
> > +#include "clock.h"
> > #include "config.h"
> > #include "datapath.h"
> > #include "log.h"
> > @@ -258,6 +260,7 @@ reconfig:
> > sleep = 0;
> > timestamp = rte_rdtsc();
> > for (;;) {
> > + clock_update();
>
> Could you move clock_update() inside the housekeeping block? I don't
> think we need more precision than once every 256 rounds of graph walk.
The way it's used now, it could move to housekeeping, yes.
But I plan to call it much more frequently in the future, maybe even inside some graph nodes.
So I prefer to keep it here, to highlight that it is intended to be called very often.
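
The update/read split can be sketched as follows. This is only an illustration under stated assumptions: it uses C11 _Thread_local and clock_gettime() so that it builds without DPDK, whereas the patch declares its snapshots with RTE_DECLARE_PER_LCORE and prefers rte_rdtsc() as the source; the names snapshot_ns, snapshot_update() and snapshot_read_ns() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Hypothetical per-thread snapshot of the clock, in nanoseconds.
static _Thread_local int64_t snapshot_ns;

// Refresh the calling thread's snapshot from the (comparatively slow) clock.
// Meant to be called once per main loop iteration, or more often.
static void snapshot_update(void) {
	struct timespec tp = {0};
	clock_gettime(CLOCK_MONOTONIC_RAW, &tp);
	snapshot_ns = (int64_t)tp.tv_sec * INT64_C(1000000000) + tp.tv_nsec;
}

// Cheap read: a plain thread-local load, no clock access at all.
// The value is as old as the last snapshot_update() on this thread.
static inline int64_t snapshot_read_ns(void) {
	assert(snapshot_ns >= 0);
	return snapshot_ns;
}
```

Readers between two updates all see the same value, so the call frequency of the update function directly bounds how stale a reading can be. That is the trade-off behind where clock_update() is placed in the loop.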
>
> > rte_graph_walk(graph);
> >
> > if (++loop == HOUSEKEEPING_INTERVAL) {
>
>
> --
> Robin
>