[PATCH] ring: avoid extra store at move head
Morten Brørup
mb at smartsharesystems.com
Tue Jun 2 11:21:27 CEST 2026
> > Then had Claude compare results:
> >
> > Key metric (two physical cores legacy MP/MC bulk n=128):
> > main: 5.380 cycles/elem
> > sync-bool: 5.377 cycles/elem (-0.07%)
> > avoid-store: 5.892 cycles/elem (+9.52%) ← regresses
> >
> >
> > Looking at the disassembly of ring_enqueue_bulk:
> >
> > The inner loop of main and sync-bool versions is:
> > mov 0x80(%rdi),%r11d ; load d->head via displacement
> > mov 0x104(%rdi),%ebx ; load s->tail
> > add %ecx,%ebx
> > sub %r11d,%ebx
> > cmp %ebx,%r12d
> > jae [exit]
> > lea (%r8,%r11,1),%r13d ; new_head = old_head + n
> > mov %r11d,%eax ; expected → eax
> > lock cmpxchg %r13d,0x80(%rdi) ; ← displacement addressing
> > jne [retry] ; ← direct jne, eax preserved
> >
> > Using atomic_compare_exchange and your patch:
> > mov 0x38(%rdi),%r10d
> > mov 0x80(%rdi),%eax ; load d->head directly into %eax
> > lea 0x80(%rdi),%rcx ; ← MATERIALIZE &d->head into %rcx
> > lea -0x1(%r8),%r12d
> > mov 0x104(%rdi),%r11d
> > add %r10d,%r11d
> > sub %eax,%r11d
> > cmp %r11d,%r12d
> > jae [exit]
> > lea (%r8,%rax,1),%r13d ; new_head
> > lock cmpxchg %r13d,(%rcx) ; ← INDIRECT addressing via %rcx
> > mov %eax,%ebx ; ← EXTRA: save post-CAS %eax to %ebx
> > jne [retry]
> >
> > Bottom line: good idea, but still fighting the GCC optimizer here.
>
> Thanks for trying.
> On my box (AMD EPYC 9534) with the same test, there is not much
> difference between any of them:
> use-sync-bool: 2.2273
> use-c11-current-version: 2.2422
> use-c11-patched: 2.2431
> Anyway, -10% on some boxes is probably a good enough reason to keep a
> specific version for __rte_ring_headtail_move_head_mt().
> My ask would be to have a special macro for it, so users can
> enable/disable it via 'meson setup' at will.
This seems very exotic as a meson command line option.
Either put it in rte_config.h, or make it CPU specific.