<div dir="auto">A quick hack might just to increase cache line size as experiment </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Aug 28, 2023, 11:54 AM Morten Brørup <<a href="mailto:mb@smartsharesystems.com">mb@smartsharesystems.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> From: Mattias Rönnblom [mailto:<a href="mailto:hofors@lysator.liu.se" target="_blank" rel="noreferrer">hofors@lysator.liu.se</a>]<br>
> Sent: Monday, 28 August 2023 10.46<br>
> <br>
> On 2023-08-28 08:32, Morten Brørup wrote:<br>
> >> From: Mattias Rönnblom [mailto:<a href="mailto:hofors@lysator.liu.se" target="_blank" rel="noreferrer">hofors@lysator.liu.se</a>]<br>
> >> Sent: Monday, 28 August 2023 00.31<br>
> >><br>
> >> On 2023-08-27 17:40, Morten Brørup wrote:<br>
> >>>> From: Mattias Rönnblom [mailto:<a href="mailto:hofors@lysator.liu.se" target="_blank" rel="noreferrer">hofors@lysator.liu.se</a>]<br>
> >>>> Sent: Sunday, 27 August 2023 15.55<br>
<br>
[...]<br>
<br>
> >>> So, this gets added to rte_common.h:<br>
> >>><br>
> >>> /**<br>
> >>> * Empty cache lines, to guard against false sharing-like effects<br>
> >>> * on systems with a next-N-lines hardware prefetcher.<br>
> >>> *<br>
> >>> * Use as spacing between data accessed by different lcores,<br>
> >>> * to prevent cache thrashing on hardware with speculative<br>
> prefetching.<br>
> >>> */<br>
> >>> #if RTE_CACHE_GUARD_LINES > 0<br>
> >>> #define _RTE_CACHE_GUARD_HELPER2(unique) \<br>
> >>> char cache_guard_ ## unique[RTE_CACHE_LINE_SIZE *<br>
> >> RTE_CACHE_GUARD_LINES] \<br>
> >>> __rte_cache_aligned;<br>
> >>> #define _RTE_CACHE_GUARD_HELPER1(unique)<br>
> _RTE_CACHE_GUARD_HELPER2(unique)<br>
> >>> #define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTER__)<br>
> >>> #else<br>
> >>> #define RTE_CACHE_GUARD<br>
> >>> #endif<br>
> >>><br>
> >><br>
> >> Seems like a good solution. I thought as far as using __LINE__ to<br>
> build<br>
> >> a unique name, but __COUNTER__ is much cleaner, provided it's<br>
> available<br>
> >> in relevant compilers. (It's not in C11.)<br>
> ><br>
> > I considered __LINE__ too, but came to the same conclusion...<br>
> __COUNTER__ is cleaner for this purpose.<br>
> ><br>
> > And since __COUNTER__ is being used elsewhere in DPDK, I assume it is<br>
> available for use here too.<br>
> ><br>
> > If it turns out causing problems, we can easily switch to __LINE__<br>
> instead.<br>
> ><br>
> >><br>
> >> Should the semicolon be included or not in HELPER2? If left out, a<br>
> >> lonely ";" will be left for RTE_CACHE_GUARD_LINES == 0, but I don't<br>
> >> think that is a problem.<br>
> ><br>
> > I tested it on Godbolt, and the lonely ";" in a struct didn't seem to<br>
> be a problem.<br>
> ><br>
> > With the semicolon in HELPER2, there will be a lonely ";" in the<br>
> struct in both cases, i.e. with and without cache guards enabled.<br>
> ><br>
> >><br>
> >> I don't see why __rte_cache_aligned is needed here. The adjacent<br>
> struct<br>
> >> must be cache-line aligned. Maybe it makes it more readable, having<br>
> the<br>
> >> explicit guard padding starting at the start of the actual guard<br>
> cache<br>
> >> lines, rather than potentially at some earlier point before, and<br>
> having<br>
> >> non-guard padding at the end of the struct (from __rte_cache_aligned<br>
> on<br>
> >> the struct level).<br>
> ><br>
> > Having both __rte_cache_aligned and the char array with full cache<br>
> lines ensures that the guard field itself is on its own separate cache<br>
> line, regardless of the organization of adjacent fields in the struct.<br>
> E.g. this will also work:<br>
> ><br>
> > struct test {<br>
> > char x;<br>
> > RTE_CACHE_GUARD;<br>
> > char y;<br>
> > };<br>
> ><br>
> <br>
> That struct declaration is broken, since it will create false sharing<br>
> between x and y, in case RTE_CACHE_GUARD_LINES is defined to 0.<br>
> <br>
> Maybe the most intuitive function (semantics) of the RTE_CACHE_GUARD<br>
> macro would be have it deal exclusively with the issue resulting from<br>
> next-N-line (and similar) hardware prefetching, and leave<br>
> __rte_cache_aligned to deal with "classic" (same-cache line) false<br>
> sharing.<br>
<br>
Excellent review feedback!<br>
<br>
I only thought of the cache guard as a means to provide spacing between elements where the developer already prevented (same-cache line) false sharing by some other means. I didn't even consider the alternative interpretation of its purpose.<br>
<br>
Your feedback leaves no doubt that we should extend the cache guard's purpose to also enforce cache alignment (under all circumstances, also when RTE_CACHE_GUARD_LINES is 0).<br>
<br>
> <br>
> Otherwise you would have to have something like<br>
> <br>
> struct test<br>
> {<br>
> char x;<br>
> RTE_CACHE_GUARD(char, y);<br>
> };<br>
> <br>
> ...so that 'y' can be made __rte_cache_aligned by the macro.<br>
<br>
There's an easier solution...<br>
<br>
We can copy the concept from the RTE_MARKER type, which uses a zero-length array. By simply omitting the #if RTE_CACHE_GUARD_LINES > 0, the macro will serve both purposes:<br>
<br>
#define _RTE_CACHE_GUARD_HELPER2(unique) \<br>
char cache_guard_ ## unique[RTE_CACHE_LINE_SIZE * RTE_CACHE_GUARD_LINES] \<br>
__rte_cache_aligned;<br>
#define _RTE_CACHE_GUARD_HELPER1(unique) _RTE_CACHE_GUARD_HELPER2(unique)<br>
#define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTER__)<br>
<br>
I have verified on Godbolt that this works. The memif driver also uses RTE_MARKER this way [1].<br>
<br>
[1]: <a href="https://elixir.bootlin.com/dpdk/latest/source/drivers/net/memif/memif.h#L173" rel="noreferrer noreferrer" target="_blank">https://elixir.bootlin.com/dpdk/latest/source/drivers/net/memif/memif.h#L173</a><br>
<br>
> <br>
> RTE_HW_PREFETCH_GUARD could be an alternative name, but I think I like<br>
> RTE_CACHE_GUARD better.<br>
> <br>
<br>
When the macro serves both purposes (regardless of the value of RTE_CACHE_GUARD_LINES), I think we can stick with the RTE_CACHE_GUARD name.<br>
<br>
<br>
</blockquote></div>