[dpdk-dev] [PATCH 1/2] eal: add macro to mark variable mostly read only

Pavan Nikhilesh pbhagavatula at caviumnetworks.com
Thu Apr 19 11:20:52 CEST 2018


On Wed, Apr 18, 2018 at 07:03:06PM +0100, Ferruh Yigit wrote:
> On 4/18/2018 6:55 PM, Pavan Nikhilesh wrote:
> > On Wed, Apr 18, 2018 at 06:43:11PM +0100, Ferruh Yigit wrote:
> >> On 4/18/2018 4:30 PM, Pavan Nikhilesh wrote:
> >>> Add macro to mark a variable to be mostly read only and place it in a
> >>> separate section.
> >>>
> >>> Signed-off-by: Pavan Nikhilesh <pbhagavatula at caviumnetworks.com>
> >>> ---
> >>>
> >>>  Group together mostly read only data to avoid cacheline bouncing, also
> >>>  useful for auditing purposes.
> >>>
> >>>  lib/librte_eal/common/include/rte_common.h | 5 +++++
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/lib/librte_eal/common/include/rte_common.h b/lib/librte_eal/common/include/rte_common.h
> >>> index 6c5bc5a76..f2ff2e9e6 100644
> >>> --- a/lib/librte_eal/common/include/rte_common.h
> >>> +++ b/lib/librte_eal/common/include/rte_common.h
> >>> @@ -114,6 +114,11 @@ static void __attribute__((constructor(prio), used)) func(void)
> >>>   */
> >>>  #define __rte_noinline  __attribute__((noinline))
> >>>
> >>> +/**
> >>> + * Mark a variable to be mostly read only and place it in a separate section.
> >>> + */
> >>> +#define __rte_read_mostly __attribute__((__section__(".read_mostly")))
> >>
> >
> > Hi Ferruh,
> >
> >> Hi Pavan,
> >>
> >> Is the section ".read_mostly" treated specially [1] or is this just for grouping
> >> symbols together (to reduce cacheline bouncing)?
> >
> > The section .read_mostly is not treated specially; it's just for grouping
> > symbols.
>
> I have encountered a blog post claiming this does not work:
>
> "
> The problem with the above approach is that once all the __read_mostly variables
> are grouped into one section, the remaining "non-read-mostly" variables end-up
> together too. This increases the chances that two frequently used elements (in
> the "non-read-mostly" region) will end-up competing for the same position (or
> cache-line, the basic fixed-sized block for memory<-->cache transfers) in the
> cache. Thus frequent accesses will cause excessive cache thrashing on that
> particular cache-line thereby degrading the overall system performance.
> "
>
> https://thecodeartist.blogspot.com/2011/12/why-readmostly-does-not-work-as-it.html
>

The author is concerned about processors with low cache set-associativity;
almost all modern processors are 16-way set-associative or higher. Moreover,
the issue above can already happen today whenever two frequently written
global variables are placed next to each other.
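The cache arithmetic behind this argument can be sketched in C. The geometry below (32 KiB, 64-byte lines, 16 ways) is a hypothetical example rather than a specific CPU, and `line_of`, `set_of`, `adjacent_counters` and `padded_counters` are illustrative names. An address maps to set (addr / LINE_SIZE) % NUM_SETS, so a set only starts evicting hot lines once more than WAYS of them map to it; two hot variables sharing one line, by contrast, can contend whenever different cores write them, regardless of associativity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry, for illustration only:
 * 32 KiB, 64-byte lines, 16-way set-associative => 32 sets. */
#define LINE_SIZE 64u
#define WAYS      16u
#define L1_BYTES  (32u * 1024u)
#define NUM_SETS  (L1_BYTES / (LINE_SIZE * WAYS))

/* Cache line number that holds 'addr'. */
static inline uintptr_t line_of(uintptr_t addr)
{
    return addr / LINE_SIZE;
}

/* Set that 'addr' maps to: line number modulo the number of sets.
 * Up to WAYS distinct lines can live in one set at the same time. */
static inline size_t set_of(uintptr_t addr)
{
    return (size_t)(line_of(addr) % NUM_SETS);
}

/* Two hot counters placed next to each other. The struct is aligned to a
 * line as a whole, so rx and tx are guaranteed to share one cache line:
 * writes from different cores then bounce that line between them. */
struct __attribute__((aligned(LINE_SIZE))) adjacent_counters {
    uint64_t rx;
    uint64_t tx;
};

/* Same counters, each forced onto its own cache line. */
struct padded_counters {
    uint64_t rx __attribute__((aligned(LINE_SIZE)));
    uint64_t tx __attribute__((aligned(LINE_SIZE)));
};
```

With 32 sets, addresses `LINE_SIZE * NUM_SETS` (2 KiB) apart fall into the same set, so 17 such hot lines would be needed to overflow one 16-way set. Padding each hot counter to its own line, as in `padded_counters`, avoids the shared-line case at the cost of a full line per field.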

Currently, we have little control over how global variables are arranged: a
single addition or deletion of a global variable changes the alignment of the
others and, in some cases, causes a minor performance regression.
By tagging read-mostly variables with __rte_read_mostly, we can easily identify
alignment changes across builds by comparing the global-variable sections of
the map files.
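A minimal sketch of how the proposed macro would be used. The macro definition is copied from the patch; the variables `port_count`, `rx_burst_size`, `pkts_seen` and the functions `config_init`/`fast_path` are hypothetical, chosen only to show the split between data written once at init and data written on the fast path. It assumes an ELF target (Linux/FreeBSD with GCC or Clang); Mach-O, for example, requires a different section-name syntax.

```c
#include <stdint.h>

/* Macro as proposed in the patch (guarded in case rte_common.h already
 * defines it). Assumes an ELF target: each tagged object is emitted into
 * a section named ".read_mostly". */
#ifndef __rte_read_mostly
#define __rte_read_mostly __attribute__((__section__(".read_mostly")))
#endif

/* Hypothetical configuration globals: written once at init, then only
 * read on the fast path, so they are grouped in .read_mostly. */
uint32_t port_count __rte_read_mostly;
uint16_t rx_burst_size __rte_read_mostly = 32;

/* Frequently written counter: deliberately left in the default section. */
uint64_t pkts_seen;

/* Init path: the only place the read-mostly values are written. */
void config_init(uint32_t ports, uint16_t burst)
{
    port_count = ports;
    rx_burst_size = burst;
}

/* Fast path: reads the read-mostly data, writes only the hot counter.
 * Returns the number of packets to process, capped at the burst size. */
uint32_t fast_path(uint32_t nb_rx)
{
    pkts_seen += nb_rx;
    return nb_rx < rx_burst_size ? nb_rx : rx_burst_size;
}
```

Linking with `-Wl,-Map=app.map` produces a GNU ld map file in which the `.read_mostly` input sections, and the global symbols defined in them, should appear together with their addresses, which keeps the build-to-build comparison simple. The attribute by itself only groups the data; where the section lands and how it is aligned is left to the linker's default handling, unlike the kernel, which arranges it with a custom linker script.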

I have verified the patch set on arm64 (16-way set-associative) and did not
notice any performance regression.
Did you have a chance to check whether there is any performance regression?

> >
> >>
> >> [1]
> >> If this is a special section, can you please point to the counterpart in the kernel?
> >
> > The kernel has something similar [1], but it uses a custom linker script to
> > arrange the symbols.
> >
> > [1] https://github.com/torvalds/linux/blob/a27fc14219f2e3c4a46ba9177b04d9b52c875532/arch/x86/include/asm/cache.h#L11
> > kernel commit id 54cb27a71f51d304342c79e62fd7667f2171062b
> >
> >>
> >>
> >>> +
> >>>  /*********** Macros for pointer arithmetic ********/
> >>>
> >>>  /**
> >>> --
> >>> 2.17.0
> >>>
> >>
>
