[RFC 0/2] introduce LLC aware functions
Mattias Rönnblom
hofors at lysator.liu.se
Wed Sep 4 11:30:59 CEST 2024
On 2024-09-02 02:39, Varghese, Vipin wrote:
> <snipped>
>
> Thank you Mattias for the comments and questions; please let me try to
> explain below.
>
>> Shouldn't we have a separate CPU/cache hierarchy API instead?
>
> Based on the intention to bring in CPU lcores which share the same L3 (for
> better cache hits and less noisy-neighbor interference), the current API
> focuses on using the Last Level Cache. But if the suggestion is `there are
> SoCs where the L2 cache is also shared, and the new API should be
> provisioned for that`, I am also comfortable with the thought.
>
Rather than some AMD special case API hacked into <rte_lcore.h>, I think
we are better off with no DPDK API at all for this kind of functionality.
A DPDK CPU/memory hierarchy topology API very much makes sense, but it
should be reasonably generic and complete from the start.
>>
>> Could potentially be built on the 'hwloc' library.
>
> There are 3 reasons we did not explore this path on AMD SoCs:
>
> 1. depending on the hwloc version and the kernel version, certain SoC
> hierarchies are not available
>
> 2. CPU NUMA and IO (memory & PCIe) NUMA are independent on AMD EPYC SoCs.
>
> 3. it adds an extra library dependency that has to be made available for
> things to work.
>
> Hence we have tried to use the Linux-documented generic `sysfs CPU cache`
> interface.
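> For example, a minimal sketch of how the L3 sharing information can be
> read from sysfs (illustrative only; the index3 directory is assumed to be
> the L3 here, which in real code should be verified via the `level` file,
> and the helper name is made up):
>
>     #include <stdio.h>
>
>     /* Sketch: print the CPUs sharing the same L3 cache as `cpu`. */
>     static int
>     print_l3_siblings(unsigned int cpu)
>     {
>             char path[128], buf[256];
>             FILE *f;
>
>             snprintf(path, sizeof(path),
>                      "/sys/devices/system/cpu/cpu%u/cache/index3/shared_cpu_list",
>                      cpu);
>             f = fopen(path, "r");
>             if (f == NULL)
>                     return -1;
>             if (fgets(buf, sizeof(buf), f) != NULL)
>                     printf("cpu%u L3 siblings: %s", cpu, buf);
>             fclose(f);
>             return 0;
>     }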
>
> I will try to explore hwloc more and check whether other libraries within
> DPDK leverage the same.
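For reference, the hwloc call sequence for this kind of query is fairly
small. A rough sketch, assuming hwloc >= 2.0 (where HWLOC_OBJ_L3CACHE is a
distinct object type; older versions express caches differently, which
relates to your point 1):

    #include <stdio.h>
    #include <stdlib.h>
    #include <hwloc.h>

    /* Sketch: list the PUs behind each L3 cache in the system. */
    int main(void)
    {
            hwloc_topology_t topo;
            int i, n;

            hwloc_topology_init(&topo);
            hwloc_topology_load(topo);

            n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_L3CACHE);
            for (i = 0; i < n; i++) {
                    hwloc_obj_t l3 =
                            hwloc_get_obj_by_type(topo, HWLOC_OBJ_L3CACHE, i);
                    char *cpus;

                    hwloc_bitmap_asprintf(&cpus, l3->cpuset);
                    printf("L3 #%d: PUs %s\n", i, cpus);
                    free(cpus);
            }

            hwloc_topology_destroy(topo);
            return 0;
    }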
>
>>
>> I much agree cache/core topology may be of interest to the application
>> (or a work scheduler, like a DPDK event device), but it's not limited to
>> LLC. It may well be worthwhile to care about which cores share an L2
>> cache, for example. Not sure the RTE_LCORE_FOREACH_* approach scales.
>
> Yes, totally understood; on some SoCs, multiple lcores share the same L2
> cache.
>
>
> Can we rework the API to be rte_get_cache_<function>, where the user
> argument is the desired index (see the sketch after this list)?
>
> 1. index-1: SMT threads
>
> 2. index-2: threads sharing the same L2 cache
>
> 3. index-3: threads sharing the same L3 cache
>
> 4. index-MAX: threads sharing the last level cache
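> A rough sketch of what I have in mind (all names, values and signatures
> below are provisional, just to illustrate the shape of such an API):
>
>     #include <stdint.h>
>
>     /* Provisional sketch only -- names and semantics are not final. */
>     enum rte_cache_level {
>             RTE_CACHE_LEVEL_SMT = 1,        /* HW threads of the same core */
>             RTE_CACHE_LEVEL_L2  = 2,        /* lcores sharing the same L2 */
>             RTE_CACHE_LEVEL_L3  = 3,        /* lcores sharing the same L3 */
>             RTE_CACHE_LEVEL_MAX = UINT8_MAX /* lcores sharing the LLC */
>     };
>
>     /*
>      * Fill `lcores` with up to `n` lcore ids sharing the given cache level
>      * with `lcore_id`; return the number of entries written.
>      */
>     unsigned int
>     rte_get_cache_lcores(unsigned int lcore_id, enum rte_cache_level level,
>                          unsigned int lcores[], unsigned int n);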
>
>>
>>> < Function: Purpose >
>>> ---------------------
>>> - rte_get_llc_first_lcores: Retrieves all the first lcores in the
>>> shared LLC.
>>> - rte_get_llc_lcore: Retrieves all lcores that share the LLC.
>>> - rte_get_llc_n_lcore: Retrieves the first n or skips the first n
>>> lcores in the shared LLC.
>>>
>>> < MACRO: Purpose >
>>> ------------------
>>> RTE_LCORE_FOREACH_LLC_FIRST: iterates through the first lcore of
>>> each LLC.
>>> RTE_LCORE_FOREACH_LLC_FIRST_WORKER: iterates through the first worker
>>> lcore of each LLC.
>>> RTE_LCORE_FOREACH_LLC_WORKER: iterates over the lcores of an LLC based
>>> on a hint (lcore id).
>>> RTE_LCORE_FOREACH_LLC_SKIP_FIRST_WORKER: iterates over the lcores of an
>>> LLC while skipping the first worker.
>>> RTE_LCORE_FOREACH_LLC_FIRST_N_WORKER: iterates through the first `n`
>>> lcores of each LLC.
>>> RTE_LCORE_FOREACH_LLC_SKIP_N_WORKER: skips the first `n` lcores, then
>>> iterates through the remaining lcores in each LLC.
>>>
> While the MACROs are simple wrappers invoking the appropriate API, can
> this be worked out in this fashion?
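> For example, an application launching one worker per LLC domain could do
> something like the following (illustrative only; the exact macro arguments
> are still open, and lcore_main() is a made-up application function):
>
>     unsigned int lcore_id;
>
>     /* launch lcore_main() on the first worker lcore of each LLC */
>     RTE_LCORE_FOREACH_LLC_FIRST_WORKER(lcore_id)
>             rte_eal_remote_launch(lcore_main, NULL, lcore_id);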
>
> <snipped>