[RFC 0/2] introduce LLC aware functions

Burakov, Anatoly anatoly.burakov at intel.com
Wed Aug 28 10:38:57 CEST 2024

Previous message (by thread): [RFC 0/2] introduce LLC aware functions
Next message (by thread): [PATCH v3 0/9] riscv: implement accelerated crc using zbc
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 8/27/2024 5:10 PM, Vipin Varghese wrote:
> As core density continues to increase, chiplet-based
> core packing has become a key trend. In AMD SoC EPYC
> architectures, core complexes within the same chiplet
> share a Last-Level Cache (LLC). By packing logical cores
> within the same LLC, we can enhance pipeline processing
> stages due to reduced latency and improved data locality.
> 
> To leverage these benefits, DPDK libraries and examples
> can utilize localized lcores. This approach ensures more
> consistent latencies by minimizing the dispersion of lcores
> across different chiplet complexes and enhances packet
> processing by ensuring that data for subsequent pipeline
> stages is likely to reside within the LLC.
> 
> < Function: Purpose >
> ---------------------
>   - rte_get_llc_first_lcores: Retrieves all the first lcores in the shared LLC.
>   - rte_get_llc_lcore: Retrieves all lcores that share the LLC.
>   - rte_get_llc_n_lcore: Retrieves the first n or skips the first n lcores in the shared LLC.
> 
> < MACRO: Purpose >
> ------------------
>   - RTE_LCORE_FOREACH_LLC_FIRST: iterates through the first lcore of each LLC.
>   - RTE_LCORE_FOREACH_LLC_FIRST_WORKER: iterates through the first worker lcore of each LLC.
>   - RTE_LCORE_FOREACH_LLC_WORKER: iterates through the lcores of an LLC selected by a hint (lcore id).
>   - RTE_LCORE_FOREACH_LLC_SKIP_FIRST_WORKER: iterates through the lcores of an LLC, skipping the first worker.
>   - RTE_LCORE_FOREACH_LLC_FIRST_N_WORKER: iterates through the first `n` lcores of each LLC.
>   - RTE_LCORE_FOREACH_LLC_SKIP_N_WORKER: skips the first `n` lcores, then iterates through the remaining lcores in each LLC.
> 

Hi Vipin,

I recently looked into how Intel's Sub-NUMA Clustering would work within 
DPDK, and found that I actually didn't have to do anything, because the 
SNC "clusters" present themselves as NUMA nodes, which DPDK already 
supports natively.
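
To make that concrete, here is a rough standalone sketch of the kind of
grouping NUMA node ids already allow. It does not link against DPDK: the
`lcore_node` table and both helper names are made up for illustration, and
the table stands in for what rte_lcore_to_socket_id() would report per lcore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical topology: index is the lcore id, value is its NUMA node id.
 * In a real application this would come from rte_lcore_to_socket_id(). */
#define N_LCORES 8
static const unsigned int lcore_node[N_LCORES] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Collect the lowest-numbered lcore of every NUMA node into out[].
 * Returns the number of entries written (at most out_len). */
static size_t
first_lcore_per_node(unsigned int *out, size_t out_len)
{
	size_t n = 0;

	assert(out != NULL);
	for (unsigned int lcore = 0; lcore < N_LCORES; lcore++) {
		int seen = 0;

		/* A node is "seen" if an earlier lcore already belongs to it. */
		for (unsigned int prev = 0; prev < lcore; prev++) {
			if (lcore_node[prev] == lcore_node[lcore]) {
				seen = 1;
				break;
			}
		}
		if (!seen && n < out_len)
			out[n++] = lcore;
	}
	return n;
}

/* Collect the lcores of one node, skipping the first `skip` of them.
 * Returns the number of entries written (at most out_len). */
static size_t
node_lcores_skip_n(unsigned int node, size_t skip,
		   unsigned int *out, size_t out_len)
{
	size_t idx = 0, n = 0;

	assert(out != NULL);
	for (unsigned int lcore = 0; lcore < N_LCORES; lcore++) {
		if (lcore_node[lcore] != node)
			continue;
		if (idx++ < skip)
			continue; /* still inside the skipped prefix */
		if (n < out_len)
			out[n++] = lcore;
	}
	return n;
}
```

With the table above, first_lcore_per_node() yields lcores 0 and 4, and
node_lcores_skip_n(1, 1, ...) yields lcores 5, 6 and 7.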

Do AMD's chiplets not report themselves as separate NUMA nodes? If they 
do, I don't really think any changes are required, because NUMA nodes 
would already give you the same grouping, would they not?
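
Put differently, the question is whether grouping lcores by NUMA node and
grouping them by LLC produce the same sets. A minimal sketch of that check
follows. The struct, the function name and both example tables are
hypothetical and do not describe any particular CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-lcore topology record; both fields are illustrative. */
struct lcore_topo {
	unsigned int numa_node; /* NUMA node the lcore reports */
	unsigned int llc_id;    /* last-level cache domain the lcore sits in */
};

/* True when grouping by NUMA node and grouping by LLC give identical groups,
 * i.e. every pair of lcores shares a node exactly when it shares an LLC. */
static bool
numa_matches_llc(const struct lcore_topo *t, size_t n)
{
	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			bool same_node = t[i].numa_node == t[j].numa_node;
			bool same_llc = t[i].llc_id == t[j].llc_id;

			if (same_node != same_llc)
				return false;
		}
	}
	return true;
}

/* Example 1: each LLC domain is exposed as its own NUMA node. */
static const struct lcore_topo per_llc_nodes[] = {
	{ 0, 0 }, { 0, 0 }, { 1, 1 }, { 1, 1 },
};

/* Example 2: a single NUMA node spans two LLC domains. */
static const struct lcore_topo shared_node[] = {
	{ 0, 0 }, { 0, 0 }, { 0, 1 }, { 0, 1 },
};
```

The check returns true for the first table and false for the second. Only in
a layout like the second one would NUMA ids alone be too coarse to pick out
lcores sharing an LLC.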

-- 
Thanks,
Anatoly
