[RFC 0/2] introduce LLC aware functions
Varghese, Vipin
vipin.varghese at amd.com
Mon Sep 2 03:08:20 CEST 2024
<Snipped>
Thank you Anatoly for the response. Let me try to share my understanding.
> I recently looked into how Intel's Sub-NUMA Clustering would work within
> DPDK, and found that I actually didn't have to do anything, because the
> SNC "clusters" present themselves as NUMA nodes, which DPDK already
> supports natively.
Yes, this is correct. In the Intel Xeon Platinum BIOS, one can set
`Cluster per NUMA` to `1`, `2` or `4`.
This divides the tiles into Sub-NUMA partitions, each with its own
lcores, memory controllers, PCIe and accelerators.
>
> Does AMD's implementation of chiplets not report themselves as separate
> NUMA nodes?
On AMD EPYC SoCs, this is different. There are 2 BIOS settings, namely:
1. NPS (`NUMA Per Socket`), which allows the IO tile (memory, PCIe and
accelerators) to be partitioned as NPS 0, 1, 2 or 4.
2. L3 as NUMA (`L3 cache of CPU tiles as individual NUMA`), which allows
each CPU tile to appear as an independent NUMA node.
These settings are possible because the CPU tiles are independent of the
IO tile, which makes 4 combinations available for use.
These are covered in the tuning guide for the SoC, section 12 "How to get
best performance on AMD platform" of the Data Plane Development Kit
24.07.0 documentation (dpdk.org):
<https://doc.dpdk.org/guides/linux_gsg/amd_platform.html>.
> Because if it does, I don't really think any changes are
> required because NUMA nodes would give you the same thing, would it not?
I have a different view on this. From an end user's point of view:
1. The lcores and their NUMA nodes can be identified with
`usertools/cpu-layout.py`.
2. However, it is the core mask in the EAL arguments that makes the
threads available for use in a process.
3. There is no API that distinguishes L3 NUMA domains. For CPU tiles such
as those on AMD SoCs, the function `rte_socket_id`
<https://doc.dpdk.org/api/rte__lcore_8h.html#a7c8da4664df26a64cf05dc508a4f26df>
returns the physical socket.
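To make point 3 concrete: on Linux, the L3 domain of a CPU is already exposed to user space through sysfs, independently of the socket ID. The sketch below is only an illustration and is not part of DPDK or of this RFC. It assumes the standard Linux cacheinfo layout `/sys/devices/system/cpu/cpuN/cache/indexK/{level,id}`. The helpers `read_long_file` and `cpu_l3_cache_id`, and the root-path parameter (added so the code can be pointed at a test tree), are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysfs-style file.
 * Returns -1 if the file cannot be opened or does not hold a number. */
static long read_long_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: return the L3 cache id of 'cpu', or -1 if no
 * level-3 cache (or no 'id' file) is found.
 * 'cpu_root' is normally "/sys/devices/system/cpu". */
static long cpu_l3_cache_id(const char *cpu_root, unsigned int cpu)
{
    char path[512];

    /* Walk cache/index0, index1, ... until a level-3 entry is found. */
    for (unsigned int idx = 0; idx < 16; idx++) {
        snprintf(path, sizeof(path), "%s/cpu%u/cache/index%u/level",
                 cpu_root, cpu, idx);
        long level = read_long_file(path);
        if (level < 0)
            break; /* no more cache entries for this CPU */
        if (level == 3) {
            snprintf(path, sizeof(path), "%s/cpu%u/cache/index%u/id",
                     cpu_root, cpu, idx);
            return read_long_file(path);
        }
    }
    return -1;
}
```

Calling `cpu_l3_cache_id("/sys/devices/system/cpu", n)` for each lcore and comparing the results shows which lcores share an L3. Per point 3, `rte_socket_id` reports only the physical socket for those lcores.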
Example: AMD EPYC Genoa has a total of 13 tiles: 12 CPU tiles and 1 IO
tile.
1. Setting NPS to 4 divides the memory, PCIe and accelerators into 4
domains, while all CPUs still appear as a single NUMA node even though
each of the 12 tiles has its own independent L3 cache.
2. Setting `L3 as NUMA` allows each tile to appear as a separate L3 cluster.
Hence, adding an API that allows selecting the available lcores based on
split L3 is essential, irrespective of the BIOS setting.
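The following is one possible shape for such a selection API, sketched purely as an illustration. It assumes a per-lcore table of L3 cache IDs (obtained from sysfs or elsewhere) and offers two operations. The first returns the lcores that share an L3 with a reference lcore. The second returns the first lcore of each distinct L3 domain. The names `llc_lcores_of` and `llc_first_lcores` are hypothetical and are not the functions proposed in this RFC series.

```c
#include <stddef.h>

/* Hypothetical: collect every lcore that shares an L3 domain with 'ref'.
 * l3_id[i] is the L3 cache id of lcore i; a negative id means "unknown"
 * and never matches. Returns the number of lcores written to 'out'
 * (at most 'out_cap'). */
static size_t llc_lcores_of(const long *l3_id, size_t n_lcores,
                            size_t ref, size_t *out, size_t out_cap)
{
    size_t count = 0;

    if (ref >= n_lcores || l3_id[ref] < 0)
        return 0;
    for (size_t i = 0; i < n_lcores && count < out_cap; i++) {
        if (l3_id[i] == l3_id[ref])
            out[count++] = i;
    }
    return count;
}

/* Hypothetical: pick the first lcore of each distinct L3 domain, e.g. to
 * place one worker per CPU tile. Returns the number written to 'out'. */
static size_t llc_first_lcores(const long *l3_id, size_t n_lcores,
                               size_t *out, size_t out_cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n_lcores && count < out_cap; i++) {
        int seen = 0;

        if (l3_id[i] < 0)
            continue; /* unknown L3 domain: skip */
        /* The domain was already reported if an earlier lcore had it. */
        for (size_t j = 0; j < i; j++) {
            if (l3_id[j] == l3_id[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[count++] = i;
    }
    return count;
}
```

Working on a plain array keeps the selection logic independent of how the L3 IDs were discovered. The quadratic scan in `llc_first_lcores` is a simplicity trade-off that stays cheap for lcore counts bounded by `RTE_MAX_LCORE`.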
>
> --
> Thanks,
> Anatoly
>