<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>&lt;Snipped&gt;</p>
<p>Thank you, Anatoly, for the response. Let me try to share my
understanding.<br>
</p>
<blockquote type="cite" cite="mid:288d9e9e-aaec-4dac-b969-54e01956ef4e@intel.com">I
recently looked into how Intel's Sub-NUMA Clustering would work
within
<br>
DPDK, and found that I actually didn't have to do anything,
because the
<br>
SNC "clusters" present themselves as NUMA nodes, which DPDK
already
<br>
supports natively.
<br>
</blockquote>
<p>Yes, this is correct. In the Intel Xeon Platinum BIOS one can set
`Cluster per NUMA` to `1, 2 or 4`.</p>
<p>This divides the tiles into Sub-NUMA partitions, each having
its own lcores, memory controllers, PCIe and accelerators. <br>
</p>
<blockquote type="cite" cite="mid:288d9e9e-aaec-4dac-b969-54e01956ef4e@intel.com">
<br>
Does AMD's implementation of chiplets not report themselves as
separate
<br>
NUMA nodes? </blockquote>
<p>On the AMD EPYC SoC this is different. There are 2 BIOS settings,
namely</p>
<p>1. NPS: `Numa Per Socket`, which allows the IO tile (memory, PCIe
and accelerators) to be partitioned as NUMA 0, 1, 2 or 4.</p>
<p>2. L3 as NUMA: `L3 cache of CPU tiles as individual NUMA`. This
allows each CPU tile to be presented as an independent NUMA node.</p>
<p><br>
</p>
<p>The above settings are possible because the CPU tiles are independent from
the IO tile, which makes 4 combinations available for use.</p>
<p>These are covered in the tuning guide for the SoC in <a href="https://doc.dpdk.org/guides/linux_gsg/amd_platform.html">12.
How to get best performance on AMD platform — Data Plane
Development Kit 24.07.0 documentation (dpdk.org)</a>.</p>
<p><br>
</p>
<blockquote type="cite" cite="mid:288d9e9e-aaec-4dac-b969-54e01956ef4e@intel.com">Because
if it does, I don't really think any changes are
<br>
required because NUMA nodes would give you the same thing, would
it not?
<br>
</blockquote>
<p>I have a different opinion on this. An end user can</p>
<p>1. Identify the lcores and their NUMA nodes using
`usertools/cpu-layout.py`.</p>
<p>2. But it is the core mask in the EAL arguments that makes the threads
available for use in a process.</p>
<p>3. There is no API that distinguishes L3 NUMA domains. The function `<a class="el" href="https://doc.dpdk.org/api/rte__lcore_8h.html#a7c8da4664df26a64cf05dc508a4f26df" style="color: rgb(61, 87, 140); font-weight: bold; text-decoration: none; font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal;">rte_socket_id</a><span style="color: rgb(0, 0, 0); font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(249, 250, 252); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span>
`, on SoCs built from CPU tiles such as AMD's, returns the physical socket.</span></span></p>
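<p>To illustrate the gap described in the three points above, here is a minimal sketch of how the L3 grouping could be read outside of DPDK. It assumes a Linux host that exposes cache topology under `/sys/devices/system/cpu/cpuN/cache/indexM/` with `level` and `shared_cpu_list` files. The helper names `parse_cpu_list` and `l3_domains` are hypothetical and are not part of DPDK or of `usertools/cpu-layout.py`.</p>
<pre>
```python
import glob
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8,10-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Group CPUs by the set of CPUs they share a level-3 cache with.

    Assumption: the sysfs layout described in the lead-in. CPUs without
    cache information (for example offline CPUs) are silently skipped.
    """
    domains = {}
    for cpu_dir in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        for index_dir in glob.glob(os.path.join(cpu_dir, "cache", "index*")):
            with open(os.path.join(index_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue
            with open(os.path.join(index_dir, "shared_cpu_list")) as f:
                shared = tuple(parse_cpu_list(f.read()))
            if shared:
                # Every CPU in one L3 domain reports the same list,
                # so the tuple itself works as the dictionary key.
                domains[shared] = list(shared)
    # Order the domains by their lowest CPU id for stable output.
    return [domains[key] for key in sorted(domains)]
```
</pre>
<p>Calling `l3_domains()` on the target machine returns one list of CPU ids per L3 cache. This grouping is independent of what the NUMA node id reports, which is the information an application currently cannot obtain through the lcore API.</p>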
<p><span style="color: rgb(0, 0, 0); font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(249, 250, 252); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span><br>
</span></span></p>
<p><span style="color: rgb(0, 0, 0); font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(249, 250, 252); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span>Example:
AMD EPYC Genoa has a total of 13 tiles: 12 CPU tiles
and 1 IO tile. </span></span></p>
<p><span style="color: rgb(0, 0, 0); font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(249, 250, 252); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span>1.
Setting NPS to 4 divides the memory, PCIe and accelerators into 4
domains, while all CPUs appear as a single NUMA node even though each
of the 12 CPU tiles has its own independent L3 cache. <br>
</span></span></p>
<p><span style="color: rgb(0, 0, 0); font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(249, 250, 252); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span>2.
Setting `L3 as NUMA` makes each CPU tile appear as a separate L3
cluster.</span></span></p>
<p><span style="color: rgb(0, 0, 0); font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(249, 250, 252); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span><br>
</span></span></p>
<p><span style="color: rgb(0, 0, 0); font-family: Roboto, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(249, 250, 252); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"><span>Hence,
adding an API that allows selecting the available lcores based on
split L3 is essential, irrespective of the BIOS setting.<br>
</span></span></p>
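<p>Until such an API exists, the selection has to happen outside the application, through the EAL core list. As a rough sketch of that workaround, the hypothetical helper `to_eal_core_list` below formats the CPU ids of one L3 domain into the compact form accepted by the EAL `-l` option. The CPU numbering in the example is made up for illustration and will differ per system.</p>
<pre>
```python
def to_eal_core_list(cpus):
    """Format CPU ids as a compact core list, e.g. [0, 1, 2, 3, 8] gives '0-3,8'."""
    ordered = sorted(set(cpus))
    if not ordered:
        return ""
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu == prev + 1:
            # Still inside a run of consecutive ids.
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(
        str(a) if a == b else "%d-%d" % (a, b) for a, b in ranges
    )


# Hypothetical L3 domain: 8 cores plus their SMT siblings.
domain = list(range(0, 8)) + list(range(96, 104))
print("-l", to_eal_core_list(domain))  # prints: -l 0-7,96-103
```
</pre>
<p>This only restricts which lcores the process sees. Inside the process there is still no way to ask which lcores share an L3, which is why the selection API is being proposed.</p>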
<p><br>
</p>
<blockquote type="cite" cite="mid:288d9e9e-aaec-4dac-b969-54e01956ef4e@intel.com">
<br>
--
<br>
Thanks,
<br>
Anatoly
<br>
<br>
</blockquote>
</body>
</html>