[dpdk-dev] Mbuf memory alignment constraints for (micro)architectures
Gavin Hu (Arm Technology China)
Gavin.Hu at arm.com
Mon Nov 11 15:01:04 CET 2019
Hi Jerin,
> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jerinj at marvell.com>
> Sent: Thursday, October 31, 2019 2:02 AM
> To: dev at dpdk.org
> Cc: Olivier Matz <olivier.matz at 6wind.com>; Andrew Rybchenko
> <arybchenko at solarflare.com>; David Christensen <drc at linux.vnet.ibm.com>;
> bruce.richardson at intel.com; konstantin.ananyev at intel.com;
> hemant.agrawal at nxp.com; Shahaf Shuler <shahafs at mellanox.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; Gavin Hu (Arm
> Technology China) <Gavin.Hu at arm.com>; viktorin at rehivetech.com;
> anatoly.burakov at intel.com
> Subject: Mbuf memory alignment constraints for (micro)architectures
>
> CC: Arch and platform maintainers
>
> While reviewing the mempool object allocation requirements in the code,
>
> A) it's found that in the default case, mempool objects have padding
> in the object trailer so that the start addresses of objects are spread across
> the different channels,
> to load the DRAM channels equally and get better performance
>
> # More documentation is here
> https://doc.dpdk.org/guides/prog_guide/mempool_lib.html
> in section 8.3. Memory Alignment Constraints
>
> B) optimize_object_size() implements the channel distribution requirement
> with the following formula
>
> 	new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) / RTE_MEMPOOL_ALIGN;
> 	while (get_gcd(new_obj_size, nrank * nchan) != 1)
> 		new_obj_size++;
>
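To make the formula in (B) concrete, here is a minimal standalone sketch of the same arithmetic. It is not the DPDK source: the names ALIGN, gcd() and padded_obj_size() are made up for illustration, the 64-byte alignment is an assumed value, and the channel and rank counts are passed in as plain parameters instead of being read from the EAL.

```c
#include <assert.h>

/* Assumed value for illustration: 64-byte alignment unit. */
#define ALIGN      64u
#define ALIGN_MASK (ALIGN - 1u)

/* Greatest common divisor (Euclid's algorithm). */
static unsigned gcd(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical helper mirroring formula (B): round obj_size up to whole
 * ALIGN units, then grow the unit count until it is coprime with
 * nrank * nchan. Returns the padded size in bytes.
 */
static unsigned padded_obj_size(unsigned obj_size, unsigned nchan,
				unsigned nrank)
{
	unsigned units;

	assert(nchan != 0 && nrank != 0);
	units = (obj_size + ALIGN_MASK) / ALIGN;
	while (gcd(units, nrank * nchan) != 1)
		units++;
	return units * ALIGN;
}
```

With these assumptions, a 2048-byte object on 4 channels and 1 rank is 32 units, which shares a factor with 4, so it grows to 33 units, i.e. 2112 bytes. That is 64 bytes of padding per object, which is the memory cost discussed in (C).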
>
> C) The formula mentioned in (B) is NOT generic. At least on the octeontx2
> SoC,
> the memory/DDR controller works in a different way, whereby:
> # It XORs some of the physical address lines (not the user space
> VA address)
> to compute a hash, and that function defines the actual channel.
>
> The XOR (kind of CRC) scheme is useful because there is natural channel
> distribution
> based on the address, i.e. no need for padding that wastes memory
>
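As a rough illustration of the XOR scheme described in (C), the sketch below derives a channel index from the parity of selected physical address bits. This is only an assumption about the general shape of such a hash: the function name xor_channel() and the per-bit masks are hypothetical and do not describe the actual octeontx2 controller.

```c
#include <stdint.h>

/*
 * Hypothetical XOR-hash channel selection: masks[i] picks the physical
 * address bits that are XORed together (parity) to form bit i of the
 * channel index. nbits is the number of channel-select bits.
 */
static unsigned xor_channel(uint64_t pa, const uint64_t *masks,
			    unsigned nbits)
{
	unsigned chan = 0;
	unsigned i;

	for (i = 0; i < nbits; i++) {
		uint64_t v = pa & masks[i];
		unsigned parity = 0;

		/* Clear the lowest set bit each round, toggling parity. */
		while (v != 0) {
			parity ^= 1u;
			v &= v - 1;
		}
		chan |= parity << i;
	}
	return chan;
}
```

Because several address bits feed each channel bit, objects of almost any size end up spread over the channels without per-object padding, which is the point made in (C).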
> So, in short, the padding scheme is not needed for some SoCs. I am trying to send
> a patch
> to fix it. So the question is,
>
> # Do PPC and other ARM SoCs use formula (B) to compute DRAM channel
> distribution? Or
> is it specific to x86? That would define where the hooks need to be added to have
> a proper fix.
Reading through some documents, for both x86 and Arm, and after internal discussion,
it looks like this is specific to x86. x86 spreads adjacent virtual addresses within a page across multiple memory devices,
with interleaving done per one or two cache lines. https://software.intel.com/en-us/articles/how-memory-is-accessed
Arm leaves this to implementations: there is no fixed interleaving pattern, so it can hardly be generalized.
/Gavin