Optimizing memory access with DPDK allocated memory

Antonio Di Bacco a.dibacco.ks at gmail.com
Wed May 25 09:30:18 CEST 2022


Just to add some more info that could be useful to someone.
Even if a processor has many memory channels, there is another
parameter to take into consideration: a single core cannot exploit
all the available memory bandwidth.
For example, for DDR4 at 2933 MT/s with 4 channels:
the memory bandwidth is 2933 MT/s x 8 (bus width in bytes) x 4
(channels) = 93,856 MB/s, or roughly 94 GB/s, but a single core
(according to my tests with a DPDK process writing a 1 GB hugepage)
achieves only about 12 GB/s (with a block size exceeding the L3 cache size).

Can anyone confirm that?

On Mon, May 23, 2022 at 3:16 PM Antonio Di Bacco <a.dibacco.ks at gmail.com> wrote:
>
> Got feedback from a guy working on HPC with DPDK, and he told me that
> with a DPDK memory test (I don't know where to find it) I should be doing
> 16 GB/s per channel with DDR4-2666. In my case, with 6 channels, I
> should be doing 90 GB/s ... that would be amazing!
>
> On Sat, May 21, 2022 at 11:42 AM Antonio Di Bacco
> <a.dibacco.ks at gmail.com> wrote:
> >
> > I read a couple of articles
> > (https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems?xtxsearchselecthit=1
> > and this https://www.exxactcorp.com/blog/HPC/balance-memory-guidelines-for-intel-xeon-scalable-family-processors)
> > and I understood a little bit more.
> >
> > If the XEON memory controller is able to spread contiguous memory
> > accesses onto different channels in hardware (as Stephen correctly
> > stated), then how can DPDK with the -n option benefit an application?
> > I also coded a test application that writes a 1 GB hugepage and measures
> > the time needed, but after equipping two additional DIMMs on two unused
> > channels of my six-channel motherboard (X11DPi-NT), I
> > didn't observe any improvement. This is strange, because adding two
> > channels to the four already populated should make a noticeable
> > difference.
> >
> > For reference this is the small program for allocating and writing memory.
> > https://github.com/adibacco/simple_mp_mem_2
> > and the results with 4 memory channels:
> > https://docs.google.com/spreadsheets/d/1mDoKYLMhMMKDaOS3RuGEnpPgRNKuZOy4lMIhG-1N7B8/edit?usp=sharing
> >
> >
> > On Fri, May 20, 2022 at 5:48 PM Stephen Hemminger
> > <stephen at networkplumber.org> wrote:
> > >
> > > On Fri, 20 May 2022 10:34:46 +0200
> > > Antonio Di Bacco <a.dibacco.ks at gmail.com> wrote:
> > >
> > > > Let us say I have two memory channels, each one with its own 16GB memory
> > > > module. I suppose the first memory channel will be used when addressing
> > > > physical memory in the range 0 to 0x4 0000 0000, and the second when
> > > > addressing physical memory in the range 0x4 0000 0000 to 0x7 ffff ffff.
> > > > Correct?
> > > > Now, I need to have a 2GB buffer with one "writer" and one "reader". The
> > > > writer writes to one half of the buffer (call it A) and, in the meantime, the
> > > > reader reads from the other half (B). When the writer finishes writing its
> > > > half (A), it signals the reader and they swap: the reader starts
> > > > to read from A and the writer starts to write to B.
> > > > If I allocate the whole buffer (on two 1GB hugepages) across the two memory
> > > > channels, so that one half of the buffer sits at the end of the first channel
> > > > while the other half sits at the start of the second memory
> > > > channel, would this improve performance compared to the whole buffer
> > > > being allocated within the same memory channel?
> > >
> > > Most systems just interleave memory chips based on number of filled slots.
> > > This is handled by BIOS before kernel even starts.
> > > DPDK has a number-of-memory-channels parameter (-n), and what it does
> > > is try to optimize memory allocation by spreading it across channels.
> > >
> > > Looks like you are inventing your own limited version of what memif does.

