[dpdk-dev] A question about hugepage initialization time

Bruce Richardson bruce.richardson at intel.com
Wed Dec 10 11:32:25 CET 2014


On Tue, Dec 09, 2014 at 02:10:32PM -0800, Stephen Hemminger wrote:
> On Tue, 9 Dec 2014 11:45:07 -0800
> &rew <andras.kovacs at ericsson.com> wrote:
> 
> > > Hey Folks,
> > >
> > > Our DPDK application deals with very large in memory data structures, and
> > > can potentially use tens or even hundreds of gigabytes of hugepage memory.
> > > During the course of development, we've noticed that as the number of huge
> > > pages increases, the memory initialization time during EAL init gets to be
> > > quite long, lasting several minutes at present.  The growth in init time
> > > doesn't appear to be linear, which is concerning.
> > >
> > > This is a minor inconvenience for us and our customers, as memory
> > > initialization makes our boot times a lot longer than it would otherwise
> > > be.  Also, my experience has been that really long operations often are
> > > hiding errors - what you think is merely a slow operation is actually a
> > > timeout of some sort, often due to misconfiguration. This leads to two
> > > questions:
> > >
> > > 1. Does the long initialization time suggest that there's an error
> > > happening under the covers?
> > > 2. If not, is there any simple way that we can shorten memory
> > > initialization time?
> > >
> > > Thanks in advance for your insights.
> > >
> > > --
> > > Matt Laswell
> > > laswell at infiniteio.com
> > > infinite io, inc.
> > >
> > 
> > Hello,
> > 
> > please find some quick comments on the questions:
> > 1.) In our experience, long initialization times are normal with large
> > amounts of memory. However, the time depends on a couple of factors:
> > - the number of hugepages (each pagefault handled by the kernel is pretty expensive)
> > - the size of the hugepages (the whole area is memset at initialization)
> > 
> > 2.) Using 1G pages instead of 2M pages reduces the initialization time
> > significantly. Using wmemset instead of memset gave an additional 20-30%
> > speedup in our measurements. You can gain still more by only touching the
> > pages rather than cleaning them, but in that case your layer or the
> > applications above it need to do the cleanup at allocation time
> > (e.g. by using rte_zmalloc), as in the sketch below.
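> > 
> > A rough illustration of the touch-only approach (an illustrative sketch,
> > not our actual patch; the 2 MB page size and the helper name are
> > assumptions):
> > 
> >   #include <stddef.h>
> >   #include <stdint.h>
> > 
> >   #define HUGE_PAGE_SIZE (2UL * 1024 * 1024) /* assuming 2M hugepages */
> > 
> >   /* Fault in every hugepage by writing a single byte per page instead
> >    * of zeroing the whole region with memset(). The kernel still zeroes
> >    * each page on first fault, but we skip a second full pass over the
> >    * memory in user space. */
> >   static void
> >   touch_hugepages(void *base, size_t len)
> >   {
> >       volatile uint8_t *p = base;
> >       size_t off;
> > 
> >       for (off = 0; off < len; off += HUGE_PAGE_SIZE)
> >           p[off] = 0;
> >   }
> > 
> > Since nothing guarantees the pages stay clean after reuse, allocations
> > then need an explicitly zeroing path such as rte_zmalloc().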
> > 
> > Cheers,
> > &rew
> 
> I wonder if the whole rte_malloc code is even worth it with a modern kernel
> with transparent huge pages? rte_malloc adds very little value, and is less safe
> and slower than glibc or other allocators. Plus you lose the ability to get
> the full benefit of valgrind or electric fence.
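> 
> For example (a hypothetical sketch, untested, and not from any existing
> DPDK code): with THP, an ordinary anonymous mapping can get hugepage
> backing without any hugetlbfs setup at all:
> 
>   #include <stddef.h>
>   #include <sys/mman.h>
> 
>   /* Ask the kernel to back a plain anonymous mapping with transparent
>    * huge pages -- no hugepage reservation or mount point required. */
>   static void *
>   alloc_thp(size_t len)
>   {
>       void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
>                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>       if (buf == MAP_FAILED)
>           return NULL;
>       madvise(buf, len, MADV_HUGEPAGE);
>       return buf;
>   }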

While I'd dearly love not to have our own custom malloc lib to maintain, rte_malloc
will be hard to replace for DPDK multiprocess: we would need a replacement that
similarly guarantees that memory mapped in process A is also available at the
same address in process B. :-(
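
A simplified sketch of that constraint (the base address, length and file
name here are made up for illustration, not what EAL actually uses):

  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #define SHARED_BASE ((void *)0x7f0000000000UL) /* hypothetical fixed VA */
  #define SHARED_LEN  (1UL << 30)

  /* Both primary and secondary processes map the same hugepage-backed
   * file and require it to land at the same virtual address, so that
   * pointers stored inside shared data structures remain valid in every
   * process. MAP_FIXED is used here only to make the same-address
   * requirement explicit; it silently replaces existing mappings, so a
   * real implementation would verify the address instead. */
  static void *
  map_shared(const char *path)
  {
      int fd = open(path, O_RDWR);
      if (fd < 0)
          return NULL;
      void *va = mmap(SHARED_BASE, SHARED_LEN, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_FIXED, fd, 0);
      close(fd);
      return va == MAP_FAILED ? NULL : va;
  }

A general-purpose allocator makes no such cross-process guarantee, which is
why simply switching to glibc malloc doesn't work for multi-process setups.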

/Bruce

