[dpdk-dev] [PATCH 1/2] memalloc: do not leave unmapped holes in EAL virtual memory area

Stojaczyk, DariuszX dariuszx.stojaczyk at intel.com
Fri Jun 1 14:22:58 CEST 2018


> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Friday, June 1, 2018 1:00 PM
> On 01-Jun-18 1:51 PM, Dariusz Stojaczyk wrote:
> > EAL reserves a huge area in virtual address space to provide virtual
> > address contiguity for e.g.
> > future memory extensions (memory hotplug). During memory hotplug, if
> > the hugepage mmap succeeds but doesn't suffice EAL's requiriments, the
> > EAL would unmap this mapping straight away, leaving a hole in its
> > virtual memory area and making it available to everyone. As EAL still
> > thinks it owns the entire region, it may try to mmap it later with
> > MAP_FIXED, possibly overriding a user's mapping that was made in the
> > meantime.
> >
> > This patch ensures each hole is mapped back by EAL, so that it won't
> > be available to anyone else.
> >
> > Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk at intel.com>
> > ---
> 
> Generally, if the commit is a bugfix, reference to the original commit that
> introduces the issue should be part of the commit message. See DPDK
> contribution guidelines [1] and "git fixline" [2]. Also, this bug is present in
> 18.05, so you should also Cc: stable at dpdk.org (same applies for all of your
> other patches for memalloc).
> 
> [1] http://dpdk.org/doc/guides/contributing/patches.html
> [2] http://dpdk.org/dev

Ack, thanks.

> 
> >   lib/librte_eal/linuxapp/eal/eal_memalloc.c | 17 +++++++++++++++++
> >   1 file changed, 17 insertions(+)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > index 8c11f98..b959ea5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > @@ -39,6 +39,7 @@
> >   #include "eal_filesystem.h"
> >   #include "eal_internal_cfg.h"
> >   #include "eal_memalloc.h"
> > +#include "eal_private.h"
> >
> >   /*
> >    * not all kernel version support fallocate on hugetlbfs, so fall
> > back to @@ -490,6 +491,8 @@ alloc_seg(struct rte_memseg *ms, void
> *addr, int socket_id,
> >   	int ret = 0;
> >   	int fd;
> >   	size_t alloc_sz;
> > +	int flags;
> > +	void *new_addr;
> >
> >   	/* takes out a read lock on segment or segment list */
> >   	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx); @@
> > -585,6 +588,20 @@ alloc_seg(struct rte_memseg *ms, void *addr, int
> > socket_id,
> >
> >   mapped:
> >   	munmap(addr, alloc_sz);
> > +	flags = MAP_FIXED;
> > +#ifdef RTE_ARCH_PPC_64
> > +	flags |= MAP_HUGETLB;
> > +#endif
> > +	new_addr = eal_get_virtual_area(addr, &alloc_sz, alloc_sz, 0,
> > +flags);
> 
> Page size shouldn't be zero, should be alloc_sz.

The 0 above is for the `flags` parameter. Page size is set to alloc_sz.

```
void *
eal_get_virtual_area(void *requested_addr, size_t *size,
		size_t page_sz, int flags, int mmap_flags);
```

I believe it's okay. Correct me if I'm wrong.

> 
> > +	if (new_addr != addr) {
> > +		if (new_addr != NULL) {
> > +			munmap(new_addr, alloc_sz);
> > +		}
> > +		/* We're leaving a hole in our virtual address space. If
> > +		 * somebody else maps this hole now, we might accidentally
> > +		 * override it in the future. */
> > +		rte_panic("can't mmap holes in our virtual address space");
> 
> I don't think rte_panic() here makes sense. First of all, we're now striving to
> not panic inside DPDK libraries, especially once initialization has already
> finished. But more importantly, when you reach this panic, you're deep in
> the memory allocation process, which means you hold a write lock to DPDK
> memory hotplug. If you crash now, the lock will never be released and
> subsequent memory hotplug requests will just deadlock.
> 
> What's worse is you may be inside a secondary process, synchronizing the
> memory map with that of a primary, which means that even if you release
> the lock here, you're basically releasing someone else's lock, so behavior
> will be undefined at that point.
> 
> I think an error message with the highest possible log level should suffice
> (CRIT?), as there's really nothing more we can do here.

Ok, I'll fallback to CRIT log for now. We could try to add some proper error handling in a separate patch later.

> 
> That said, how likely is this scenario?

I can't think of any reason why that last mmap would fail, but it's still technically possible.

>
> 
> > +	}
> >   resized:
> >   	if (internal_config.single_file_segments) {
> >   		resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
> >
> 
> Generally, if the above is fixed, i'm OK with the patch.
> 
> --
> Thanks,
> Anatoly

D.


More information about the dev mailing list