[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

Bruce Richardson bruce.richardson at intel.com
Tue Mar 22 18:10:46 CET 2016


On Tue, Mar 22, 2016 at 04:35:32PM +0000, Sergio Gonzalez Monroy wrote:
> First of all, forgive my ignorance regarding ppc64 if the questions are
> naive, but after having a look at the already existing code for ppc64,
> and now this patch, why are we doing this reverse mapping at all?
> 
> I guess the question revolves around the comment in eal_memory.c:
> 1316                 /* On PPC64 architecture, the mmap always start from higher
> 1317                  * virtual address to lower address. Here, both the physical
> 1318                  * address and virtual address are in descending order */
> 
> From looking at the code, for ppc64 we do qsort in reverse order and
> thereafter everything looks to be done to account for that reverse
> sorting.
> 
> CC: Chao Zhu and David Marchand as original author and reviewer of the code.
> 
> Sergio
>

Just to add my 2c here. At one point, with I believe some i686 installs - I
don't remember the specific OS/kernel - we found that the mmap calls were
returning the highest free address first and then working downwards, much like
what seems to be described here. To fix this we changed the mmap code from
assuming that addresses are mapped upwards, to instead explicitly requesting a
large free block of memory (an mmap of /dev/zero) to find a free address-space
range of the correct size, and then explicitly mmapping each individual page to
the appropriate place in that free range. With this scheme it didn't matter
whether the OS tried to mmap the pages from the highest or lowest address,
because we always told the OS where to put each page (and we knew the slot was
free from the earlier block mmap).
Would this scheme not also work for PPC in a similar way? (Again, forgive my
unfamiliarity with PPC! :-) )

/Bruce

> 
> On 07/03/2016 14:13, Gowrishankar wrote:
> >From: Gowri Shankar <gowrishankar.m at linux.vnet.ibm.com>
> >
> >For a secondary process address space to map hugepages from every segment of
> >the primary process, hugepage_file entries have to be mapped in reverse order
> >from the list that the primary process updated for every segment. The reason
> >is that, on ppc64, hugepages are sorted by decreasing address.
> >
> >Signed-off-by: Gowrishankar <gowrishankar.m at linux.vnet.ibm.com>
> >---
> >  lib/librte_eal/linuxapp/eal/eal_memory.c |   26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> >
> >diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >index 5b9132c..6aea5d0 100644
> >--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> >+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >@@ -1400,7 +1400,7 @@ rte_eal_hugepage_attach(void)
> >  {
> >  	const struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> >  	const struct hugepage_file *hp = NULL;
> >-	unsigned num_hp = 0;
> >+	unsigned num_hp = 0, mapped_hp = 0;
> >  	unsigned i, s = 0; /* s used to track the segment number */
> >  	off_t size;
> >  	int fd, fd_zero = -1, fd_hugepage = -1;
> >@@ -1486,14 +1486,12 @@ rte_eal_hugepage_attach(void)
> >  		goto error;
> >  	}
> >-	num_hp = size / sizeof(struct hugepage_file);
> >-	RTE_LOG(DEBUG, EAL, "Analysing %u files\n", num_hp);
> >-
> >  	s = 0;
> >  	while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0){
> >  		void *addr, *base_addr;
> >  		uintptr_t offset = 0;
> >  		size_t mapping_size;
> >+		unsigned int index;
> >  #ifdef RTE_LIBRTE_IVSHMEM
> >  		/*
> >  		 * if segment has ioremap address set, it's an IVSHMEM segment and
> >@@ -1504,6 +1502,8 @@ rte_eal_hugepage_attach(void)
> >  			continue;
> >  		}
> >  #endif
> >+		num_hp = mcfg->memseg[s].len / mcfg->memseg[s].hugepage_sz;
> >+		RTE_LOG(DEBUG, EAL, "Analysing %u files in segment %u\n", num_hp, s);
> >  		/*
> >  		 * free previously mapped memory so we can map the
> >  		 * hugepages into the space
> >@@ -1514,18 +1514,23 @@ rte_eal_hugepage_attach(void)
> >  		/* find the hugepages for this segment and map them
> >  		 * we don't need to worry about order, as the server sorted the
> >  		 * entries before it did the second mmap of them */
> >+#ifdef RTE_ARCH_PPC_64
> >+		for (i = num_hp-1; i < num_hp && offset < mcfg->memseg[s].len; i--){
> >+#else
> >  		for (i = 0; i < num_hp && offset < mcfg->memseg[s].len; i++){
> >-			if (hp[i].memseg_id == (int)s){
> >-				fd = open(hp[i].filepath, O_RDWR);
> >+#endif
> >+			index = i + mapped_hp;
> >+			if (hp[index].memseg_id == (int)s){
> >+				fd = open(hp[index].filepath, O_RDWR);
> >  				if (fd < 0) {
> >  					RTE_LOG(ERR, EAL, "Could not open %s\n",
> >-						hp[i].filepath);
> >+						hp[index].filepath);
> >  					goto error;
> >  				}
> >  #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
> >-				mapping_size = hp[i].size * hp[i].repeated;
> >+				mapping_size = hp[index].size * hp[index].repeated;
> >  #else
> >-				mapping_size = hp[i].size;
> >+				mapping_size = hp[index].size;
> >  #endif
> >  				addr = mmap(RTE_PTR_ADD(base_addr, offset),
> >  						mapping_size, PROT_READ | PROT_WRITE,
> >@@ -1534,7 +1539,7 @@ rte_eal_hugepage_attach(void)
> >  				if (addr == MAP_FAILED ||
> >  						addr != RTE_PTR_ADD(base_addr, offset)) {
> >  					RTE_LOG(ERR, EAL, "Could not mmap %s\n",
> >-						hp[i].filepath);
> >+						hp[index].filepath);
> >  					goto error;
> >  				}
> >  				offset+=mapping_size;
> >@@ -1543,6 +1548,7 @@ rte_eal_hugepage_attach(void)
> >  		RTE_LOG(DEBUG, EAL, "Mapped segment %u of size 0x%llx\n", s,
> >  				(unsigned long long)mcfg->memseg[s].len);
> >  		s++;
> >+		mapped_hp += num_hp;
> >  	}
> >  	/* unmap the hugepage config file, since we are done using it */
> >  	munmap((void *)(uintptr_t)hp, size);
> 

