[dpdk-dev] [RFC 4/5] virtio/container: adjust memory initialization process
Tan, Jianfeng
jianfeng.tan at intel.com
Sun Nov 8 12:18:12 CET 2015
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Saturday, November 7, 2015 12:22 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: nakajima.yoshihiro at lab.ntt.co.jp; zhbzg at huawei.com; mst at redhat.com;
> gaoxiaoqiu at huawei.com; oscar.zhangbo at huawei.com;
> ann.zhuangyanying at huawei.com; zhoujingbin at huawei.com;
> guohongzhen at huawei.com
> Subject: RE: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> initialization process
>
> Hi,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianfeng Tan
> > Sent: Thursday, November 05, 2015 6:31 PM
> > To: dev at dpdk.org
> > Cc: nakajima.yoshihiro at lab.ntt.co.jp; zhbzg at huawei.com;
> > mst at redhat.com; gaoxiaoqiu at huawei.com; oscar.zhangbo at huawei.com;
> > ann.zhuangyanying at huawei.com; zhoujingbin at huawei.com;
> > guohongzhen at huawei.com
> > Subject: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> > initialization process
> >
> > When using virtio for container, we should specify --no-huge so that
> > in memory initialization, shm_open() is used to alloc memory from
> > tmpfs filesystem /dev/shm/.
> >
> > Signed-off-by: Huawei Xie <huawei.xie at intel.com>
> > Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
> > ---
......
> > +int
> > +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void
> > +**paddr) {
> > + struct rte_mem_config *mcfg;
> > + mcfg = rte_eal_get_configuration()->mem_config;
> > +
> > + *pfd = mcfg->memseg[index].fd;
> > + *psize = (uint64_t)mcfg->memseg[index].len;
> > + *paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> > + return 0;
> > +}
>
> Wonder who will use that function?
> Can't see any references to that function in that patch or next.
This function is used in patch 1/5, when the virtio front end needs to send VHOST_USER_SET_MEM_TABLE to the back end.
> > +
> > /*
> > * Get physical address of any mapped virtual address in the current
> process.
> > */
> > @@ -1044,6 +1059,42 @@ calc_num_pages_per_socket(uint64_t *
> memory,
> > return total_num_pages;
> > }
> >
> > +static void *
> > +rte_eal_shm_create(int *pfd)
> > +{
> > + int ret, fd;
> > + char filepath[256];
> > + void *vaddr;
> > + uint64_t size = internal_config.memory;
> > +
> > + sprintf(filepath, "/%s_cvio", internal_config.hugefile_prefix);
> > +
> > + fd = shm_open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > + if (fd < 0) {
> > + rte_panic("shm_open %s failed: %s\n", filepath,
> strerror(errno));
> > + }
> > + ret = flock(fd, LOCK_EX);
> > + if (ret < 0) {
> > + close(fd);
> > + rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> > + }
> > +
> > + ret = ftruncate(fd, size);
> > + if (ret < 0) {
> > + rte_panic("ftruncate failed: %s\n", strerror(errno));
> > + }
> > + /* flag: MAP_HUGETLB */
>
> Any explanation what that comment means here?
> Do you plan to use MAP_HUGETLb in the call below or ...?
Yes, it's a TODO item. shm_open() just uses a tmpfs mounted at /dev/shm. So I wonder whether we can use this flag to make
sure the OS allocates hugepages here, if the user would like to use hugepages.
>>
......
> > @@ -1081,8 +1134,8 @@ rte_eal_hugepage_init(void)
> >
> > /* hugetlbfs can be disabled */
> > if (internal_config.no_hugetlbfs) {
> > - addr = mmap(NULL, internal_config.memory, PROT_READ |
> PROT_WRITE,
> > - MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> > + int fd;
> > + addr = rte_eal_shm_create(&fd);
>
> Why do you remove ability to map(dev/zero) here?
> Probably not everyone plan to use --no-hugepages only inside containers.
From my understanding, the mmap here just allocates some zero-initialized memory, so I don't see the connection with /dev/zero.
rte_eal_shm_create() preserves the original behavior, and in addition it returns an fd that refers to this chunk of memory.
This fd is indispensable in the vhost protocol, when VHOST_USER_SET_MEM_TABLE is sent over a unix socket using sendmsg().
>
>
> > if (addr == MAP_FAILED) {
> > RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n",
> __func__,
> > strerror(errno));
> > @@ -1093,6 +1146,7 @@ rte_eal_hugepage_init(void)
> > mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> > mcfg->memseg[0].len = internal_config.memory;
> > mcfg->memseg[0].socket_id = 0;
> > + mcfg->memseg[0].fd = fd;
> > return 0;
> > }
> >
> > diff --git a/lib/librte_mempool/rte_mempool.c
> > b/lib/librte_mempool/rte_mempool.c
> > index e57cbbd..8f8852b 100644
> > --- a/lib/librte_mempool/rte_mempool.c
> > +++ b/lib/librte_mempool/rte_mempool.c
> > @@ -453,13 +453,6 @@ rte_mempool_xmem_create(const char *name,
> unsigned n, unsigned elt_size,
> > rte_errno = EINVAL;
> > return NULL;
> > }
> > -
> > - /* check that we have both VA and PA */
> > - if (vaddr != NULL && paddr == NULL) {
> > - rte_errno = EINVAL;
> > - return NULL;
> > - }
> > -
> > /* Check that pg_num and pg_shift parameters are valid. */
> > if (pg_num < RTE_DIM(mp->elt_pa) || pg_shift >
> MEMPOOL_PG_SHIFT_MAX) {
> > rte_errno = EINVAL;
> > @@ -596,8 +589,15 @@ rte_mempool_xmem_create(const char *name,
> > unsigned n, unsigned elt_size,
> >
> > /* mempool elements in a separate chunk of memory. */
> > } else {
> > + /* when VA is specified, PA should be specified? */
> > + if (rte_eal_has_hugepages()) {
> > + if (paddr == NULL) {
> > + rte_errno = EINVAL;
> > + return NULL;
> > + }
> > + memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0])
> * pg_num);
> > + }
> > mp->elt_va_start = (uintptr_t)vaddr;
> > - memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) *
> pg_num);
>
> Could you explain the reason for that change?
> Specially why mempool over external memory now only allowed for
> hugepages config?
> Konstantin
Oops, you're right! This change was originally made so that a mbuf mempool could be created at a given vaddr without
supplying any paddr[]. Now that we no longer need to care about either vaddr or paddr[], I should have reverted the
change in this file.
>
> > }
> >
> > mp->elt_va_end = mp->elt_va_start;
> > --
> > 2.1.4