[dpdk-dev] [PATCH] eal: add option --iso-cmem for external custom memory

Rajesh Ravi rajesh.ravi at broadcom.com
Wed Oct 30 20:50:02 CET 2019


Thanks Anatoly.
Please find  inline below:

[Anatoly] vfio_mem_event_callback() is called every time memory is added to
a
heap. That includes internal and external memory

[Rajesh] malloc_heap_add_external_memory() does call
eal_memalloc_mem_event_notify
[ instead of  vfio_mem_event_callback() ]
              But, no callback function is getting called from inside
eal_memalloc_mem_event_notify()
              execution flow is not entering inside following loop:

              *TAILQ_FOREACH(entry, &mem_event_callback_list, next) {*



*                 RTE_LOG(DEBUG, EAL, "Calling mem event callback
'%s:%p'\n",                         entry->name, entry->arg);
   entry->clb(event, start, len, entry->arg);              }*

Do you mean to say,  we are supposed to explicitly register a callback
which separately builds  iommu tables in addition to calling
rte_malloc_heap_memory_add()  API?

Regards,
Rajesh


On Fri, Oct 25, 2019 at 8:32 PM Burakov, Anatoly <anatoly.burakov at intel.com>
wrote:

> On 25-Oct-19 1:53 PM, Rajesh Ravi wrote:
> > Thanks Anatoly.
> > I think now we 're close to sync. You understood it right: the problem
> > is dma/iommu mappings are not getting created.
> >
> > 1)vfio_mem_event_callback() is getting called later when memory is
> > allocated from heap but not when external memory is added to heap
>
> Please correct me if i'm wrong, but i believe that's not true.
> vfio_mem_event_callback() is called every time memory is added to a
> heap. That includes internal and external memory [1] [2].
>
> [1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/rte_malloc.c#n397
> [2]
> http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/malloc_heap.c#n1231
>
> The only way the callback doesn't get called is if VFIO doesn't get
> initialize and/or fails to register said callback.
>
> >
> > 2)vfio_dma_mem_map() is called inside vfio_mem_event_callback() but it's
> > skipped in case of external memory because it's invoked inside following
> > if condition:
> >      if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0)
> >      [ I verified this also with gdb by stepping over line by line
> > inside vfio_mem_event_callback() ]
> >    We 're using DPDK v19.02 but I see almost same code and flow in DPDK
> > 19.08-rc4 as well.
>
> I believe that is not true either.
>
> The line you're referring to [1] does *not* skip external memory
> segments (or rather, that's not its intention). Instead, it is checking
> if we can assume that the region is IOVA-contiguous. For IOVA as VA
> mode, we can assume that, *but not for external memory*, which is why it
> is skipped.
>
> After this check, there is a loop that goes memseg by memseg [2], and it
> does *not* skip external memory. The memseg will only get skipped if it
> doesn't have a valid IOVA address [3], which will only happen if you
> don't provide an IOVA table at heap creation time (we can't make
> assumptions about desired IOVA addresses, so we just skip them). Perhaps
> it should output a debug message when it does that, but the important
> thing is, the code does nothing to skip external memory.
>
> [1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n521
> [2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n533
> [3] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n536
>
> >
> ---------------------------------------------------------------------------------------------------------------------------------
> > [ _Further explanation of point 1) :_
> >   I used gdb and set break point and checked back trace to verify the
> > call stack trace: first I set break point at rte_malloc_heap_memory_add.
> >    Then I hit break point and then set break point at
> > vfio_mem_event_callback. I see from back trace that it's from
> > malloc_heap_alloc () with event = RTE_MEM_EVENT_ALLOC as can be seen
> > from log at the end of this mail.
> >
> >   And also I don't see any event for memory ADD as below:
> > enum rte_mem_event {
> >           RTE_MEM_EVENT_ALLOC = 0, /**< Allocation event. */
> >           RTE_MEM_EVENT_FREE,      /**< Deallocation event. */
> >   }; ]
> >
> > *_Log_*
> > *_
> > _*
> > (gdb) bt
> > #0  vfio_mem_event_callback (type=RTE_MEM_EVENT_ALLOC,
> > addr=0x200016c00000, len=4194304, arg=0x0) at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/linuxapp/eal/eal_vfio.c:512
> > #1  0x00000000004b78f0 in eal_memalloc_mem_event_notify
> > (event=event at entry=RTE_MEM_EVENT_ALLOC,
> > start=start at entry=0x200016c00000, len=len at entry=4194304)
> >      at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_memalloc.c:248
> > #2  0x00000000004c7a90 in try_expand_heap_primary (contig=false,
> > bound=0, align=35184753770496, flags=0, socket=0,
> > elt_size=35184753770496, pg_sz=2097152, heap=0x569b3c
> > <early_mem_config+10364>)
> >      at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:412
> > #3  try_expand_heap (heap=heap at entry=0x569b3c <early_mem_config+10364>,
> > pg_sz=2097152, elt_size=elt_size at entry=8192, socket=socket at entry=0,
> > flags=flags at entry=0, align=align at entry=2097152, bound=bound at entry=0,
> > contig=contig at entry=false)
> >      at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:494
> > #4  0x00000000004c7f28 in alloc_more_mem_on_socket
> > (heap=heap at entry=0x569b3c <early_mem_config+10364>,
> > size=size at entry=8192, socket=0, flags=flags at entry=0,
> > align=align at entry=2097152, bound=bound at entry=0, contig=contig at entry
> =false)
> >      at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:622
> > #5  0x00000000004c822c in malloc_heap_alloc_on_heap_id
> > (size=size at entry=8192, heap_id=heap_id at entry=0, flags=flags at entry=0,
> > align=<optimized out>, align at entry=2097152, bound=bound at entry=0,
> > contig=contig at entry=false, type=0x0)
> >      at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:676
> > #6  0x00000000004c8464 in malloc_heap_alloc (type=type at entry=0x0,
> > size=size at entry=8192, socket_arg=<optimized out>, flags=flags at entry=0,
> > align=2097152, bound=bound at entry=0, contig=contig at entry=false)
> >      at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:714
> > #7  0x00000000004c498c in rte_malloc_socket (type=type at entry=0x0,
> > size=size at entry=8192, align=align at entry=2097152, socket_arg=<optimized
> > out>, socket_arg at entry=-1)
> >      at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/rte_malloc.c:58
> > #8  0x000000000049fe24 in spdk_malloc (size=size at entry=8192,
> > align=align at entry=2097152, phys_addr=phys_addr at entry=0x0,
> > socket_id=socket_id at entry=-1, flags=flags at entry=3)
> >      at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:95
> > #9  0x000000000049fe94 in spdk_zmalloc (size=8192,
> > align=align at entry=2097152, phys_addr=phys_addr at entry=0x0,
> > socket_id=socket_id at entry=-1, flags=flags at entry=3) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:108
> > #10 0x0000000000431f64 in nvme_pcie_qpair_construct
> > (qpair=0x200016aed1d8, opts=opts at entry=0x0) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:1054
> > #11 0x000000000043233c in nvme_pcie_ctrlr_construct_admin_qpair
> > (ctrlr=0x200016aed340) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:719
> > #12 nvme_pcie_ctrlr_construct (trid=<optimized out>, opts=<optimized
> > out>, devhandle=0x70f4b0) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:881
> > #13 0x0000000000437674 in nvme_transport_ctrlr_construct
> > (trid=trid at entry=0xffffffffec98, opts=opts at entry=0xffffffffea00,
> > devhandle=devhandle at entry=0x70f4b0)
> >      at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:99
> > #14 0x0000000000436b08 in nvme_ctrlr_probe
> > (trid=trid at entry=0xffffffffec98, probe_ctx=0x70eae0,
> > devhandle=devhandle at entry=0x70f4b0) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:426
> > #15 0x0000000000430d40 in pcie_nvme_enum_cb (ctx=0xffffffffefe8,
> > pci_dev=0x70f4b0) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:752
> > #16 0x00000000004a2e20 in spdk_pci_device_init (_drv=0x51b660
> > <g_nvme_pci_drv>, _dev=0x61b5f0) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:347
> > #17 0x00000000004d5c68 in rte_pci_probe_one_driver (dev=0x61b5f0,
> > dr=0x51b660 <g_nvme_pci_drv>) at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:185
> > #18 pci_probe_all_drivers (dev=0x61b5f0) at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:259
> > #19 rte_pci_probe () at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:294
> > #20 0x00000000004bb490 in rte_bus_probe () at
> >
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_bus.c:100
> > #21 0x00000000004a3030 in spdk_pci_enumerate (driver=0x51b660
> > <g_nvme_pci_drv>, enum_cb=enum_cb at entry=0x430c50 <pcie_nvme_enum_cb>,
> > enum_ctx=enum_ctx at entry=0xffffffffefe8)
> >      at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:502
> > #22 0x00000000004319c4 in nvme_pcie_ctrlr_scan (probe_ctx=0x70eae0,
> > direct_connect=<optimized out>) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:780
> > #23 0x00000000004376ac in nvme_transport_ctrlr_scan
> > (probe_ctx=probe_ctx at entry=0x70eae0,
> > direct_connect=direct_connect at entry=false) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:106
> > #24 0x0000000000436be4 in spdk_nvme_probe_internal
> > (probe_ctx=probe_ctx at entry=0x70eae0,
> > direct_connect=direct_connect at entry=false) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:553
> > #25 0x0000000000436fc4 in spdk_nvme_probe_async (trid=0xfffffffff0d8,
> > cb_ctx=cb_ctx at entry=0x6402c0, probe_cb=probe_cb at entry=0x420b70
> > <probe_cb>, attach_cb=0x421b50 <attach_cb>, remove_cb=0x0)
> >      at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:1078
> > #26 0x00000000004372d0 in spdk_nvme_probe (trid=<optimized out>,
> > trid at entry=0x0, cb_ctx=cb_ctx at entry=0x6402c0,
> > probe_cb=probe_cb at entry=0x420b70 <probe_cb>,
> > attach_cb=attach_cb at entry=0x421b50 <attach_cb>,
> > remove_cb=remove_cb at entry=0x0)
> >      at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:625
> > #27 0x00000000004223b0 in bdev_nvme_library_init () at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/nvme/bdev_nvme.c:1539
> > #28 0x000000000048cf80 in spdk_bdev_modules_init () at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:963
> > #29 spdk_bdev_initialize (cb_fn=<optimized out>, cb_arg=<optimized out>)
> > at /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:1084
> > #30 0x00000000004811b8 in spdk_subsystem_init (cb_fn=<optimized out>,
> > cb_arg=<optimized out>) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/subsystem.c:178
> > #31 0x0000000000484720 in _spdk_msg_queue_run_batch (max_msgs=<optimized
> > out>, thread=0x60c600) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:406
> > #32 spdk_thread_poll (thread=thread at entry=0x60c600,
> > max_msgs=max_msgs at entry=0, now=now at entry=2386134829398) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:462
> > #33 0x000000000048040c in _spdk_reactor_run (arg=0x60bcc0) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:272
> > #34 0x0000000000480990 in spdk_reactors_start () at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:381
> > #35 0x000000000047f7d4 in spdk_app_start
> > (opts=opts at entry=0xfffffffffa78, start_fn=start_fn at entry=0x407350
> > <nvmf_tgt_started>, arg1=arg1 at entry=0x0) at
> > /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/app.c:692
> > #36 0x0000000000405b2c in main (argc=7, argv=0xfffffffffc78) at
> >
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/app/nvmf_tgt/nvmf_main.c:75
> >
> > Regards,
> > Rajesh
> >
> > On Thu, Oct 24, 2019 at 5:13 PM Burakov, Anatoly
> > <anatoly.burakov at intel.com <mailto:anatoly.burakov at intel.com>> wrote:
> >
> >     On 22-Oct-19 8:56 AM, Rajesh Ravi wrote:
> >      > Hi Anatoly,
> >      > I tried calling rte_vfio_dma_map() but, it  failed for me because
> >      >
> >      > vfio_dma_mem_map() failed with error:VFIO support not initialized
> >      > because: default_vfio_cfg->vfio_iommu_type=NULL (
> container_dma_map
> >      > calls vfio_dma_mem_map  with vfio_cfg = default_vfio_cfg )
> >      >
> >      > call sequence: rte_vfio_dma_map() -->container_dma_map()
> >      > --> vfio_dma_mem_map
> >      >
> >      > Regards,
> >      > Rajesh
> >      >
> >      > On Mon, Oct 21, 2019 at 9:16 PM Rajesh Ravi
> >     <rajesh.ravi at broadcom.com <mailto:rajesh.ravi at broadcom.com>
> >      > <mailto:rajesh.ravi at broadcom.com
> >     <mailto:rajesh.ravi at broadcom.com>>> wrote:
> >      >
> >      >     Thanks Anatoly for prompt response. Sorry for the delayed
> >     response,
> >      >     I took some time to reverify with SPDK.
> >      >
> >      >     Infact, I do want the iommu mapping to be performed. I don't
> >     want it
> >      >     to be bypassed by type1_map()
> >      >     [lib/librte_eal/linuxapp/eal/eal_vfio.c] for external memory.
> >      >
> >      >     Then, if I understood correctly, you suggested to call
> >      >     rte_vfio_dma_map() as an alternative.
> >      >
> >      >     _Context & clarification_
> >      >
> >      >     1) We 're using DPDK to manage/allocate memory for SPDK
> >     through heap
> >      >     API.
> >      >
> >      >        The only DPDK APIs we 're calling are:
> >      >        A)rte_malloc_heap_memory_add() to add external memory to
> heap.
> >      >        B)rte_malloc_heap_get_socket() & rte_malloc_socket() to
> >     allocate
> >      >     memory
> >      >
> >      >     /Are you suggesting to make a call to rte_vfio_dma_map() from
> >     spdk,
> >      >     in addition to the APIs listed under 1)A & 1)B instead of
> >     modifying
> >      >     DPDK vfio functions?/
> >      >         Please confirm, Probably I missed the context and might
> >     not have
> >      >     understood fully.
> >      >
> >      >
> >      >     2) .dma_user_map_func=vfio_type1_dma_mem_map() is called from
> 2
> >      >     paths in dpdk. In either case call to dma_user_map_func() is
> >     skipped.
> >      >         A) _During the startup, as you said before:_
> >      >           rte_vfio_setup_device() --> type1_map()
> >      >
> >      >         B)During allocation event:
> >      >              vfio_mem_event_callback()
> (type=RTE_MEM_EVENT_ALLOC,..)
> >      >     -->vfio_dma_mem_map() -->dma_user_map_func()
> >      >
> >
>  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >      >     _Conclusion_
> >      >
> >      >     So we have 2 alternatives:
> >      >
> >      >     A) make additional call to rte_vfio_dma_map() API after adding
> >      >     external memory using rte_malloc_heap_memory_add()  API.
> >      >
> >      >     B) remove msl->external check which bypasses call to
> >      >     dma_user_map_func() in DPDK.
> >      >
> >      >     I modified DPDK functions [Option B) ]. I guess you 're
> >      >     suggesting option A)
> >      >
> >      >     Please confirm.
> >      >
> >      >     Regards,
> >      >     Rajesh
> >
> >     Hi Rajesh,
> >
> >     Apologies for delayed response.
> >
> >     I'm still not sure i understand your problem.
> >
> >     When adding memory with rte_malloc_heap_memory_add(), this memory
> >     is, as
> >     far as i can tell, automatically mapped for DMA mapping with VFIO.
> This
> >     happens because:
> >
> >     rte_malloc_heap_memory_add() -> malloc_heap_add_external_memory() ->
> >     eal_memalloc_mem_event_notify() -> vfio_mem_event_callback() ->
> >     vfio_dma_mem_map()
> >
> >     VFIO registers itself for mem event callbacks in
> >     eal_vfio.c:rte_vfio_setup_device():791.
> >
> >     So, there is no need to map anything for DMA when you're creating
> >     external heaps - it is being mapped automatically by EAL.
> >
> >     For an example implementation have a look at testpmd, specifically
> >     testpmd.c:mbuf_pool_create() for MP_ALLOC_XMEM type. This calls
> >     testpmd.c:setup_extmem(), which creates an external heap. Notice how
> it
> >     doesn't do anything to map the memory for DMA, because that is
> already
> >     done by EAL. You can verify this working by adding --mp-alloc=xmem
> >     argument to testpmd to force it to use external memory for its
> mempools.
> >
> >     So, what is the actual issue here, given that the memory is mapped
> for
> >     DMA by EAL automatically when it is added?
> >
> >     --
> >     Thanks,
> >     Anatoly
> >
> >
> >
> > --
> > Regards,
> > Rajesh
>
>
> --
> Thanks,
> Anatoly
>


-- 
Regards,
Rajesh


More information about the dev mailing list