[dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping
Burakov, Anatoly
anatoly.burakov at intel.com
Thu Feb 28 13:14:56 CET 2019
On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who wants to have tight control on this
> memory (e.g. avoid the rte_malloc header).
> The user should create a memory, register it through rte_extmem_register
> API, and call DMA map function in order to register such memory to
> the different devices.
>
> The scope of the patch focus on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
>
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
>
> Signed-off-by: Shahaf Shuler <shahafs at mellanox.com>
> ---
<snip>
> +
> + if (!pdev || !pdev->driver) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
We could put a check in here to see if the memory has been registered
with DPDK. Just call rte_mem_virt2memseg_list(addr) - if it returns
NULL, the memory wasn't registered, so you can throw an error. Not sure
of appropriate errno in that case - ENODEV? EINVAL?
> + if (pdev->driver->dma_map)
> + return pdev->driver->dma_map(pdev, addr, iova, len);
> + /**
> + * In case driver don't provides any specific mapping
> + * try fallback to VFIO.
> + */
> + if (pdev->kdrv == RTE_KDRV_VFIO)
> + return rte_vfio_container_dma_map
> + (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
> + iova, len);
<snip>
> +rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len)
> +{
> + if (dev->bus->dma_map == NULL || len == 0) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> + /* Memory must be registered through rte_extmem_* APIs */
> + if (rte_mem_virt2memseg(addr, NULL) == NULL) {
No need to call rte_mem_virt2memseg - rte_mem_virt2memseg_list will do.
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> +
> + return dev->bus->dma_map(dev, addr, iova, len);
> +}
> +
> +int
> +rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len)
> +{
> + if (dev->bus->dma_unmap == NULL || len == 0) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
I think attempting to unmap a memory region that isn't registered should
be an error, so rte_mem_virt2memseg_list call should be here too.
> +
> + return dev->bus->dma_unmap(dev, addr, iova, len);
> +}
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6be4b5cabe..4faf2d20a0 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
> typedef int (*rte_bus_parse_t)(const char *name, void *addr);
<snip>
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -515,4 +515,47 @@ rte_dev_hotplug_handle_enable(void);
> int __rte_experimental
> rte_dev_hotplug_handle_disable(void);
>
> +/**
> + * Device level DMA map function.
> + * After a successful call, the memory segment will be mapped to the
> + * given device.
here and in unmap:
@note please register memory first
?
> + *
> + * @param dev
> + * Device pointer.
> + * @param addr
> + * Virtual address to map.
> + * @param iova
> + * IOVA address to map.
> + * @param len
> + * Length of the memory segment being mapped.
> + *
> + * @return
> + * 0 if mapping was successful.
> + * Negative value and rte_errno is set otherwise.
Here and in other similar places: why are we setting rte_errno *and*
returning -rte_errno? Wouldn't returning -1 be enough?
--
Thanks,
Anatoly
More information about the dev
mailing list