[dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping

Burakov, Anatoly anatoly.burakov at intel.com
Thu Feb 28 13:14:56 CET 2019


On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
> 
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
> 
> 2. Use memory allocated by the user and registered with the DPDK memory
> system. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
> 
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who want to have tight control over this
> memory (e.g. avoid the rte_malloc header).
> The user should allocate the memory, register it through the
> rte_extmem_register API, and call the DMA map function in order to
> register such memory to the different devices.
> 
> The scope of the patch focuses on #3 above.
> 
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
> 
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
> 
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
> 
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
> 
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
> 
> Signed-off-by: Shahaf Shuler <shahafs at mellanox.com>
> ---

<snip>

> +
> +	if (!pdev || !pdev->driver) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}

We could put a check in here to see if the memory has been registered 
with DPDK. Just call rte_mem_virt2memseg_list(addr) - if it returns 
NULL, the memory wasn't registered, so you can throw an error. Not sure 
of the appropriate errno in that case - ENODEV? EINVAL?
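
To illustrate the check suggested above, here is a minimal standalone 
sketch. It does not use the DPDK headers: region_lookup() is a 
hypothetical stand-in for rte_mem_virt2memseg_list(), sketch_errno 
stands in for rte_errno, and EINVAL is only one of the two candidates 
mentioned above, picked for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one registered external memory region. */
struct ext_region {
	uintptr_t start;
	size_t len;
};

#define MAX_REGIONS 8

/* Hypothetical registry; stands in for the DPDK memseg lists. */
static struct ext_region registry[MAX_REGIONS];
static size_t registry_count;

/* Stand-in for rte_errno. */
static int sketch_errno;

/* Stand-in for registering memory before mapping it. */
static int
region_register(void *addr, size_t len)
{
	if (addr == NULL || len == 0 || registry_count == MAX_REGIONS)
		return -EINVAL;
	registry[registry_count].start = (uintptr_t)addr;
	registry[registry_count].len = len;
	registry_count++;
	return 0;
}

/* Stand-in for the list lookup: NULL means "not registered". */
static const struct ext_region *
region_lookup(const void *addr)
{
	uintptr_t va = (uintptr_t)addr;
	size_t i;

	for (i = 0; i < registry_count; i++) {
		if (va >= registry[i].start &&
		    va - registry[i].start < registry[i].len)
			return &registry[i];
	}
	return NULL;
}

/* Refuse to map memory that was never registered. */
static int
dma_map_checked(const void *addr, size_t len)
{
	if (len == 0 || region_lookup(addr) == NULL) {
		sketch_errno = EINVAL;
		return -sketch_errno;
	}
	/* The driver callback or the VFIO fallback would run here. */
	return 0;
}
```

With this shape, a call on an unregistered buffer fails before any 
driver callback is reached, and the same helper could guard the unmap 
path.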

> +	if (pdev->driver->dma_map)
> +		return pdev->driver->dma_map(pdev, addr, iova, len);
> +	/**
> +	 *  In case the driver doesn't provide any specific mapping,
> +	 *  try falling back to VFIO.
> +	 */
> +	if (pdev->kdrv == RTE_KDRV_VFIO)
> +		return rte_vfio_container_dma_map
> +				(RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
> +				 iova, len);

<snip>

> +rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> +		size_t len)
> +{
> +	if (dev->bus->dma_map == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	/* Memory must be registered through rte_extmem_* APIs */
> +	if (rte_mem_virt2memseg(addr, NULL) == NULL) {

No need to call rte_mem_virt2memseg - rte_mem_virt2memseg_list will do.

> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +
> +	return dev->bus->dma_map(dev, addr, iova, len);
> +}
> +
> +int
> +rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> +		  size_t len)
> +{
> +	if (dev->bus->dma_unmap == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}

I think attempting to unmap a memory region that isn't registered should 
be an error, so the rte_mem_virt2memseg_list call should be here too.

> +
> +	return dev->bus->dma_unmap(dev, addr, iova, len);
> +}
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6be4b5cabe..4faf2d20a0 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);

<snip>

> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -515,4 +515,47 @@ rte_dev_hotplug_handle_enable(void);
>   int __rte_experimental
>   rte_dev_hotplug_handle_disable(void);
>   
> +/**
> + * Device level DMA map function.
> + * After a successful call, the memory segment will be mapped to the
> + * given device.

Here and in unmap, maybe add:

@note please register memory first

?

> + *
> + * @param dev
> + *	Device pointer.
> + * @param addr
> + *	Virtual address to map.
> + * @param iova
> + *	IOVA address to map.
> + * @param len
> + *	Length of the memory segment being mapped.
> + *
> + * @return
> + *	0 if mapping was successful.
> + *	Negative value and rte_errno is set otherwise.

Here and in other similar places: why are we setting rte_errno *and* 
returning -rte_errno? Wouldn't returning -1 be enough?
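
For comparison, a minimal standalone sketch of the two conventions in 
question. Nothing here is DPDK code: sketch_errno is a hypothetical 
stand-in for rte_errno, and both function names are made up for the 
example.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for rte_errno. */
static int sketch_errno;

/* Convention used in the patch: set the errno and return its negation. */
static int
map_neg_errno(size_t len)
{
	if (len == 0) {
		sketch_errno = EINVAL;
		return -sketch_errno;
	}
	return 0;
}

/* Alternative: return -1 and leave the reason in the errno only. */
static int
map_minus_one(size_t len)
{
	if (len == 0) {
		sketch_errno = EINVAL;
		return -1;
	}
	return 0;
}
```

In both variants the caller can read the reason from the errno 
variable; the first one additionally duplicates it in the return value.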

-- 
Thanks,
Anatoly

