[dpdk-dev] [PATCH 00/10] Infrastructure to detect iova mapping on the bus

santosh santosh.shukla at caviumnetworks.com
Tue Jul 4 13:20:56 CEST 2017


On Tuesday 04 July 2017 03:40 PM, Thomas Monjalon wrote:

> Hi Santosh,
> Let's try to make this proposal clearer in order to have some reviews.
>
> 08/06/2017 13:05, Santosh Shukla:
>> Q) Why do we need such infrastructure?
>>
>> A) Some NPU hardware, like OCTEONTX, follows a push model to get packets
>> from the pktio device, where packet allocation and freeing are done
>> by the HW. Since the HW can operate only on IOVA with the help of SMMU/IOMMU,
> Some readers may not know IOVA: IO Virtual address.
> Some explanations:
> 	https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
> 	http://vfio.blogspot.fr/2014/08/iommu-groups-inside-and-out.html
>
> It must be said that SMMU is equivalent to IOMMU for ARM:
> 	https://developer.arm.com/products/system-ip/system-controllers/system-memory-management-unit
>
>> when a packet is received from the Ethernet device, its address is an IOVA
>> (which is the PA in the existing scheme).
> You mean that we are currently using only Physical Address (PA)?

Yes. DPDK's default approach is iova=pa. See [1]; for a more recent example, see [2].
[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n709
[2] http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231

>> Mapping IOVA as PA is expensive on such HW, where every packet
>> needs to be converted from PA/IOVA to VA.
> Please, could you explain how and where addresses are converted currently?

The HW (IOMMU/SMMU) does the translation.
Take the VFIO case as an example: the user can program VFIO's dma_map.iova with either the _pa or the _va.

And the APIs below do the address translation in DPDK:
rte_mem_virt2phy
rte_malloc_virt2phy
rte_mempool_virt2phy
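
To illustrate the dma_map.iova point, here is a minimal sketch. The struct is a simplified stand-in that only mirrors the address fields of a VFIO DMA map request, and the names `iova_mode`, `dma_map_req` and `make_dma_map` are hypothetical, not DPDK or kernel API.

```c
#include <stdint.h>

/* Hypothetical mode selector, named after the two schemes discussed here. */
enum iova_mode { IOVA_PA, IOVA_VA };

/* Simplified stand-in for the address fields of a VFIO DMA map request. */
struct dma_map_req {
	uint64_t vaddr; /* process virtual address of the buffer */
	uint64_t iova;  /* address the device will use for DMA */
	uint64_t size;  /* length of the mapping in bytes */
};

/* Fill a map request; only the iova field depends on the chosen mode. */
static struct dma_map_req
make_dma_map(enum iova_mode mode, uint64_t vaddr, uint64_t paddr, uint64_t size)
{
	struct dma_map_req req;

	req.vaddr = vaddr;
	req.size = size;
	/* iova=va: the device sees the same address as the application.
	 * iova=pa: the device sees the physical address. */
	req.iova = (mode == IOVA_VA) ? vaddr : paddr;
	return req;
}
```

With iova=va, the address the HW hands back can be used by the application directly, which is what avoids the per-packet conversion.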

>
>> This patchset proposes a method to autodetect the preferred
>> IOVA mode for a device. Summary of the IOVA scheme:
>> - If all the devices are IOMMU capable and use an IOMMU
>>   capable driver, then select IOVA_VA.
>> - If any of the devices is non-IOMMU, then use the default IOVA
>>   scheme, i.e. IOVA_PA.
>> - If no device is found, then the IOVA scheme would be
>>   IOVA_DC (Don't care).
> I think you should better describe these modes and how they behave.

Aren't they self-explanatory? Meaning:
0) If I program my DMA device (an IOMMU-backed DMA device, of course) as iova=va,
then expect the DMA address (iova) to be a _va.
1) If I program my DMA device (no IOMMU, e.g. the vfio-noiommu or igb_uio case) as iova=pa,
then expect a _pa.
2) If I program my DMA device (also IOMMU-backed) as iova=pa,
then expect the DMA address to be a _pa.

The approach described above has been tested and works on both x86 and arm64.

The default scheme for iova mapping is iova=pa, and the framework
allows the user to explicitly override any scheme via --iova-mode=<>.
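
The detection policy from the cover letter can be sketched roughly as below. This is only an illustration of the three rules, not the code in the patches; `dev_info`, its fields and `detect_iova_mode` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three modes named in the cover letter. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical per-device summary used only for this sketch. */
struct dev_info {
	bool behind_iommu;      /* device is IOMMU capable */
	bool driver_uses_iommu; /* bound driver is IOMMU capable */
};

/* Apply the three rules: no device -> DC, any non-IOMMU device -> PA,
 * otherwise (all devices and drivers IOMMU capable) -> VA. */
static enum iova_mode
detect_iova_mode(const struct dev_info *devs, size_t n)
{
	size_t i;

	if (n == 0)
		return IOVA_DC;
	for (i = 0; i < n; i++) {
		if (!devs[i].behind_iommu || !devs[i].driver_uses_iommu)
			return IOVA_PA;
	}
	return IOVA_VA;
}
```

A single igb_uio or vfio-noiommu device in the list is enough to bring the whole result back to IOVA_PA.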

Thanks.

>> To achieve that, two global APIs are introduced:
>> - rte_bus_get_iommu_class
>> - rte_pci_get_iommu_class
>>
>> Return values for those APIs are:
>> enum rte_iova_mode {
>>         RTE_IOVA_DC, /* Don't care */
>>         RTE_IOVA_PA,
>>         RTE_IOVA_VA
>> };
>>
>> Those are the bus policies for selecting the IOVA mode. In case the user
>> wants to override the bus IOVA mapping, an EAL option
>> "--iova-mode=<string>" is added. The user passes a string: 'pa' --> IOVA_PA,
>> 'va' --> IOVA_VA.
>>
>> To support the new EAL option, a global API is added:
>> - rte_eal_iova_mode
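
As a rough sketch of how the option value could map to a mode, assuming only the two strings listed above are accepted (`parse_iova_mode` and the enum are hypothetical names, not the EAL implementation):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the modes listed in the cover letter. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Map the value given to --iova-mode= onto a mode.
 * Returns 0 on success, -1 if the string is not recognised. */
static int
parse_iova_mode(const char *arg, enum iova_mode *out)
{
	if (arg == NULL || out == NULL)
		return -1;
	if (strcmp(arg, "pa") == 0) {
		*out = IOVA_PA;
		return 0;
	}
	if (strcmp(arg, "va") == 0) {
		*out = IOVA_VA;
		return 0;
	}
	return -1; /* unknown string: leave *out untouched */
}
```

Rejecting unknown strings rather than silently falling back is a choice made for this sketch only.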
>>
>> Patch Summary:
>> 1) 1st - 2nd patch: Adds infrastructure in linuxapp and bsdapp
>> layer.
>> 2) 3rd patch: Introduces global bus API named rte_bus_get_iommu_class.
>> 3) 4th patch: Adds new EAL option called --iova-mode=<mode-string>.
>> 4) 5th - 6th patch: Logic to detect IOVA scheme.
>> 5) 9th patch: Checks IOVA mode before programming vfio dma_map.iova.
>> Default scheme is IOVA_PA.
>> 6) 10th-12th patch: Checks for IOVA_VA mode in the APIs below:
>>         - rte_mem_virt2phy
>>         - rte_mempool_virt2phy
>>         - rte_malloc_virt2phy
>> If set, then return paddr=vaddr; else return the value from the default
>> implementation.
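
The check described for the virt2phy APIs amounts to something like the following sketch. The names `virt2iova`, `virt2phy_fn` and `toy_virt2phy` are hypothetical, and the toy lookup merely stands in for the existing default implementation.

```c
#include <stdint.h>

/* Hypothetical mirror of the modes listed in the cover letter. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Signature of a default virtual-to-physical lookup (stand-in). */
typedef uint64_t (*virt2phy_fn)(const void *virt);

/* Toy lookup for demonstration only: always reports the same address. */
static uint64_t
toy_virt2phy(const void *virt)
{
	(void)virt;
	return 0x100000ULL;
}

/* In IOVA_VA mode return paddr=vaddr, otherwise defer to the default. */
static uint64_t
virt2iova(enum iova_mode mode, const void *virt, virt2phy_fn default_lookup)
{
	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)virt;
	return default_lookup(virt);
}
```

In IOVA_VA mode the lookup is skipped entirely, so the returned address is just the pointer value.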


