[dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping

Maxime Coquelin maxime.coquelin at redhat.com
Thu Jul 6 16:39:36 CEST 2017



On 07/06/2017 04:13 PM, santosh wrote:
> On Thursday 06 July 2017 06:41 PM, Maxime Coquelin wrote:
> 
>>
>> On 07/06/2017 03:08 PM, Maxime Coquelin wrote:
>>>
>>>
>>> On 07/06/2017 01:19 PM, santosh wrote:
>>>> On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:
>>>>
>>>>>
>>>>> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>>>>>> -----Original Message-----
>>>>>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>>>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>>>> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>>>>>>> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>,
>>>>>>>     thomas at monjalon.net, bruce.richardson at intel.com, dev at dpdk.org,
>>>>>>>     hemant.agrawal at nxp.com, shreyansh.jain at nxp.com, gaetan.rivet at 6wind.com
>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>>     before mapping
>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>     Thunderbird/52.1.0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>>>>>> -----Original Message-----
>>>>>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>>>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>>>>>> To: Santosh Shukla <santosh.shukla at caviumnetworks.com>,
>>>>>>>>>      thomas at monjalon.net, bruce.richardson at intel.com, dev at dpdk.org
>>>>>>>>> CC: jerin.jacob at caviumnetworks.com, hemant.agrawal at nxp.com,
>>>>>>>>>      shreyansh.jain at nxp.com, gaetan.rivet at 6wind.com
>>>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>>>>      before mapping
>>>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>>>      Thunderbird/52.1.0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Santosh Shukla<santosh.shukla at caviumnetworks.com>
>>>>>>>>>> Signed-off-by: Jerin Jacob<jerin.jacob at caviumnetworks.com>
>>>>>>>>>> ---
>>>>>>>>>>       lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>>>>>       1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> index 04914406f..348b7a7f4 100644
>>>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>>>>>               dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>>>>>               dma_map.vaddr = ms[i].addr_64;
>>>>>>>>>>               dma_map.size = ms[i].len;
>>>>>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>>>>>> +            dma_map.iova = dma_map.vaddr;
>>>>>>>>>> +        else
>>>>>>>>>> +            dma_map.iova = ms[i].phys_addr;
>>>>>>>>>>               dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>>>>
>>>>>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>>>>>
>>>>>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>>>>>
>>>>>>>>> Imagine you have two devices in the same iommu group, and the two devices
>>>>>>>>> are used in separate processes. Each process could try to map a different
>>>>>>>>> physical address at the same virtual address, and so the second map
>>>>>>>>> would fail.
>>>>>>>>
>>>>>>>> IMO, this doesn't look like a problem. Here is the data flow:
>>>>>>>>
>>>>>>>> 1) The vfio DMA map function (vfio_type1_dma_map()) is called only
>>>>>>>> in the primary process.
>>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
>>>>>>>>
>>>>>>>> 2) In a secondary process, DPDK's rte_eal_huge_page_attach() makes sure
>>>>>>>> that the secondary process has the _same_ virtual addresses as the
>>>>>>>> primary, or it exits during attach.
>>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
>>>>>>>>
>>>>>>>> 3) The secondary process adds the virtual addresses mapped in step (2)
>>>>>>>> to the OS page table. On an SMMU entry miss (when the device issues an
>>>>>>>> I/O transaction), the OS will load the mapping and update the SMMU
>>>>>>>> "context" with page tables from the MMU.
>>>>>>>
>>>>>>> Ok thanks for the detailed info, but what about the case where the same
>>>>>>> iommu group is used by two primary processes?
>>>>>>
>>>>>> Does that case exist with DPDK? We always need to blacklist the same BDF in
>>>>>> the secondary process to make things work with the existing DPDK setup, which
>>>>>> makes sense as well. Only the primary process configures the HW blocks.
>>>>>
>>>>> I meant the case when two BDFs are in the same IOMMU group (if ACS is not
>>>>> supported at some point in the hierarchy). And I meant two primary
>>>>> processes running, for example two containers each running a DPDK
>>>>> application.
>>>>>
>>>>> Maybe this is not a valid use-case (it is not secure, as it would break
>>>>> isolation between the two containers), but it seems that it is something
>>>>> DPDK allows today, if I'm not mistaken.
>>>>>
>>>> I'm not sure how two primary processes could run, because the second primary
>>>> process would try to access /var/run/.rte_config and would fail at this point [1].
>>>>
>>>> It's not a valid use-case for dpdk (imo).
>>>> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204
>>>
>>> Yes, this is possible. I had never used it before, but Thomas told me it
>>> is supported by setting the --file-prefix option. I gave it a try, and I
>>> confirm it works:
>>> session 1> ./install/bin/testpmd -l 0,2 --socket-mem=1024 -w 0000:05:00.0 --proc-type=primary --file-prefix=app1 -- --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=1 --forward-mode=io
>>> session 2> ./install/bin/testpmd -l 0,3 --socket-mem=1024 -w 0000:05:00.1 --proc-type=primary --file-prefix=app2 -- --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=1 --forward-mode=io
>>>
>>> In the above example, two ports of the same card are used by two
>>> processes. Note that in this case, ACS is supported and both ports have
>>> their own iommu group.
>>
>> # ls -al /var/run/.app*
>> -rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app1_config
>> -rw-r--r--. 1 root root  49728 Jul  6 09:08 /var/run/.app1_hugepage_info
>> srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app1_mp_socket
>> -rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app2_config
>> -rw-r--r--. 1 root root  45584 Jul  6 09:08 /var/run/.app2_hugepage_info
>> srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app2_mp_socket
>>
> Yes, you're right, you can start two primary processes; I missed that point.
> The use-case you mentioned is ok, because the devices are in two different iommu
> groups, so the proposed scheme will work. It may not work for the case when ACS is
> not present, so it's bypass mode, which falls under the vfio-noiommu category.
> 
> Having said that: per the discussion in [1], the proposed scheme where the
> bus makes the decision based on pci_id and/or pci_drv will be a foolproof
> solution, and that way other types of devices will not be impacted. Right?


Right!

Thanks,
Maxime
> [1] https://www.mail-archive.com/dev@dpdk.org/msg70283.html
> 
> 

