[dpdk-dev] [PATCH] vhost: fix async DMA map
    Burakov, Anatoly 
    anatoly.burakov at intel.com
       
    Tue Oct 26 12:58:57 CEST 2021
    
    
  
On 26-Oct-21 11:27 AM, Ding, Xuan wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Tuesday, October 26, 2021 5:49 PM
>> To: Ding, Xuan <xuan.ding at intel.com>; dev at dpdk.org;
>> david.marchand at redhat.com; Xia, Chenbo <chenbo.xia at intel.com>
>> Cc: Burakov, Anatoly <anatoly.burakov at intel.com>
>> Subject: Re: [PATCH] vhost: fix async DMA map
>>
>>
>>
>> On 10/26/21 10:49, Ding, Xuan wrote:
>>> Hi Maxime,
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>> Sent: Tuesday, October 26, 2021 2:53 PM
>>>> To: Ding, Xuan <xuan.ding at intel.com>; dev at dpdk.org;
>>>> david.marchand at redhat.com; Xia, Chenbo <chenbo.xia at intel.com>
>>>> Cc: Burakov, Anatoly <anatoly.burakov at intel.com>
>>>> Subject: Re: [PATCH] vhost: fix async DMA map
>>>>
>>>>
>>>>
>>>> On 10/26/21 04:07, Ding, Xuan wrote:
>>>>> Hi Maxime,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>>> Sent: Tuesday, October 26, 2021 4:47 AM
>>>>>> To: dev at dpdk.org; david.marchand at redhat.com; Xia, Chenbo
>>>>>> <chenbo.xia at intel.com>; Ding, Xuan <xuan.ding at intel.com>
>>>>>> Subject: Re: [PATCH] vhost: fix async DMA map
>>>>>>
>>>>>> Hi Xuan,
>>>>>>
>>>>>> On 10/25/21 22:33, Maxime Coquelin wrote:
>>>>>>> This patch fixes possible NULL-pointer dereferencing
>>>>>>> reported by Coverity and also fixes NUMA reallocation
>>>>>>> of the async DMA map.
>>>>>>>
>>>>>>> Fixes: 7c61fa08b716 ("vhost: enable IOMMU for async vhost")
>>>>>>>
>>>>>>> Coverity issue: 373655
>>>>>>>
>>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>>>>> ---
>>>>>>>      lib/vhost/vhost_user.c | 45 +++++++++++++++++++-----------------------
>>>>>>>      1 file changed, 20 insertions(+), 25 deletions(-)
>>>>>>>
>>>>>>
>>>>>> I posted this patch to fix the issue reported by Coverity and also other
>>>>>> issue on NUMA realloc that I found at the same time. But I wonder
>>>>>> whether all this async_map_status is needed.
>>>>>
>>>>> Thanks for your fix! I can help to review and test the patch later.
>>>>>
>>>>> I add the async_map_status in v2 for compatibility. Some DMA device,
>>>>> like DSA, may use kernel idxd driver only. If there is no device bound to
>>>>> DPDK vfio and kernel vfio module is modprobed to ensure
>>>> rte_vfio_is_enabled() is true,
>>>>> we will unavoidably do DMA map/unmap and it will fail.
>>>>>
>>>>> Therefore, the dma_map_status here is used to filter this situation by
>>>> preventing
>>>>> unnecessary DMA unmap.
>>>>
>>>> Ok, then I think we can just remove the async DMA map.
>>>>
>>>>>>
>>>>>> Indeed, if the only place where we DMA map is in
>>>>>> vhost_user_mmap_region(). If it fails, the error is propagated, the mem
>>>>>> table are freed and NACK is replied to the master. IOW, the device will
>>>>>> be in an unusable state.
>>>>>
>>>>> I agree with you, this is the place I consider right to do DMA map
>>>>> because we also do SW mapping here, any suggestions?
>>>>
>>>> No suggestion, I was just explaining that at the only place where
>>>> DMA map were done, mapping errors were properly handled and
>> propagated.
>>>
>>> What about just setting async_copy to false, and allow switching to sync
>> path.
>>>
>>>>
>>>>>>
>>>>>> Removing the async DMA map will simplify a lot the code, do you agree
>> to
>>>>>> remove it or there is something I missed?
>>>>>
>>>>> See above. Indeed, it adds a lot of code. But we can't know the driver for
>>>>> each device in vhost lib, or we can only restrict the user to bind some
>>>> devices
>>>>> to DPDK vfio if async logic needed.
>>>>
>>>> I would think we don't care if DMA unmap fails, we can just do the same
>>>> as what you do for DMA map, i.e. just ignore the error.
>>>
>>> Get your idea, we can do the same as DMA map, and in this way
>> dma_map_status flag can be removed.
>>>
>>>>
>>>> Thanks to this discussion, I have now more concerns on how it works. I
>>>> think we have a problem here in case of DMA device hotplug, that device
>>>> could miss the necessary map entries from Vhost if no VFIO devices were
>>>> attached at VHST_USER_SET_MEM_TABLE time. How would you handle
>> that
>>>> case?
>>>
>>> DMA device is uncore, so I don't see the  hotplug issue here.
>>
>> I'm not sure what 'uncore' is, I suppose you mean your device cannot be
>> physically added/removed to the system.
> 
> Yes, we are at the same understanding.
> 
>>
>> I was not clear enough in my question. I meant that for example, the
>> application is started and the Vhost port is created. Then, the DMA
>> device is bound to VFIO, and probed by the DPDK application. Finally,
>> the application register the DMA device as an async channel in Vhost.
>>
>> I think it will not work as the SET_MEM_TABLE will already have
>> happened, so the guest memory won't be mapped in the VFIO container.
> 
> Assuming the DMA device supports hotplug, this situation was not actually took into consideration, and maybe the handling should be added in app, with an event callback.
> 
>>
>> Do you have the same understanding?
>>
>>> I will have another patch containing compatibility with sync path, and
>> async_map_status flag will be removed.
>>> Hope to get your insights.
>>
>> What do you mean by compatibility with sync path?
> 
> I mean whatever DMA map succeeds or fails, we return 0 so as not to prevent
> SET_EM_TABLE.
> 
>>
>> Thanks for taking care of the async map status removal.
> 
> Found a tricky point for removing dma_map_status flag, currently DMA unmap API will directly access the vfio_config,
> And if without dma_map_status, we cannot get rte_errno at this moment.
This is a bug in VFIO API. We always assume VFIO type is valid, but it 
is only valid whenever 1) VFIO support is detected at startup, and 2) we 
assign at least one device to VFIO. If VFIO was detected but no devices 
were assigned, the VFIO type will not be assigned, and the pointer will 
be NULL. In this case, we should not attempt any unmaps, because there 
are no active devices.
I'll submit a patch to fix this, thanks for catching this issue!
> 
> BTW, I will consider this Coverity fix in new patch. :)
> 
> Thanks,
> Xuan
> 
>>
>> Maxime
>>> Thanks,
>>> Xuan
>>>
>>>>
>>>> Regards,
>>>> Maxime
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maxime
>>>>>
>>>
> 
-- 
Thanks,
Anatoly
    
    
More information about the dev
mailing list