[dpdk-dev] [PATCH] vfio: fix expanding DMA area in ppc64le

Burakov, Anatoly anatoly.burakov at intel.com
Fri Jun 28 16:04:50 CEST 2019

On 28-Jun-19 2:47 PM, Burakov, Anatoly wrote:
> On 28-Jun-19 12:38 PM, Takeshi T Yoshimura wrote:
>>> To: "Mo, YufengX" <yufengx.mo at intel.com>, dev at dpdk.org
>>> From: "Burakov, Anatoly" <anatoly.burakov at intel.com>
>>> Date: 06/26/2019 06:43PM
>>> Cc: drc at ibm.com, pradeep at us.ibm.com, Takeshi Yoshimura
>>> <tyos at jp.ibm.com>
>>> Subject: [EXTERNAL] Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA
>>> area in ppc64le
>>> On 18-Jun-19 3:37 AM, Mo, YufengX wrote:
>>>> From: Takeshi Yoshimura <tyos at jp.ibm.com>
>>>> In ppc64le, expanding DMA areas always fail because we cannot
>>> remove
>>>> a DMA window. As a result, we cannot allocate more than one memseg
>>> in
>>>> ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
>>>> the mapped DMA before removing the window. This patch fixes this
>>>> incorrect behavior.
>>>> I added a global variable to track current window size since we do
>>>> not have better ways to get exact size of it than doing so. sPAPR
>>>> IOMMU seems not to provide any ways to get window size with ioctl
>>>> interfaces. rte_memseg_walk*() is currently used to calculate
>>> window
>>>> size, but it walks memsegs that are marked as used, not mapped. So,
>>>> we need to determine if a given memseg is mapped or not, otherwise
>>>> the ioctl reports errors due to attempting to unregister memory
>>>> addresses that are not registered. The global variable is excluded
>>>> in non-ppc64le binaries.
>>>> Similar problems happen in user maps. We need to avoid attempting
>>> to
>>>> unmap the address that is given as the function's parameter. The
>>>> compaction of user maps prevents us from passing correct length for
>>>> unmapping DMA at the window recreation. So, I removed it in
>>> ppc64le.
>>>> I also fixed the order of ioctl for unregister and unmap. The ioctl
>>>> for unregister sometimes report device busy errors due to the
>>>> existence of mapped area.
>>>> Signed-off-by: Takeshi Yoshimura <tyos at jp.ibm.com>
>>>> ---
>>> OK there are three patches, and two v1's with two different authors
>>> in
>>> reply to the same original patch. There's too much going on here, i
>>> can't review this. Needs splitting.
>>> Also, #ifdef-ing out the map merging seems highly suspect.
>>> With regards to "walking used memsegs, not mapped", unless i'm
>>> misunderstanding something, these are the same - whenever a segment
>>> is
>>> mapped, it is marked as used, and whenever it is unmapped, it is
>>> marked
>>> as free. Could you please explain what is the difference and why is
>>> this
>>> needed?
>>> Is the point of contention here being the fact that whenever the
>>> unmap
>>> callback arrives, the segments still appear used when iterating over
>>> the
>>> map? If that's the case, then i think it would be OK to mark them as
>>> unused *before* triggering callbacks, and chances are some of this
>>> code
>>> wouldn't be needed. That would require a deprecation notice though,
>>> because the API behavior will change (even if this fact is not
>>> documented properly).
>>> -- 
>>> Thanks,
>>> Anatoly
>> I am the author of this patch. We should ignore a patch from YufengX Mo.
>>> From my code reading, a memsg is at first marked as used when it is 
>>> allocated. Then, the memseg is passed to vfio_spapr_dma_mem_map(). 
>>> The callback iterates all the used (i.e., allocated) memsegs and call 
>>> ioctl for mapping VA to IOVA. So, when vfio_spapr_dma_mem_map() is 
>>> called, passed memsegs can be non-mapped but marked as used. As a 
>>> result, an attempt to unmap non-mapped area happens during DMA window 
>>> expansion. This is the difference and why this fix was needed.
>>> i think it would be OK to mark them as unused *before* triggering 
>>> callbacks
>> Yes, my first idea was the same as yours, but I was also worried that 
>> it might cause inconsistent API behavior as you also pointed out. If 
>> you think so, I think I can rewrite the patch without ugly #ifdef.
>> Unfortunately, I don't have enough time for writing code next week and 
>> next next week. So, I will resubmit the revised patch weeks later.
> I think the approach with fixing the mem callbacks to report the 
> unmapped segments as no longer used would be better.
> As far as i can remember at the point where callbacks are triggered, the 
> memory is already removed from malloc heap and from all processes. Each 
> secondary also stores their own shadow copy of the memory map, so 
> removing the "used" flags from the main map will not have any 
> consequences as far as correctness is concerned. Each callback is also 
> getting the memory area being removed as parameters, so if there is code 
> that needs to be run taking into account that memory area, it can be done.
> Existing code may rely on this behavior (even though it doesn't make 
> much sense now that i think of it), so going with this approach *will* 
> require a deprecation notice and can only be done in the next release.

One *other* way to fix this would be to stored the pages being removed 
in a struct, and pass it as parameters to the _walk() window size 
calculation function. This would avoid needless API changes and handle 
this case correctly.

>> Regards,
>> Takeshi


More information about the dev mailing list