[dpdk-dev] [PATCH 3/5] common/dpaax: add library for PA VA translation table

Burakov, Anatoly anatoly.burakov at intel.com
Thu Oct 11 12:09:09 CEST 2018


On 11-Oct-18 11:02 AM, Shreyansh Jain wrote:
> On Thursday 11 October 2018 02:33 PM, Burakov, Anatoly wrote:
>> On 09-Oct-18 11:45 AM, Shreyansh Jain wrote:
>>> On Tuesday 25 September 2018 07:09 PM, Shreyansh Jain wrote:
>>>> Hello Anatoly,
>>>>
>>>> On Tuesday 25 September 2018 06:58 PM, Burakov, Anatoly wrote:
>>>>> On 25-Sep-18 1:54 PM, Shreyansh Jain wrote:
>>>>>> A common library, valid for dpaaX drivers, which is used to maintain
>>>>>> a local copy of PA->VA translations.
>>>>>>
>>>>>> In case of physical addressing mode (one of the option for FSLMC, and
>>>>>> only option for DPAA bus), the addresses of descriptors Rx'd are
>>>>>> physical. These need to be converted into equivalent VA for rte_mbuf
>>>>>> and other similar calls.
>>>>>>
>>>>>> Using the rte_mem_virt2iova or rte_mem_virt2phy is expensive. This
>>>>>> library is an attempt to reduce the overall cost associated with
>>>>>> this translation.
>>>>>>
>>>>>> A small table is maintained, containing continuous entries
>>>>>> representing a continguous physical range. Each of these entries
>>>>>> stores the equivalent VA, which is fed during mempool creation, or
>>>>>> memory allocation/deallocation callbacks.
>>>>>>
>>>
>>> [...]
>>>
>>>>
>>>>>
>>>>> Also, a couple of nitpicks below.
>>>>>
>>>>>>   cosnfig/common_base                            |   5 +
>>>>>>   config/common_linuxapp                        |   5 +
>>>>>>   drivers/common/Makefile                       |   4 +
>>>>>>   drivers/common/dpaax/Makefile                 |  31 ++
>>>>>>   drivers/common/dpaax/dpaax_iova_table.c       | 509 
>>>>>> ++++++++++++++++++
>>>>>>   drivers/common/dpaax/dpaax_iova_table.h       | 104 ++++
>>>>>>   drivers/common/dpaax/dpaax_logs.h             |  39 ++
>>>>>>   drivers/common/dpaax/meson.build              |  12 +
>>>>>
>>>>> <snip>
>>>>>
>>>>>> +    DPAAX_DEBUG("Add: Found slot at (%"PRIu64")[(%zu)] for 
>>>>>> vaddr:(%p),"
>>>>>> +            " phy(%"PRIu64"), len(%zu)", entry[i].start, e_offset,
>>>>>> +            vaddr, paddr, length);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +int
>>>>>> +dpaax_iova_table_del(phys_addr_t paddr, size_t len __rte_unused)
>>>>>
>>>>> len is not unused.
>>>>
>>>> I will fix this.
>>>> Actually, this function itself is useless - more for symmetry reason.
>>>> Callers would be either simply updating the table, or ignoring it 
>>>> completely. But, yes, this is indeed wrong that I set that unused.
>>>>
>>>
>>> Actually, I was wrong in my first reply. In case of 
>>> dpaax_iova_table_del(), len is indeed redundant. This is because the 
>>> mapping is for a complete page (min of 2MB size), even if the request 
>>> is for lesser length. So, removal of a single entry (of fixed size) 
>>> would be done.
>>>
>>> In fact, while on this, I think deleting a PA->VA entry itself is 
>>> incorrect (not just useless). A single entry (~2MB equivalent) can 
>>> represent multiple users (working on a rte_malloc'd area, for 
>>> example). So, effectively, its always an update - not an add or del.
>>
>> I'm not sure what you mean here. If you got a mem event about memory 
>> area being freed, it's guaranteed to *not* have any users - neither 
>> malloc, nor any other memory. And len is always page-aligned.
> 
> ok. Maybe I am getting this wrong, but consider this:
> 
> 1) hugepage size=2MB
> 2) a = malloc(1M)
>    this will pin an entry in table for a block starting at VA=(a) and 
> PA=(a'). Each entry is of 2MB length - that means, even if someone were 
> to access a+1048577 for an equivalent PA, they would get it (though, 
> that is a incorrect access).
> 3) b = malloc(1M)
>    this *might* lead to a case where same 2MB page is used and 
> VA=(b==(a+1MB)). Being hugepage backed, PA=(b=PA(a)+1M).
> = After b, the PA-VA table has a single entry of 2MB, representing two 
> mallocs. It can be used for translation for any thread requesting PAs of 
> a or b.
> 4) Free(a)
>   - this would attempt to remove one 2MB entry from PA-VA table. But, 

No, it wouldn't. Not unless 'b' was also freed. Malloc will not free the 
area unless there are no users of this area (as in, it checks if a free 
malloc element encompasses an entire page). If you do two allocations, 
1MB in size each, the hugepage will contain two buys malloc elements, 
one meg each. You free one, but the other one isn't free, so no action 
is taken.

> 'b' is already valid. Access to get_pa(VA(b)) should return me the PA(b).
>   - 'len' is not even used as the entry in PA-VA table is of a fixed size.
> 
> In the above, (3) is an assumption I am making based on my understanding 
> how mem allocator is working. Is that wrong?
> 
> Basically, this is a restriction of this table - it has a min chunk of 
> 2MB - even for 1G hugepages - and hence, it is not possible to honor 
> deletes. I know this is convoluted logic - but, this keeps it simple and 
> use-able without much performance impact.

The granularity of mem events are page size. You will not ever get a mem 
event on anything other than a full page, and you are guaranteed that it 
is free and not used by anything within DPDK.

> 
> [...]
> 
> 


-- 
Thanks,
Anatoly


More information about the dev mailing list