[dpdk-dev] [PATCH 3/5] common/dpaax: add library for PA VA translation table

Shreyansh Jain shreyansh.jain at nxp.com
Thu Oct 11 12:39:34 CEST 2018


On Thursday 11 October 2018 03:43 PM, Burakov, Anatoly wrote:
> On 11-Oct-18 11:07 AM, Shreyansh Jain wrote:
>> On Thursday 11 October 2018 03:32 PM, Shreyansh Jain wrote:
>>> On Thursday 11 October 2018 02:33 PM, Burakov, Anatoly wrote:
>>>> On 09-Oct-18 11:45 AM, Shreyansh Jain wrote:
>>>>> On Tuesday 25 September 2018 07:09 PM, Shreyansh Jain wrote:
>>>>>> Hello Anatoly,
>>>>>>
>>>>>> On Tuesday 25 September 2018 06:58 PM, Burakov, Anatoly wrote:
>>>>>>> On 25-Sep-18 1:54 PM, Shreyansh Jain wrote:
>>>>>>>> A common library, valid for dpaaX drivers, which is used to 
>>>>>>>> maintain
>>>>>>>> a local copy of PA->VA translations.
>>>>>>>>
>>>>>>>> In case of physical addressing mode (one of the option for 
>>>>>>>> FSLMC, and
>>>>>>>> only option for DPAA bus), the addresses of descriptors Rx'd are
>>>>>>>> physical. These need to be converted into equivalent VA for 
>>>>>>>> rte_mbuf
>>>>>>>> and other similar calls.
>>>>>>>>
>>>>>>>> Using the rte_mem_virt2iova or rte_mem_virt2phy is expensive. This
>>>>>>>> library is an attempt to reduce the overall cost associated with
>>>>>>>> this translation.
>>>>>>>>
>>>>>>>> A small table is maintained, containing continuous entries
>>>>>>>> representing a continguous physical range. Each of these entries
>>>>>>>> stores the equivalent VA, which is fed during mempool creation, or
>>>>>>>> memory allocation/deallocation callbacks.
>>>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Also, a couple of nitpicks below.
>>>>>>>
>>>>>>>>   cosnfig/common_base                            |   5 +
>>>>>>>>   config/common_linuxapp                        |   5 +
>>>>>>>>   drivers/common/Makefile                       |   4 +
>>>>>>>>   drivers/common/dpaax/Makefile                 |  31 ++
>>>>>>>>   drivers/common/dpaax/dpaax_iova_table.c       | 509 
>>>>>>>> ++++++++++++++++++
>>>>>>>>   drivers/common/dpaax/dpaax_iova_table.h       | 104 ++++
>>>>>>>>   drivers/common/dpaax/dpaax_logs.h             |  39 ++
>>>>>>>>   drivers/common/dpaax/meson.build              |  12 +
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>> +    DPAAX_DEBUG("Add: Found slot at (%"PRIu64")[(%zu)] for 
>>>>>>>> vaddr:(%p),"
>>>>>>>> +            " phy(%"PRIu64"), len(%zu)", entry[i].start, e_offset,
>>>>>>>> +            vaddr, paddr, length);
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int
>>>>>>>> +dpaax_iova_table_del(phys_addr_t paddr, size_t len __rte_unused)
>>>>>>>
>>>>>>> len is not unused.
>>>>>>
>>>>>> I will fix this.
>>>>>> Actually, this function itself is useless - more for symmetry reason.
>>>>>> Callers would be either simply updating the table, or ignoring it 
>>>>>> completely. But, yes, this is indeed wrong that I set that unused.
>>>>>>
>>>>>
>>>>> Actually, I was wrong in my first reply. In case of 
>>>>> dpaax_iova_table_del(), len is indeed redundant. This is because 
>>>>> the mapping is for a complete page (min of 2MB size), even if the 
>>>>> request is for lesser length. So, removal of a single entry (of 
>>>>> fixed size) would be done.
>>>>>
>>>>> In fact, while on this, I think deleting a PA->VA entry itself is 
>>>>> incorrect (not just useless). A single entry (~2MB equivalent) can 
>>>>> represent multiple users (working on a rte_malloc'd area, for 
>>>>> example). So, effectively, its always an update - not an add or del.
>>>>
>>>> I'm not sure what you mean here. If you got a mem event about memory 
>>>> area being freed, it's guaranteed to *not* have any users - neither 
>>>> malloc, nor any other memory. And len is always page-aligned.
>>>
>>> ok. Maybe I am getting this wrong, but consider this:
>>>
>>> 1) hugepage size=2MB
>>> 2) a = malloc(1M)
>>>    this will pin an entry in table for a block starting at VA=(a) and 
>>> PA=(a'). Each entry is of 2MB length - that means, even if someone 
>>> were to access a+1048577 for an equivalent PA, they would get it 
>>> (though, that is a incorrect access).
>>> 3) b = malloc(1M)
>>>    this *might* lead to a case where same 2MB page is used and 
>>> VA=(b==(a+1MB)). Being hugepage backed, PA=(b=PA(a)+1M).
>>> = After b, the PA-VA table has a single entry of 2MB, representing 
>>> two mallocs. It can be used for translation for any thread requesting 
>>> PAs of a or b.
>>> 4) Free(a)
>>>   - this would attempt to remove one 2MB entry from PA-VA table. But, 
>>> 'b' is already valid. Access to get_pa(VA(b)) should return me the 
>>> PA(b).
>>>   - 'len' is not even used as the entry in PA-VA table is of a fixed 
>>> size.
>>
>> Just to add to this:
>> - if talking about the mem_event callback, it definitely won't be a 
>> case where same page is still being served under another rte_malloc
>> - But, calls can come to delete from users of PA-VA table based on 
>> their own rte_free().
>>
>> And, your comment makes me think - I should probably del entry from 
>> the table only when mem_event callback is received.
> 
> Mem events are not triggered on rte_free(), they're triggered on page 
> deallocation. A call to rte_free/rte_memzone_free/rte_mempool_free etc. 
> *might* trigger a page deallocation, but *only* if the memory area being 
> freed encompasses an entire page. If you rte_malloc() 64 bytes and then 
> rte_free() those 64 bytes, you won't get a mem event *unless* these were 
> the only 64 bytes allocated on a particular page, and the entire page is 
> no longer used by anything else.

My understanding is same.
But, it seems my explanation wasn't well written:

For a rte_free(), I am not expecting that mem_event is raised - but, the 
caller of rte_free() (the eth or crypto drivers, or applications) may 
call the PA-VA table del function to remove the entries.

This voluntary delete of table entry from the drivers or applications 
using PA-VA calling del of PA-VA table - is not correct.

The path from mem_event callback clearing the PA-VA table entry is 
correct (which I removed in v2) - that time the page (len) would 
definitely not be used by anyone and can be removed from PA-VA table. 
And, yes, I agree that mem-event may not be on an rte_free().

> 
>>
>>>
>>> In the above, (3) is an assumption I am making based on my 
>>> understanding how mem allocator is working. Is that wrong?
>>>
>>> Basically, this is a restriction of this table - it has a min chunk 
>>> of 2MB - even for 1G hugepages - and hence, it is not possible to 
>>> honor deletes. I know this is convoluted logic - but, this keeps it 
>>> simple and use-able without much performance impact.
>>>
>>> [...]
>>>
>>
>>
> 
> 



More information about the dev mailing list