[dpdk-dev] [PATCH 3/5] common/dpaax: add library for PA VA translation table
Burakov, Anatoly
anatoly.burakov at intel.com
Thu Oct 11 12:13:17 CEST 2018
On 11-Oct-18 11:07 AM, Shreyansh Jain wrote:
> On Thursday 11 October 2018 03:32 PM, Shreyansh Jain wrote:
>> On Thursday 11 October 2018 02:33 PM, Burakov, Anatoly wrote:
>>> On 09-Oct-18 11:45 AM, Shreyansh Jain wrote:
>>>> On Tuesday 25 September 2018 07:09 PM, Shreyansh Jain wrote:
>>>>> Hello Anatoly,
>>>>>
>>>>> On Tuesday 25 September 2018 06:58 PM, Burakov, Anatoly wrote:
>>>>>> On 25-Sep-18 1:54 PM, Shreyansh Jain wrote:
>>>>>>> A common library, valid for dpaaX drivers, which is used to maintain
>>>>>>> a local copy of PA->VA translations.
>>>>>>>
>>>>>>> In case of physical addressing mode (one of the option for FSLMC,
>>>>>>> and
>>>>>>> only option for DPAA bus), the addresses of descriptors Rx'd are
>>>>>>> physical. These need to be converted into equivalent VA for rte_mbuf
>>>>>>> and other similar calls.
>>>>>>>
>>>>>>> Using the rte_mem_virt2iova or rte_mem_virt2phy is expensive. This
>>>>>>> library is an attempt to reduce the overall cost associated with
>>>>>>> this translation.
>>>>>>>
>>>>>>> A small table is maintained, containing continuous entries
>>>>>>> representing a continguous physical range. Each of these entries
>>>>>>> stores the equivalent VA, which is fed during mempool creation, or
>>>>>>> memory allocation/deallocation callbacks.
>>>>>>>
>>>>
>>>> [...]
>>>>
>>>>>
>>>>>>
>>>>>> Also, a couple of nitpicks below.
>>>>>>
>>>>>>> cosnfig/common_base | 5 +
>>>>>>> config/common_linuxapp | 5 +
>>>>>>> drivers/common/Makefile | 4 +
>>>>>>> drivers/common/dpaax/Makefile | 31 ++
>>>>>>> drivers/common/dpaax/dpaax_iova_table.c | 509
>>>>>>> ++++++++++++++++++
>>>>>>> drivers/common/dpaax/dpaax_iova_table.h | 104 ++++
>>>>>>> drivers/common/dpaax/dpaax_logs.h | 39 ++
>>>>>>> drivers/common/dpaax/meson.build | 12 +
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> + DPAAX_DEBUG("Add: Found slot at (%"PRIu64")[(%zu)] for
>>>>>>> vaddr:(%p),"
>>>>>>> + " phy(%"PRIu64"), len(%zu)", entry[i].start, e_offset,
>>>>>>> + vaddr, paddr, length);
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +int
>>>>>>> +dpaax_iova_table_del(phys_addr_t paddr, size_t len __rte_unused)
>>>>>>
>>>>>> len is not unused.
>>>>>
>>>>> I will fix this.
>>>>> Actually, this function itself is useless - more for symmetry reason.
>>>>> Callers would be either simply updating the table, or ignoring it
>>>>> completely. But, yes, this is indeed wrong that I set that unused.
>>>>>
>>>>
>>>> Actually, I was wrong in my first reply. In case of
>>>> dpaax_iova_table_del(), len is indeed redundant. This is because the
>>>> mapping is for a complete page (min of 2MB size), even if the
>>>> request is for lesser length. So, removal of a single entry (of
>>>> fixed size) would be done.
>>>>
>>>> In fact, while on this, I think deleting a PA->VA entry itself is
>>>> incorrect (not just useless). A single entry (~2MB equivalent) can
>>>> represent multiple users (working on a rte_malloc'd area, for
>>>> example). So, effectively, its always an update - not an add or del.
>>>
>>> I'm not sure what you mean here. If you got a mem event about memory
>>> area being freed, it's guaranteed to *not* have any users - neither
>>> malloc, nor any other memory. And len is always page-aligned.
>>
>> ok. Maybe I am getting this wrong, but consider this:
>>
>> 1) hugepage size=2MB
>> 2) a = malloc(1M)
>> this will pin an entry in table for a block starting at VA=(a) and
>> PA=(a'). Each entry is of 2MB length - that means, even if someone
>> were to access a+1048577 for an equivalent PA, they would get it
>> (though, that is a incorrect access).
>> 3) b = malloc(1M)
>> this *might* lead to a case where same 2MB page is used and
>> VA=(b==(a+1MB)). Being hugepage backed, PA=(b=PA(a)+1M).
>> = After b, the PA-VA table has a single entry of 2MB, representing two
>> mallocs. It can be used for translation for any thread requesting PAs
>> of a or b.
>> 4) Free(a)
>> - this would attempt to remove one 2MB entry from PA-VA table. But,
>> 'b' is already valid. Access to get_pa(VA(b)) should return me the PA(b).
>> - 'len' is not even used as the entry in PA-VA table is of a fixed
>> size.
>
> Just to add to this:
> - if talking about the mem_event callback, it definitely won't be a case
> where same page is still being served under another rte_malloc
> - But, calls can come to delete from users of PA-VA table based on their
> own rte_free().
>
> And, your comment makes me think - I should probably del entry from the
> table only when mem_event callback is received.
Mem events are not triggered on rte_free(), they're triggered on page
deallocation. A call to rte_free/rte_memzone_free/rte_mempool_free etc.
*might* trigger a page deallocation, but *only* if the memory area being
freed encompasses an entire page. If you rte_malloc() 64 bytes and then
rte_free() those 64 bytes, you won't get a mem event *unless* these were
the only 64 bytes allocated on a particular page, and the entire page is
no longer used by anything else.
>
>>
>> In the above, (3) is an assumption I am making based on my
>> understanding how mem allocator is working. Is that wrong?
>>
>> Basically, this is a restriction of this table - it has a min chunk of
>> 2MB - even for 1G hugepages - and hence, it is not possible to honor
>> deletes. I know this is convoluted logic - but, this keeps it simple
>> and use-able without much performance impact.
>>
>> [...]
>>
>
>
--
Thanks,
Anatoly
More information about the dev
mailing list