[RFC PATCH 1/4] fib: add multi-VRF support
Medvedkin, Vladimir
vladimir.medvedkin at intel.com
Fri Mar 27 19:32:41 CET 2026
On 3/26/2026 10:13 AM, Konstantin Ananyev wrote:
>
>>>>>> Add VRF (Virtual Routing and Forwarding) support to the IPv4
>>>>>> FIB library, allowing multiple independent routing tables
>>>>>> within a single FIB instance.
>>>>>>
>>>>>> Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
>>>>>> to configure the number of VRFs and per-VRF default nexthops.
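[For reference, a rough sketch of how an application might fill the
extended config. Only the names max_vrfs and vrf_default_nh come from
this patch; the field types and values below are illustrative:

#include <rte_fib.h>

static struct rte_fib *
create_mvrf_fib(void)
{
        /* one default nexthop per VRF (assumed to be an array) */
        static uint64_t vrf_defaults[1024];

        struct rte_fib_conf conf = {
                .type = RTE_FIB_DIR24_8,
                .default_nh = 0,
                .max_routes = 1 << 20,
                .dir24_8 = {
                        .nh_sz = RTE_FIB_DIR24_8_4B,
                        .num_tbl8 = 1 << 15,
                },
                .max_vrfs = 1024,               /* new: independent tables */
                .vrf_default_nh = vrf_defaults, /* new: per-VRF defaults */
        };
        return rte_fib_create("fib_mvrf", SOCKET_ID_ANY, &conf);
}
]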
>>>>> Thanks Vladimir, allowing multiple VRFs in the same LPM table will
>>>>> definitely be a useful thing to have.
>>>>> Though, I have the same concern as Maxime:
>>>>> memory requirements are just overwhelming.
>>>>> Stupid question - why not just store a pointer to a vector of
>>>>> next-hops within the table entry?
>>>> Do I understand correctly: a vector with max_number_of_vrfs entries,
>>>> using the vrf id to address a nexthop?
>>> Yes.
>> Here I can see 2 problems:
>>
>> 1. tbl entries must be the size of a pointer, so no way to use smaller sizes
> Yes, but as we are talking about storing nexthops for multiple VRFs anyway,
> I don't think it is a big deal.
>
>> 2. those vectors will be sparsely populated and, depending on the
>> runtime configuration, may consume a lot of memory too (as Robin
>> mentioned they may have 1024 VRFs)
> Yes, each VRF vector can become really sparse and we waste a lot of memory.
> If that's an issue, we can probably think about something smarter
> than a simple flat array indexed by vrf-id: something like a 2-level B-tree or so.
> The main positives that I see in that approach:
> - low extra overhead at lookup - one/two extra pointer de-references.
I'm afraid the overhead will be comparatively large, just because the
current implementation is fast and most likely resolves a lookup with a
single memory access. However, for a low number of VRFs, a B-tree may be
a good solution.
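
To make the dereference chain concrete, here is a minimal sketch of the
flat-vector variant (names are hypothetical, not the actual fib
internals; tbl8 handling and valid bits are omitted):

#include <stdint.h>

/* each tbl24 entry now holds a pointer to a per-prefix vector of
 * nexthops indexed by vrf_id, so entries become pointer-sized */
static inline uint64_t
flat_vec_lookup(const uint64_t *const *tbl24, uint32_t ip, uint16_t vrf_id)
{
        const uint64_t *vec = tbl24[ip >> 8]; /* first load */
        return vec[vrf_id];                   /* second load: the extra access */
}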
> - it allows the CP to allocate/free space for each such vector separately,
> so we don't need to pre-allocate memory for the max possible entries at startup.
>
>>>> Yes, this may work.
>>>> But, if we are going to do an extra memory access, I'd rather
>>>> maintain an internal hash table with 5-byte keys {24_bits_from_LPM,
>>>> 16_bits_vrf_id} to retrieve a nexthop.
>>> Hmm... and what to do with entries in tbl8, I mean what will be the
>>> key for them?
>>> Or do you not plan to put entries from tbl8 into that hash table?
>> The idea is to have a single LPM struct with a joined superset of all
>> prefixes existing in all VRFs. Each prefix in this LPM struct has its
>> own unique "nexthop", which is not the final next hop but rather
>> intermediate metadata identifying this unique prefix. Then a second
>> search is performed in some exact-match database, such as a hash
>> table, with a key containing this intermediate metadata + vrf_id.
>> This approach is the most memory friendly, since there is only one
>> LPM data struct (which scales well with the number of prefixes it
>> holds) with intermediate entries only 4 bytes long.
>> On the other hand, it requires an extra search, so lookup will be
>> slower. Also, some current LPM optimizations, like collapsing a tbl8
>> when all of its entries hold the same value, will be gone.
> Yes, and yes :)
> Yes it would help to save memory, and yes lookup will most likely be slower.
> The other thing that I consider a possible drawback here - with the current rte_hash
> implementation we still need to allocate space for the maximum possible number of entries at startup.
I don't think this is a big problem, since the size of this memory will
be reasonable and will not grow linearly with the number of VRFs, so I
agree it is an acceptable trade-off.
> But that's not new in DPDK, and in most cases it is considered an acceptable trade-off.
> Overall, it seems like a possible approach to me, I suppose the main question is:
> what will be the price of that extra hash-lookup here.
And this is the key problem. I don't think rte_hash is well suited
here; at best we need some kind of perfect hash. I have a few ideas on
this, stay tuned :)
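
To show the shape of that lookup, a minimal sketch of the two-stage
scheme, with rte_hash standing in for whatever exact-match structure we
end up with (names and the miss handling are hypothetical):

#include <stdint.h>
#include <rte_fib.h>
#include <rte_hash.h>

static uint64_t
two_stage_lookup(struct rte_fib *fib, struct rte_hash *h,
                 const uint64_t *nh_tbl, uint64_t miss_nh,
                 uint32_t ip, uint16_t vrf_id)
{
        uint64_t prefix_id;

        /* stage 1: one LPM walk over the joined superset of all prefixes;
         * the "nexthop" it returns is just an id of the matched prefix */
        rte_fib_lookup_bulk(fib, &ip, &prefix_id, 1);

        /* stage 2: exact match on {prefix_id, vrf_id}; an 8-byte key here,
         * the 5-byte {24-bit id, 16-bit vrf} packing is an optimization */
        uint64_t key = (uint64_t)vrf_id << 32 | (uint32_t)prefix_id;
        int32_t pos = rte_hash_lookup(h, &key);

        /* NB: a real implementation would need a fallback to shorter
         * prefixes on a miss, not just a per-VRF default */
        return pos < 0 ? miss_nh : nh_tbl[pos];
}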
>
> Again, there is a bulk version of hash lookup, and in theory it could be
> improved further (an AVX512 version on x86?).
>
>>>>> And we can provide the user with the ability to specify custom
>>>>> alloc/free functions for these vectors.
>>>>> That would help to avoid allocating huge chunks of memory at startup.
>>>>> I understand that it will be one extra memory dereference,
>>>>> but it will probably not be that critical in terms of performance.
>>>>> Again, for the bulk function we might be able to pipeline lookups and
>>>>> dereferences and hide that extra load latency.
>>>>>
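[The custom allocator hook suggested above might look something like
this - a purely hypothetical API, nothing like it exists in rte_fib
today:

#include <stdint.h>

typedef void *(*rte_fib_vec_alloc_t)(uint16_t max_vrfs, void *ctx);
typedef void (*rte_fib_vec_free_t)(void *vec, void *ctx);

struct rte_fib_vec_ops {
        rte_fib_vec_alloc_t alloc_cb; /* called when a prefix gains its
                                       * first per-VRF nexthop */
        rte_fib_vec_free_t free_cb;   /* called when its last per-VRF
                                       * nexthop is deleted */
        void *ctx;                    /* user context, e.g. a vector pool */
};
]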
>>>>>> Add four new experimental APIs:
>>>>>> - rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
>>>>>> per VRF
>>>>>> - rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
>>>>>> - rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle
>>>>>>
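[Hypothetical usage of the new experimental calls; the argument lists
below are guesses modeled on the existing rte_fib_add() and
rte_fib_lookup_bulk(), not taken from the patch:

#include <rte_fib.h>
#include <rte_ip.h>

static void
mvrf_example(struct rte_fib *fib, const uint16_t *vrf_ids,
             uint32_t *ips, uint64_t *next_hops, uint32_t n)
{
        /* same prefix, different nexthop per VRF */
        rte_fib_vrf_add(fib, 1, RTE_IPV4(10, 0, 0, 0), 8, 100);
        rte_fib_vrf_add(fib, 2, RTE_IPV4(10, 0, 0, 0), 8, 200);

        /* bulk lookup with a vrf id per packet */
        rte_fib_vrf_lookup_bulk(fib, vrf_ids, ips, next_hops, n);
}
]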
>>>>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin at intel.com>
>>>>>> ---
>>>>>> lib/fib/dir24_8.c | 241 ++++++++++++++++------
>>>>>> lib/fib/dir24_8.h | 255 ++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
>>>>>> lib/fib/dir24_8_avx512.h | 80 +++++++-
>>>>>> lib/fib/rte_fib.c | 158 ++++++++++++---
>>>>>> lib/fib/rte_fib.h | 94 ++++++++-
>>>>>> 6 files changed, 988 insertions(+), 260 deletions(-)
>>>>>>
>>>> <snip>
>>>>
--
Regards,
Vladimir