[dpdk-dev] [PATCH 2/2] lpm: hide internal data
Medvedkin, Vladimir
vladimir.medvedkin at intel.com
Wed Oct 14 15:10:15 CEST 2020
On 13/10/2020 20:48, Michel Machado wrote:
> On 10/13/20 3:06 PM, Medvedkin, Vladimir wrote:
>>
>>
>> On 13/10/2020 18:46, Michel Machado wrote:
>>> On 10/13/20 11:41 AM, Medvedkin, Vladimir wrote:
>>>> Hi Michel,
>>>>
>>>> Could you please describe a condition when LPM gets inconsistent? As
>>>> I can see if there is no free tbl8 it will return -ENOSPC.
>>>
>>> Consider this simple example, we need to add the following two
>>> prefixes with different next hops: 10.99.0.0/16, 18.99.99.128/25. If
>>> the LPM table is out of tbl8s, the second prefix is not added and
>>> Gatekeeper will make decisions in violation of the policy. The data
>>> structure of the LPM table is consistent, but its content
>>> inconsistent with the policy.
>>
>> Aha, thanks. So do I understand correctly that you need to add a set
>> of routes atomically (either the entire set is installed or nothing)?
>
> Yes.
>
>> If so, then I would suggest having 2 lpm and switching them atomically
>> after a successful addition. As for now, even if you have enough
>> tbl8's, routes are installed non atomically, i.e. there will be a time
>> gap between adding two routes, so in this time interval the table will
>> be inconsistent with the policy.
>> Also, if new lpm algorithms are added to the DPDK, they won't have
>> such a thing as tbl8.
>
> Our code already deals with synchronization.
OK, so my suggestion here would be to add new routes to the shadow copy
of the lpm, and if it returns -ENOSPC, than create a new LPM with double
amount of tbl8's and add all the routes to it. Then switch the
active-shadow LPM pointers. In this case you'll always add a bulk of
routes atomically.
>
>>> We minimize the need of replacing a LPM table by allocating LPM
>>> tables with the double of what we need (see example here
>>> https://github.com/AltraMayor/gatekeeper/blob/95d1d6e8201861a0d0c698bfd06ad606674f1e07/lua/examples/policy.lua#L172-L183),
>>> but the code must be ready for unexpected needs that may arise in
>>> production.
>>>
>>
>> Usually, the table is initialized with a large enough number of
>> entries, enough to add a possible number of routes. One tbl8 group
>> takes up 1Kb of memory which is nothing comparing to the size of tbl24
>> which is 64Mb.
>
> When the prefixes come from BGP, initializing a large enough table
> is fine. But when prefixes come from threat intelligence, the number of
> prefixes can vary wildly and the number of prefixes above 24 bits are
> way more common.
>
>> P.S. consider using rte_fib library, it has a number of advantages
>> over LPM. You can replace the loop in __lookup_fib_bulk() with a bulk
>> lookup call and this will probably increase the speed.
>
> I'm not aware of the rte_fib library. The only documentation that I
> found on Google was https://doc.dpdk.org/api/rte__fib_8h.html and it
> just says "FIB (Forwarding information base) implementation for IPv4
> Longest Prefix Match".
That's true, I'm going to add programmer's guide soon.
Although the fib API is very similar to existing LPM.
>
>>>>
>>>> On 13/10/2020 15:58, Michel Machado wrote:
>>>>> Hi Kevin,
>>>>>
>>>>> We do need fields max_rules and number_tbl8s of struct rte_lpm,
>>>>> so the removal would force us to have another patch to our local
>>>>> copy of DPDK. We'd rather avoid this new local patch because we
>>>>> wish to eventually be in sync with the stock DPDK.
>>>>>
>>>>> Those fields are needed in Gatekeeper because we found a
>>>>> condition in an ongoing deployment in which the entries of some LPM
>>>>> tables may suddenly change a lot to reflect policy changes. To
>>>>> avoid getting into a state in which the LPM table is inconsistent
>>>>> because it cannot fit all the new entries, we compute the needed
>>>>> parameters to support the new entries, and compare with the current
>>>>> parameters. If the current table doesn't fit everything, we have to
>>>>> replace it with a new LPM table.
>>>>>
>>>>> If there were a way to obtain the struct rte_lpm_config of a
>>>>> given LPM table, it would cleanly address our need. We have the
>>>>> same need in IPv6 and have a local patch to work around it (see
>>>>> https://github.com/cjdoucette/dpdk/commit/3eaf124a781349b8ec8cd880db26a78115cb8c8f).
>>>>> Thus, an IPv4 and IPv6 solution would be best.
>>>>>
>>>>> PS: I've added Qiaobin Fu, another Gatekeeper maintainer, to
>>>>> this disscussion.
>>>>>
>>>>> [ ]'s
>>>>> Michel Machado
>>>>>
>>>>> On 10/13/20 9:53 AM, Kevin Traynor wrote:
>>>>>> Hi Gatekeeper maintainers (I think),
>>>>>>
>>>>>> fyi - there is a proposal to remove some members of a struct in
>>>>>> DPDK LPM
>>>>>> API that Gatekeeper is using [1]. It would be only from DPDK 20.11
>>>>>> but
>>>>>> as it's an LTS I guess it would probably hit Debian in a few months.
>>>>>>
>>>>>> The full thread is here:
>>>>>> http://inbox.dpdk.org/dev/20200907081518.46350-1-ruifeng.wang@arm.com/
>>>>>>
>>>>>>
>>>>>> Maybe you can take a look and tell us if they are needed in
>>>>>> Gatekeeper
>>>>>> or you can workaround it?
>>>>>>
>>>>>> thanks,
>>>>>> Kevin.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/AltraMayor/gatekeeper/blob/master/gt/lua_lpm.c#L235-L248
>>>>>>
>>>>>>
>>>>>> On 09/10/2020 07:54, Ruifeng Wang wrote:
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Kevin Traynor <ktraynor at redhat.com>
>>>>>>>> Sent: Wednesday, September 30, 2020 4:46 PM
>>>>>>>> To: Ruifeng Wang <Ruifeng.Wang at arm.com>; Medvedkin, Vladimir
>>>>>>>> <vladimir.medvedkin at intel.com>; Bruce Richardson
>>>>>>>> <bruce.richardson at intel.com>
>>>>>>>> Cc: dev at dpdk.org; Honnappa Nagarahalli
>>>>>>>> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
>>>>>>>> Subject: Re: [dpdk-dev] [PATCH 2/2] lpm: hide internal data
>>>>>>>>
>>>>>>>> On 16/09/2020 04:17, Ruifeng Wang wrote:
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Medvedkin, Vladimir <vladimir.medvedkin at intel.com>
>>>>>>>>>> Sent: Wednesday, September 16, 2020 12:28 AM
>>>>>>>>>> To: Bruce Richardson <bruce.richardson at intel.com>; Ruifeng Wang
>>>>>>>>>> <Ruifeng.Wang at arm.com>
>>>>>>>>>> Cc: dev at dpdk.org; Honnappa Nagarahalli
>>>>>>>>>> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
>>>>>>>>>> Subject: Re: [PATCH 2/2] lpm: hide internal data
>>>>>>>>>>
>>>>>>>>>> Hi Ruifeng,
>>>>>>>>>>
>>>>>>>>>> On 15/09/2020 17:02, Bruce Richardson wrote:
>>>>>>>>>>> On Mon, Sep 07, 2020 at 04:15:17PM +0800, Ruifeng Wang wrote:
>>>>>>>>>>>> Fields except tbl24 and tbl8 in rte_lpm structure have no
>>>>>>>>>>>> need to
>>>>>>>>>>>> be exposed to the user.
>>>>>>>>>>>> Hide the unneeded exposure of structure fields for better ABI
>>>>>>>>>>>> maintainability.
>>>>>>>>>>>>
>>>>>>>>>>>> Suggested-by: David Marchand <david.marchand at redhat.com>
>>>>>>>>>>>> Signed-off-by: Ruifeng Wang <ruifeng.wang at arm.com>
>>>>>>>>>>>> Reviewed-by: Phil Yang <phil.yang at arm.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> lib/librte_lpm/rte_lpm.c | 152
>>>>>>>>>>>> +++++++++++++++++++++++---------------
>>>>>>>>>> -
>>>>>>>>>>>> lib/librte_lpm/rte_lpm.h | 7 --
>>>>>>>>>>>> 2 files changed, 91 insertions(+), 68 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>> <snip>
>>>>>>>>>>>> diff --git a/lib/librte_lpm/rte_lpm.h
>>>>>>>>>>>> b/lib/librte_lpm/rte_lpm.h
>>>>>>>>>>>> index 03da2d37e..112d96f37 100644
>>>>>>>>>>>> --- a/lib/librte_lpm/rte_lpm.h
>>>>>>>>>>>> +++ b/lib/librte_lpm/rte_lpm.h
>>>>>>>>>>>> @@ -132,17 +132,10 @@ struct rte_lpm_rule_info {
>>>>>>>>>>>>
>>>>>>>>>>>> /** @internal LPM structure. */
>>>>>>>>>>>> struct rte_lpm {
>>>>>>>>>>>> - /* LPM metadata. */
>>>>>>>>>>>> - char name[RTE_LPM_NAMESIZE]; /**< Name of the
>>>>>>>>>>>> lpm. */
>>>>>>>>>>>> - uint32_t max_rules; /**< Max. balanced rules per lpm. */
>>>>>>>>>>>> - uint32_t number_tbl8s; /**< Number of tbl8s. */
>>>>>>>>>>>> - struct rte_lpm_rule_info rule_info[RTE_LPM_MAX_DEPTH];
>>>>>>>>>>>> /**<
>>>>>>>>>> Rule info table. */
>>>>>>>>>>>> -
>>>>>>>>>>>> /* LPM Tables. */
>>>>>>>>>>>> struct rte_lpm_tbl_entry
>>>>>>>>>>>> tbl24[RTE_LPM_TBL24_NUM_ENTRIES]
>>>>>>>>>>>> __rte_cache_aligned; /**< LPM tbl24 table. */
>>>>>>>>>>>> struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>>>>>>>>>>>> - struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
>>>>>>>>>>>> };
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Since this changes the ABI, does it not need advance notice?
>>>>>>>>>>>
>>>>>>>>>>> [Basically the return value point from rte_lpm_create() will be
>>>>>>>>>>> different, and that return value could be used by
>>>>>>>>>>> rte_lpm_lookup()
>>>>>>>>>>> which as a static inline function will be in the binary and
>>>>>>>>>>> using
>>>>>>>>>>> the old structure offsets.]
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Agree with Bruce, this patch breaks ABI, so it can't be accepted
>>>>>>>>>> without prior notice.
>>>>>>>>>>
>>>>>>>>> So if the change wants to happen in 20.11, a deprecation notice
>>>>>>>>> should
>>>>>>>>> have been added in 20.08.
>>>>>>>>> I should have added a deprecation notice. This change will have
>>>>>>>>> to wait for
>>>>>>>> next ABI update window.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Do you plan to extend? or is this just speculative?
>>>>>>> It is speculative.
>>>>>>>
>>>>>>>>
>>>>>>>> A quick scan and there seems to be several projects using some
>>>>>>>> of these
>>>>>>>> members that you are proposing to hide. e.g. BESS, NFF-Go, DPVS,
>>>>>>>> gatekeeper. I didn't look at the details to see if they are
>>>>>>>> really needed.
>>>>>>>>
>>>>>>>> Not sure how much notice they'd need or if they update DPDK
>>>>>>>> much, but I
>>>>>>>> think it's worth having a closer look as to how they use lpm and
>>>>>>>> what the
>>>>>>>> impact to them is.
>>>>>>> Checked the projects listed above. BESS, NFF-Go and DPVS don't
>>>>>>> access the members to be hided.
>>>>>>> They will not be impacted by this patch.
>>>>>>> But Gatekeeper accesses the rte_lpm internal members that to be
>>>>>>> hided. Its compilation will be broken with this patch.
>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Ruifeng
>>>>>>>>>>>> /** LPM RCU QSBR configuration structure. */
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Vladimir
>>>>>>>
>>>>>>
>>>>
>>
--
Regards,
Vladimir
More information about the dev
mailing list