|SUCCESS| [v1,5/5] fib6: speed up tbl8 reservation accounting

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Fri May 22 19:40:25 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164270

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-22

This is a DPDK patch series that fixes a tbl8 reservation drift bug in
the FIB6 trie implementation and adds several improvements.

commit SHA: 20260522145855.1748406-2-maxime at leroys.fr
Author: Maxime Leroy <maxime at leroys.fr>
fib6: fix tbl8 reservation drift in trie

The patch fixes a bug where rsvd_tbl8s accounting diverges when RIB
changes occur between ADD and DEL operations, eventually wrapping to
UINT32_MAX and rejecting subsequent /25+ ADDs with -ENOSPC.

> diff --git a/lib/fib/trie.c b/lib/fib/trie.c
> index fa5d9ec6b0..44b90f72ff 100644
> --- a/lib/fib/trie.c
> +++ b/lib/fib/trie.c

[ ... ]

> +/*
> + * Count byte boundaries between 24 and CEIL(depth, 8) where the
> + * supernet of ip has no descendant in the RIB. This is the number of
> + * new tbl8 levels an ADD of ip/depth would introduce, or the number
> + * to free at DEL once the prefix has been removed from the RIB.
> + *
> + * A NULL answer at level L propagates upwards: narrower supernets at
> + * L+8, L+16, ... are subsets of S_L and cannot contain descendants
> + * either. The loop stops at the first NULL and tallies the remaining
> + * boundaries in one shot.
> + */
> +static uint8_t
> +count_empty_levels(struct rte_rib6 *rib, const struct rte_ipv6_addr *ip,
> +	uint8_t depth)
> +{
> +	uint8_t level, top = RTE_ALIGN_CEIL(depth, 8);
> +
> +	for (level = 24; level < top; level += 8) {
> +		if (rte_rib6_get_nxt(rib, ip, level, NULL,
> +				RTE_RIB6_GET_NXT_COVER) == NULL)
> +			return (top - level) >> 3;
> +	}
> +	return 0;
> +}

Does the code handle the case where depth is less than 24 correctly? The
top calculation uses RTE_ALIGN_CEIL which may produce unexpected results
for very small depths. Should there be a bounds check or explicit handling
for depth <= 24?

> @@ -559,37 +583,6 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
>  	ip_masked = *ip;
>  	rte_ipv6_addr_mask(&ip_masked, depth);
>  
> -	if (depth > 24) {

The removal of this depth > 24 guard suggests count_empty_levels handles
it, but does count_empty_levels()->rte_rib6_get_nxt() behave correctly
when level equals or exceeds top in the loop condition?

[ ... ]

> @@ -603,7 +596,8 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
>  			return 0;
>  		}
>  
> -		if ((depth > 24) && (dp->rsvd_tbl8s + depth_diff > dp->number_tbl8s))
> +		new_levels = count_empty_levels(rib, &ip_masked, depth);
> +		if (dp->rsvd_tbl8s + new_levels > dp->number_tbl8s)
>  			return -ENOSPC;

The capacity check now runs for all depths, not just depth > 24. Does
this change behavior for depth <= 24 prefixes in an unintended way?

> @@ -640,9 +634,8 @@ trie_modify(struct rte_fib6 *fib, const struct rte_ipv6_addr *ip,
>  
>  		if (ret != 0)
>  			return ret;
> -		rte_rib6_remove(rib, ip, depth);
> -
> -		dp->rsvd_tbl8s -= depth_diff;
> +		rte_rib6_remove(rib, &ip_masked, depth);
> +		dp->rsvd_tbl8s -= count_empty_levels(rib, &ip_masked, depth);

Does calling count_empty_levels after rte_rib6_remove produce the correct
count? The prefix has already been removed, so does the RIB query see the
post-removal state or does it need to be called before removal?


commit SHA: 20260522145855.1748406-5-maxime at leroys.fr
Author: Maxime Leroy <maxime at leroys.fr>
rib: track valid descendant count per node

This patch adds a valid_descendants field to rte_rib6_node and maintains
it incrementally during insert and remove operations.

> diff --git a/lib/rib/rib6_internal.h b/lib/rib/rib6_internal.h
> new file mode 100644
> index 0000000000..8246626de9
> --- /dev/null
> +++ b/lib/rib/rib6_internal.h

> +/**
> + * @internal
> + * Count byte boundaries L in {24, 32, 40, ..., RTE_ALIGN_CEIL(depth, 8) - 8}
> + * for which the supernet of ip at level L has no valid descendant with
> + * depth > L. Used by lib/fib to maintain tbl8 reservation accounting in
> + * a single descent of the binary tree.
> + *
> + * @param rib
> + *  RIB object handle
> + * @param ip
> + *  IPv6 prefix address
> + * @param depth
> + *  prefix length
> + * @return
> + *  number of empty byte boundaries (0 if all levels have descendants
> + *  or depth <= 24)
> + */
> +__rte_internal
> +uint8_t
> +rte_rib6_count_empty_supernets(struct rte_rib6 *rib,
> +	const struct rte_ipv6_addr *ip, uint8_t depth);

Does this function need the __rte_experimental tag in addition to
__rte_internal since it is a new API being exposed, even if only to other
DPDK libraries?

> diff --git a/lib/rib/rte_rib6.c b/lib/rib/rte_rib6.c
> index ec8ff68e87..ae8aba6563 100644
> --- a/lib/rib/rte_rib6.c
> +++ b/lib/rib/rte_rib6.c

> @@ -36,6 +37,7 @@ struct rte_rib6_node {
>  	struct rte_rib6_node	*parent;
>  	uint64_t		nh;
>  	struct rte_ipv6_addr	ip;
> +	uint32_t		valid_descendants;
>  	uint8_t			depth;
>  	uint8_t			flag;
>  	uint64_t ext[];

Does the placement of valid_descendants utilize existing padding, or does
it increase the structure size? The comment claims it fits in padding, but
can this be verified with a static assertion or is it compiler-dependent?

> @@ -104,10 +106,35 @@ node_alloc(struct rte_rib6 *rib)
>  	ret = rte_mempool_get(rib->node_pool, (void *)&ent);
>  	if (unlikely(ret != 0))
>  		return NULL;
> +	ent->valid_descendants = 0;
>  	++rib->cur_nodes;
>  	return ent;
>  }

Does rte_mempool_get guarantee zeroed memory, or could valid_descendants
contain stale data if the mempool recycles objects? Should there be a
memset or is explicit initialization sufficient?

> +/* Increment valid_descendants along the parent chain when node becomes valid. */
> +static inline void
> +inc_valid_descendants(struct rte_rib6_node *node)
> +{
> +	struct rte_rib6_node *p = node->parent;
> +
> +	while (p != NULL) {
> +		p->valid_descendants++;
> +		p = p->parent;
> +	}
> +}

Does this code handle potential uint32_t overflow? In theory, could
valid_descendants wrap if an extremely large number of prefixes share
common ancestors? The commit message mentions uint32_t is required for
more than 65535 routes, but what is the practical limit?

> @@ -371,6 +401,9 @@ rte_rib6_insert(struct rte_rib6 *rib,
>  			new_node->left = *tmp;
>  		new_node->parent = (*tmp)->parent;
>  		(*tmp)->parent = new_node;
> +		/* new_node inherits *tmp's subtree */
> +		new_node->valid_descendants = (is_valid_node(*tmp) ? 1 : 0) +
> +			(*tmp)->valid_descendants;

Does this arithmetic correctly account for all cases? When is_valid_node
returns true for *tmp, does adding 1 plus tmp->valid_descendants correctly
represent the entire subtree rooted at new_node?

> @@ -386,6 +419,11 @@ rte_rib6_insert(struct rte_rib6 *rib,
>  		common_node->parent = (*tmp)->parent;
>  		new_node->parent = common_node;
>  		(*tmp)->parent = common_node;
> +		/* common_node inherits *tmp's subtree (new_node will be
> +		 * counted by inc_valid_descendants below).
> +		 */
> +		common_node->valid_descendants = (is_valid_node(*tmp) ? 1 : 0) +
> +			(*tmp)->valid_descendants;

Does the comment accurately reflect the code flow? Is inc_valid_descendants
called after this point for new_node, ensuring common_node gets updated
correctly?

> +RTE_EXPORT_INTERNAL_SYMBOL(rte_rib6_count_empty_supernets)
> +uint8_t
> +rte_rib6_count_empty_supernets(struct rte_rib6 *rib,
> +	const struct rte_ipv6_addr *ip, uint8_t depth)
> +{
> +	uint8_t top = RTE_ALIGN_CEIL(depth, 8);
> +	struct rte_rib6_node *cur;
> +	bool has_descendant;
> +	uint8_t level;
> +
> +	if (unlikely(rib == NULL || ip == NULL || depth > RTE_IPV6_MAX_DEPTH))
> +		return 0;
> +	if (depth <= 24)
> +		return 0;
> +
> +	cur = rib->tree;

Does the code handle the case where rib->tree is NULL correctly? Should
there be an explicit check before the loop?

> +	/* Single descent through the binary tree, checking each byte
> +	 * boundary on the way. NULL at level L propagates upward, so we
> +	 * stop at the first empty supernet and tally the remaining levels.
> +	 */
> +	for (level = 24; level < top; level += 8) {
> +		while (cur != NULL && cur->depth < level)
> +			cur = get_nxt_node(cur, ip);
> +
> +		if (cur == NULL || !rte_ipv6_addr_eq_prefix(&cur->ip, ip, level))
> +			return (top - level) >> 3;
> +
> +		if (cur->depth > level)
> +			has_descendant = is_valid_node(cur) ||
> +				cur->valid_descendants > 0;
> +		else
> +			has_descendant = cur->valid_descendants > 0;

Does this logic correctly distinguish between a node exactly at level
versus one beyond level? When cur->depth equals level, does checking only
valid_descendants > 0 capture all necessary cases?

> +		if (!has_descendant)
> +			return (top - level) >> 3;
> +	}
> +	return 0;
> +}


More information about the test-report mailing list