[PATCH 0/4] net/gve: add flow steering support
Stephen Hemminger
stephen at networkplumber.org
Fri Feb 27 23:52:33 CET 2026
On Fri, 27 Feb 2026 19:51:22 +0000
"Jasper Tran O'Leary" <jtranoleary at google.com> wrote:
> This patch series adds flow steering support to the Google Virtual
> Ethernet (gve) driver. This functionality allows traffic to be directed
> to specific receive queues based on user-specified flow patterns.
>
> The series includes foundational support for extended admin queue
> commands needed to handle flow rules, the specific adminqueue commands
> for flow rule management, and the integration with the DPDK rte_flow
> API. The series adds support flow matching on the following protocols:
> IPv4, IPv6, TCP, UDP, SCTP, ESP, and AH.
>
> Patch Overview:
>
> 1. "net/gve: add flow steering device option" checks for and enables
> the flow steering capability in the device options during
> initialization.
> 2. "net/gve: introduce extended adminq command" adds infrastructure
> for sending extended admin queue commands. These commands use a
> flexible buffer descriptor format required for flow rule management.
> 3. "net/gve: add adminq commands for flow steering" implements the
> specific admin queue commands to add and remove flow rules on the
> device, including handling of rule IDs and parameters.
> 4. "net/gve: add rte flow API integration" exposes the flow steering
> functionality via the DPDK rte_flow API. This includes strict
> pattern validation, rule parsing, and lifecycle management (create,
> destroy, flush). It ensures thread-safe access to the flow subsystem
> and proper resource cleanup during device reset.
>
> Jasper Tran O'Leary (2):
> net/gve: add adminq commands for flow steering
> net/gve: add rte flow API integration
>
> Vee Agarwal (2):
> net/gve: add flow steering device option
> net/gve: introduce extended adminq command
>
> doc/guides/nics/features/gve.ini | 12 +
> doc/guides/nics/gve.rst | 20 +
> doc/guides/rel_notes/release_26_03.rst | 1 +
> drivers/net/gve/base/gve.h | 3 +-
> drivers/net/gve/base/gve_adminq.c | 118 ++++-
> drivers/net/gve/base/gve_adminq.h | 57 +++
> drivers/net/gve/gve_ethdev.c | 87 +++-
> drivers/net/gve/gve_ethdev.h | 46 ++
> drivers/net/gve/gve_flow_rule.c | 645 +++++++++++++++++++++++++
> drivers/net/gve/gve_flow_rule.h | 64 +++
> drivers/net/gve/meson.build | 1 +
> 11 files changed, 1049 insertions(+), 5 deletions(-)
> create mode 100644 dpdk/drivers/net/gve/gve_flow_rule.c
> create mode 100644 dpdk/drivers/net/gve/gve_flow_rule.h
>
There is a lot here, so sent AI to take a look.
Summary:
Error 1 — Resource leak: If gve_flush_flow_rules fails and disables the
flow subsystem via gve_clear_flow_subsystem_ok(), the bitmap memory
(avail_flow_rule_bmp_mem) is never freed because both gve_dev_close and
gve_dev_reset gate their teardown on gve_get_flow_subsystem_ok()
returning true. The guard needs to check for allocated memory rather
than subsystem state.
Error 2 — pthread_mutex_init with NULL attributes in shared memory: The
flow_rule_lock lives in dev_private (DPDK shared memory) but is
initialized without PTHREAD_PROCESS_SHARED. This will cause undefined
behavior with secondary processes. Switching to rte_spinlock_t would be
the simplest fix since the critical sections are short (bitmap scan +
TAILQ operations).
Patches 1–3 are clean. The overall structure and error handling in patch 4 is solid — the allocation-before-lock pattern, bitmap rollback on adminq failure, and defense-in-depth validation in destroy are all well done.
Long form:
# Code Review: GVE Flow Steering Patch Series (4 patches)
**Series**: `[PATCH 1/4]` through `[PATCH 4/4]`
**Author**: Jasper Tran O'Leary / Vee Agarwal (Google)
**Subject**: Add receive flow steering (RFS) support to the GVE driver
---
## Summary
This series adds n-tuple flow steering to the GVE (Google Virtual Ethernet) driver via the `rte_flow` API. The implementation is cleanly structured across four patches: device option discovery (1/4), extended adminq infrastructure (2/4), flow rule adminq commands (3/4), and full rte_flow integration (4/4). The code quality is generally good with thorough validation, proper locking, and well-structured error handling.
Two correctness issues were identified: a resource leak when the flow subsystem is disabled on error, and a missing `PTHREAD_PROCESS_SHARED` attribute on a mutex in shared memory.
---
## Patch 1/4: `net/gve: add flow steering device option`
### Errors
None.
### Warnings
None.
This patch is clean. The device option parsing follows the established pattern for existing options (modify_ring, jumbo_frames), the byte-swap of `max_flow_rules` is correct, and the zero-check on the big-endian value before byte-swap is valid (non-zero in any byte order is still non-zero).
---
## Patch 2/4: `net/gve: introduce extended adminq command`
### Errors
None.
### Warnings
None.
The extended command mechanism is straightforward: allocate DMA memory for the inner command, copy the command in, set up the outer wrapper with the DMA address, execute, and free. The error path is correct — `gve_free_dma_mem` is called after `gve_adminq_execute_cmd` regardless of success or failure.
---
## Patch 3/4: `net/gve: add adminq commands for flow steering`
### Errors
None.
### Warnings
**1. `gve_flow_rule.h` copyright appears to be copied from an Intel driver file.**
```c
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright(C) 2022 Intel Corporation
*/
```
This is a new file in the Google GVE driver. The Intel copyright and 2022 date look like they were carried over from whichever file was used as a template. This should be updated to reflect the actual author.
---
## Patch 4/4: `net/gve: add rte flow API integration`
This is the largest patch (815 lines added) and where the significant findings are.
### Errors
**1. Resource leak: bitmap memory leaked when flow subsystem is disabled on error.**
In `gve_flush_flow_rules`, if either `gve_free_flow_rules()` or `gve_flow_init_bmp()` fails, the code disables the subsystem:
```c
gve_clear_flow_subsystem_ok(priv);
```
However, in both `gve_dev_close` and `gve_dev_reset`, teardown is gated on the flag:
```c
if (gve_get_flow_subsystem_ok(priv))
gve_teardown_flow_subsystem(priv);
```
If the subsystem was disabled by a failed flush, `gve_teardown_flow_subsystem` is never called, and `priv->avail_flow_rule_bmp_mem` is never freed. This is a memory leak.
**Suggested fix**: Either (a) always call `gve_flow_free_bmp(priv)` in close/reset regardless of the flag, or (b) have the flush error path free the bitmap memory itself, or (c) unconditionally call teardown in close/reset:
```c
/* In gve_dev_close / gve_dev_reset: */
if (priv->avail_flow_rule_bmp_mem)
gve_teardown_flow_subsystem(priv);
```
**2. `pthread_mutex_init` without `PTHREAD_PROCESS_SHARED` on a mutex in shared memory.**
In `gve_dev_init`:
```c
pthread_mutex_init(&priv->flow_rule_lock, NULL);
```
`priv` is `dev->data->dev_private`, which is allocated in DPDK shared memory accessible by both primary and secondary processes. A pthread mutex in shared memory initialized with `NULL` attributes has undefined behavior when used across processes. It may appear to work in testing but fail in production with secondary processes.
**Suggested fix**: Either use `PTHREAD_PROCESS_SHARED`:
```c
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&priv->flow_rule_lock, &attr);
pthread_mutexattr_destroy(&attr);
```
Or switch to `rte_spinlock_t` which works correctly in shared memory without special initialization (appropriate here since the critical sections are short — bitmap scan, TAILQ insert/remove):
```c
rte_spinlock_t flow_rule_lock;
/* ... */
rte_spinlock_init(&priv->flow_rule_lock);
```
### Warnings
**3. Implicit pointer comparisons throughout `gve_flow_rule.c`.**
Several places use `!pointer` instead of `pointer == NULL`:
```c
if (!flow) { /* line ~1641, ~1710 */
if (!action->conf) { /* line ~1510 */
if (!attr) { /* line ~1173 */
if (!actions) { /* line ~1498 */
if (!pattern) { /* line ~1329 */
```
DPDK coding style requires explicit comparison with NULL for pointers. The idiomatic form is `if (flow == NULL)`.
**4. `rte_zmalloc` used for flow rule metadata that does not require hugepage memory.**
The `gve_flow` structs (8 bytes: a `uint32_t` + TAILQ pointers) are allocated via `rte_zmalloc`. These are small control-plane structures not accessed by DMA and not requiring NUMA placement. Standard `malloc` would be more appropriate per DPDK guidelines and would not consume limited hugepage resources.
Similarly, `priv->avail_flow_rule_bmp_mem` is allocated with `rte_zmalloc`. Since `rte_bitmap` may require specific alignment, this one is more defensible, but worth considering whether standard allocation would suffice.
**5. Standalone `rte_atomic_thread_fence()` in flow subsystem flag accessors.**
The `gve_get_flow_subsystem_ok` / `gve_set_flow_subsystem_ok` / `gve_clear_flow_subsystem_ok` functions use standalone fences with `rte_bit_relaxed_*` operations:
```c
static inline bool
gve_get_flow_subsystem_ok(struct gve_priv *priv)
{
bool ret;
ret = !!rte_bit_relaxed_get32(GVE_PRIV_FLAGS_FLOW_SUBSYSTEM_OK,
&priv->state_flags);
rte_atomic_thread_fence(rte_memory_order_acquire);
return ret;
}
```
These follow the existing pattern for other state flags in the driver, so this is consistent. However, it's worth noting that in every call site in this patch, the flag is checked while holding `flow_rule_lock`, which already provides the necessary memory ordering. The fences are redundant in those paths (but harmless).
**6. RST documentation uses simple bullet lists where definition lists would be cleaner.**
In `doc/guides/nics/gve.rst`:
```rst
Supported Patterns:
- IPv4/IPv6 source and destination addresses.
- TCP/UDP/SCTP source and destination ports.
- ESP/AH SPI.
```
These term+list groupings would read better as RST definition lists. This is minor given the lists are short.
### Design Notes (Info)
**Well-structured error handling in `gve_create_flow_rule`**: The allocation-before-lock pattern avoids holding the mutex during memory allocation, and the `free_flow_and_unlock` goto label correctly handles all error paths. The bitmap-set-on-error rollback in the adminq failure case is also correct.
**`gve_destroy_flow_rule` double-validates flow state**: The function checks both the flow pointer for NULL and verifies `rule_id < max_flow_rules` before checking the bitmap. This defense-in-depth is good practice.
**IPv6 word-reversal in `gve_parse_ipv6`**: The comment explaining the device's expected word order is clear and the implementation correctly reverses the 32-bit words. This is the kind of hardware-specific detail that benefits from the inline comment.
---
## Cross-Patch Observations
The series is well-ordered: each patch builds incrementally and the dependencies flow naturally (device option → extended command infrastructure → flow rule commands → rte_flow integration). The documentation, feature matrix, and release notes are all updated in patch 4/4 together with the code, which is correct per DPDK guidelines.
The `Co-developed-by` / `Signed-off-by` tag sequences are correctly formatted per Linux kernel convention.
More information about the dev
mailing list