[EXTERNAL] Re: [PATCH v3 0/2] net/mana: add device reset support
Wei Hu
weh at microsoft.com
Thu May 28 09:30:03 CEST 2026
> -----Original Message-----
> From: Stephen Hemminger <stephen at networkplumber.org>
> Sent: Wednesday, May 27, 2026 3:38 AM
> To: Wei Hu <weh at linux.microsoft.com>
> Cc: dev at dpdk.org; Long Li <longli at microsoft.com>; Wei Hu
> <weh at microsoft.com>
> Subject: [EXTERNAL] Re: [PATCH v3 0/2] net/mana: add device reset support
>
>
> Warnings
>
> 3. Secondary process: qsbr does not actually quiesce secondary lcores.
>
> rte_rcu_qsbr_thread_register is only called from
> mana_dev_configure, which the secondary never runs. The
> secondary's rx_burst/tx_burst still call thread_online/offline
> against the shared qsv, but the secondary tids are unregistered,
> so they are invisible to rte_rcu_qsbr_check in the primary, and
> the secondary MP handler (mana_mp_reset_enter) does not call
> qsbr_check at all -- it just sets db_page = NULL and munmaps.
>
> The dev_state check at the top of secondary tx_burst is racy:
> the page can be munmapped after the in-loop read of db_page but
> before the doorbell write at the bottom. The "All secondary
> threads are quiescent" log line in mana_mp_reset_enter is not
> true.
>
> The secondary needs a real reader-side primitive -- its own
> qsbr with secondary lcore registration, or an rwlock the MP
> handler takes before munmap.
>
Thanks for the v3 review, @Stephen. I will send out v4 to incorporate most
of the review comments except for this one.
The review on this point is not correct. Here I am providing analysis from
AI and my own test results to show why.
The concern is that "rte_rcu_qsbr_thread_register is only called
from mana_dev_configure, which the secondary never runs", so
secondary tids are unregistered and invisible to rte_rcu_qsbr_check.
This is incorrect. The RCU thread IDs are per-queue, not per-process.
The tid used in rte_rcu_qsbr_thread_online/offline is:
- RX path: tid = rxq->rxq_idx (0 to num_queues-1)
- TX path: tid = num_queues + txq_idx (num_queues to 2*num_queues-1)
These tids are registered once by primary in mana_dev_configure
(mana.c:117), which calls rte_rcu_qsbr_thread_register for IDs
0..2*num_queues-1. The QSV (priv->dev_state_qsv) is allocated
with rte_zmalloc_socket in shared hugepage memory (mana.c:2207).
The queue structures (rxq, txq) also live in shared memory via
dev->data->rx_queues[] / tx_queues[].
When a secondary lcore polls a queue, it calls
rte_rcu_qsbr_thread_online(qsv, tid) with the SAME shared QSV
and SAME queue-based tid. DPDK's fundamental model requires that
a given queue is polled by exactly one lcore at a time (queues
are not thread-safe), so the tid maps 1:1 to a lcore regardless
of which process that lcore belongs to. The tid IS registered ―
registration is a property of the QSV structure, not of the process.
The reset synchronization flow (mana_reset_enter, mana.c:1458-1488):
1. Primary: dev_state = MANA_DEV_RESET_ENTER (release store)
2. Primary: rte_rcu_qsbr_start + rte_rcu_qsbr_check ― blocks
until ALL registered tids have passed through thread_offline
3. Secondary lcores in rx_burst/tx_burst eventually call
rte_rcu_qsbr_thread_offline, and on re-entry see
dev_state != ACTIVE → return 0 without touching db_page
4. Only after QSBR check passes (ALL tids quiescent, including
secondary's) does primary proceed to mana_dev_stop (line 1472)
5. Primary sends MANA_MP_REQ_RESET_ENTER IPC (line 1481)
6. Secondary MP handler: sets db_page = NULL, munmaps
The alleged race ― "db_page can be munmapped after the in-loop
read but before the doorbell write" ― cannot happen because:
- db_page is munmapped at step 6, which is AFTER step 2 (QSBR).
- After QSBR passes, any secondary re-entry into tx_burst/rx_burst
reads dev_state with acquire ordering (tx.c:214, rx.c:468),
sees RESET_ENTER, and returns immediately without using the
local db_page copy.
- The memory ordering is sound: primary's release store of
dev_state (step 1) happens-before QSBR start (step 2).
Secondary's thread_offline (release) synchronizes-with
primary's QSBR check (acquire). On secondary re-entry,
thread_online + acquire load of dev_state sees the update.
The secondary MP handler does NOT need its own QSBR check ―
the primary's check at step 2 already covers all data path
threads including secondary's.
Note: The log "All secondary threads are quiescent" in
mana_mp_reset_enter is misleading and has been removed.
The actual quiescence is enforced by the primary's QSBR
check before IPC is sent.
Followings are from my own test. I added the debug log in mana_tx_burst() call.
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 115bc722f4..038a3d9e00 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -454,6 +454,9 @@ struct mana_txq {
struct mana_stats stats;
unsigned int socket;
unsigned int txq_idx;
+#if 1
+ unsigned int call_cnt;
+#endif
};
struct mana_rxq {
diff --git a/drivers/net/mana/tx.c b/drivers/net/mana/tx.c
index 3b07a8b9a6..3ae825190e 100644
--- a/drivers/net/mana/tx.c
+++ b/drivers/net/mana/tx.c
@@ -210,6 +210,14 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
rte_rcu_qsbr_thread_online(dstate_qsv, tid);
+#if 1
+ if (txq->call_cnt++ % 1000000 == 0) {
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+ DP_LOG(ERR, "secondary tid %u online", tid);
+ else
+ DP_LOG(ERR, "primary tid %u online", tid);
+ }
+#endif
if (unlikely(rte_atomic_load_explicit(&priv->dev_state,
rte_memory_order_acquire) != MANA_DEV_ACTIVE || !db_page)) {
Primary command:
dpdk-testpmd --log-level='.*,8' -l 2-4 ... --proc-type=primary -- --nb-cores 2
--forward-mode=txonly ... --txq=4 --rxq=4 ... --num-procs=2 --proc-id 0
Secondary command:
dpdk-testpmd --log-level='.*,8' -l 6-8 ... --proc-type=secondary -- --nb-cores 2
--forward-mode=txonly ... --txq=4 --rxq=4 ... --num-procs=2 --proc-id 1
Here in both commands, I specify the number of txq and rxq to be 4.
After primary start, the log shows 8 threads total were registered in mana_dev_configure
and their tids:
Configuring Port 0 (socket 0)
MANA_DRIVER: mana_dev_configure(): priv 0x1003cdc80, port 0, dev port 1, num_queues: 4
MANA_DRIVER: mana_dev_configure(): Register thread 0x0 for priv 0x1003cdc80, port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x1 for priv 0x1003cdc80, port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x2 for priv 0x1003cdc80, port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x3 for priv 0x1003cdc80, port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x4 for priv 0x1003cdc80, port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x5 for priv 0x1003cdc80, port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x6 for priv 0x1003cdc80, port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x7 for priv 0x1003cdc80, port 0
ETHDEV: Port 0 Rx offload RSS_HASH is not requested but enabled
Note tid 0x0 to 0x3 are rx threads, 0x4 to 0x7 are tx threads. Then the primary
only printed out the tid 4 and 5 sere online:
MANA_DRIVER: primary tid 4 online
MANA_DRIVER: primary tid 5 online
Then I started the secondary and saw tid 6 and 7 were online in its mana_tx_burst call:
MANA_DRIVER: secondary tid 7 online
MANA_DRIVER: secondary tid 6 online
MANA_DRIVER: secondary tid 7 online
This simply means primary has registered all threads which would be up in both
primary and secondary in mana_ddev_configure. Primary only starts half of the
threads since --num-procs=2 passed in already tells primary that there will be
a secondary to start.
When the secondary starts, it takes the rest two tx and rx threads. All these threads
have already been registered in mana_dev_configure() by calling
rte_rcu_qsbr_thread_register() in the primary.
So, the review comments on this point is incorrect. All threads are covered and
there is no race at all.
Thanks,
Wei
More information about the dev
mailing list