<html> <head> <base href="https://bugs.dpdk.org/"> </head> <body><table border="1" cellspacing="0" cellpadding="8" class="bz_new_table"> <tr> <th>Bug ID</th> <td><a class="bz_bug_link bz_status_UNCONFIRMED " title="UNCONFIRMED - examples/l3fwd: in event-mode hash.txadapter.txq is not always updated" href="https://bugs.dpdk.org/show_bug.cgi?id=1391">1391</a> </td> </tr> <tr> <th>Summary</th> <td>examples/l3fwd: in event-mode hash.txadapter.txq is not always updated </td> </tr> <tr> <th>Product</th> <td>DPDK </td> </tr> <tr> <th>Version</th> <td>unspecified </td> </tr> <tr> <th>Hardware</th> <td>All </td> </tr> <tr> <th>OS</th> <td>All </td> </tr> <tr> <th>Status</th> <td>UNCONFIRMED </td> </tr> <tr> <th>Severity</th> <td>normal </td> </tr> <tr> <th>Priority</th> <td>Normal </td> </tr> <tr> <th>Component</th> <td>examples </td> </tr> <tr> <th>Assignee</th> <td>dev@dpdk.org </td> </tr> <tr> <th>Reporter</th> <td>konstantin.v.ananyev@yandex.ru </td> </tr> <tr> <th>CC</th> <td>pbhagavatula@marvell.com </td> </tr> <tr> <th>Target Milestone</th> <td>--- </td> </tr></table> <p> <div class="bz_comment_block"> <pre class="bz_comment_text">Reproducible with the latest main branch.
l3fwd in event mode with the SW eventdev on mlx5 PMDs can crash:

./dpdk-l3fwd --lcores=49,51,53,55,57 -n 6 -a ca:00.0 -a ca:00.1 -a cb:00.0 -a cb:00.1 -s 0x8000000000000 -\
-vdev event_sw0 -- -L -P -p f --rx-queue-size 1024 --tx-queue-size 1024 --mode eventdev --eventq-sched=ordered \
--rule_ipv4=test/l3fwd_lpm_v4_u1.cfg --rule_ipv6=test/l3fwd_lpm_v6_u1.cfg --no-numa

Thread 4 "dpdk-worker51" received signal SIGSEGV, Segmentation fault.
0x000000000135d27f in rte_eth_tx_buffer (tx_pkt=0x17f3ea780, buffer=0x10, queue_id=43, port_id=1) at ../lib/ethdev/rte_ethdev.h:6637
6637		buffer->pkts[buffer->length++] = tx_pkt;
(gdb) bt
#0  0x000000000135d27f in rte_eth_tx_buffer (tx_pkt=0x17f3ea780, buffer=0x10, queue_id=43, port_id=1) at ../lib/ethdev/rte_ethdev.h:6637
#1  txa_service_tx (txa=0x11f89959c0, ev=0x7ffff2f23e10, n=16) at ../lib/eventdev/rte_event_eth_tx_adapter.c:631
#2  0x000000000135d3ef in txa_service_func (args=0x11f89959c0) at ../lib/eventdev/rte_event_eth_tx_adapter.c:666
#3  0x00000000015d30e1 in service_runner_do_callback (s=0x11ffffe100, cs=0x11fffe8500, service_idx=2) at ../lib/eal/common/rte_service.c:405
#4  0x00000000015d3429 in service_run (i=2, cs=0x11fffe8500, service_mask=7, s=0x11ffffe100, serialize_mt_unsafe=1) at ../lib/eal/common/rte_service.c:441
#5  0x00000000015d363f in service_runner_func (arg=0x0) at ../lib/eal/common/rte_service.c:513
#6  0x00000000015c12c1 in eal_thread_loop (arg=0x33) at ../lib/eal/common/eal_common_thread.c:212
#7  0x00000000015e1b98 in eal_worker_thread_loop (arg=0x33) at ../lib/eal/linux/eal.c:916
#8  0x00007ffff5ff76ea in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffff5d0fa8f in clone () from /lib64/libc.so.6

Obviously 'queue_id=43' is wrong here, and it crashed while trying to access an unconfigured TX queue.
What is happening here is a coincidence of two different problems:

1. The EVENT framework silently and unconditionally re-uses mbuf::hash.fdir for its own purposes:

		struct {
			uint32_t reserved1;
			uint16_t reserved2;
			uint16_t txq;
			/**< The event eth Tx adapter uses this field
			 * to store Tx queue id.
			 * @see rte_event_eth_tx_adapter_txq_set()
			 */
		} txadapter; /**< Eventdev ethdev Tx adapter */

In particular, txa_service_tx() expects hash.txadapter.txq to contain a valid TX queue index.
However, l3fwd does not always set it properly.
Usually that is fine for this particular app, as only queue 0 is in use and it doesn't configure PMDs to overwrite the mbuf::hash.fdir.hi value (RTE_MBUF_F_RX_FDIR).
But if, for whatever reason, a PMD overwrites mbuf::hash.fdir.hi with some non-zero value, then we are in trouble.

2. That's exactly what is happening here: the mlx5 driver sometimes superfluously updates mbuf::hash.fdir.hi.

The fix I applied locally is obvious: *always* set hash.txadapter.txq to a proper value before calling rte_event_enqueue_burst().
See below for details.
Note that it is not the 'complete' fix, as the same needs to be done for the other codepaths (em, fib, acl, etc.).
As a more general thing, I don't understand why the EVENT framework keeps using hash.fdir for its own purposes,
especially in a completely silent and unconditional way.
I think it would be much cleaner to switch to an mbuf dynfield/dynflag based approach.

diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index a484a33089..ef9838aef3 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -285,6 +285,8 @@ lpm_event_loop_single(struct l3fwd_event_resources *evt_rsrc,
 			continue;
 		}
 
+		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
+
 		if (flags & L3FWD_EVENT_TX_ENQ) {
 			ev.queue_id = tx_q_id;
 			ev.op = RTE_EVENT_OP_FORWARD;
@@ -295,7 +297,6 @@ lpm_event_loop_single(struct l3fwd_event_resources *evt_rsrc,
 		}
 
 		if (flags & L3FWD_EVENT_TX_DIRECT) {
-			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
 			do {
 				enq = rte_event_eth_tx_adapter_enqueue(
 						event_d_id, event_p_id, &ev, 1, 0);
@@ -344,11 +345,8 @@ lpm_event_loop_burst(struct l3fwd_event_resources *evt_rsrc,
 				events[i].op = RTE_EVENT_OP_FORWARD;
 			}
 
-			if (flags & L3FWD_EVENT_TX_DIRECT)
-				rte_event_eth_tx_adapter_txq_set(events[i].mbuf,
-								 0);
-
 			lpm_process_event_pkt(lconf, events[i].mbuf);
+			rte_event_eth_tx_adapter_txq_set(events[i].mbuf, 0);
 		}
 
 		if (flags & L3FWD_EVENT_TX_ENQ) {
</pre> </div> </p> <hr> <span>You are receiving this mail because:</span> <ul> <li>You are the assignee
for the bug.</li> </ul> <div itemscope itemtype="http://schema.org/EmailMessage"> <div itemprop="action" itemscope itemtype="http://schema.org/ViewAction"> <link itemprop="url" href="https://bugs.dpdk.org/show_bug.cgi?id=1391"> <meta itemprop="name" content="View bug"> </div> <meta itemprop="description" content="Bugzilla bug update notification"> </div> </body> </html>