<div dir="ltr">Thank you for your interest in the problem. It seems the error message was caused by the option --allow 0000:00.0 being passed to the secondary process as well, by mistake.<div>The primary correctly completed all the initialization steps:</div><div><br></div><div><div>rte_dev_probe(vf);</div><div>rte_eth_dev_configure(port_id, ... );<br>rte_eth_dev_adjust_nb_rx_tx_desc(port_id, ... );<br>rte_eth_rx_queue_setup(port_id, ... );<br>rte_eth_tx_queue_setup(port_id, ... );<br>rte_eth_dev_start(port_id);<br></div><div><br></div></div><div>The secondary did nothing apart from calling rte_eth_tx_burst, but it did not see the port at all because of the wrong --allow option.</div><div><br></div><div>BR,</div><div>Anna.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 1 Sep 2022 at 17:22, Stephen Hemminger <<a href="mailto:stephen@networkplumber.org">stephen@networkplumber.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, 1 Sep 2022 09:33:54 +0200<br>
Anna Tauzzi <<a href="mailto:admin@argonnetech.net" target="_blank">admin@argonnetech.net</a>> wrote:<br>
<br>
> I'm using the Mellanox ConnectX-5:<br>
> <br>
> pci@0000:3b:00.0 enp59s0f0np0 network MT27800 Family [ConnectX-5]<br>
> pci@0000:3b:00.1 enp59s0f1np1 network MT27800 Family [ConnectX-5]<br>
> pci@0000:3b:00.2 enp59s0f0v0 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> pci@0000:3b:00.3 enp59s0f0v1 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> pci@0000:3b:00.4 enp59s0f0v2 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> pci@0000:3b:00.5 enp59s0f0v3 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> pci@0000:3b:04.2 enp59s0f1v0 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> pci@0000:3b:04.3 enp59s0f1v1 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> pci@0000:3b:04.4 enp59s0f1v2 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> pci@0000:3b:04.5 enp59s0f1v3 network MT27800 Family [ConnectX-5 Virtual Function]<br>
> <br>
> This is the message:<br>
> lcore 6 called tx_pkt_burst for not ready port 0<br>
> 8: [/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7ffff7c77a00]]<br>
> 7: [/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7ffff7be5b43]]<br>
> 6: [/usr/local/lib/librte_eal.so.22(+0x1559a) [0x7ffff7d8e59a]]<br>
> 5: [build/simple_eth_tx_mp(+0x1a0c7) [0x55555556e0c7]]<br>
> 4: [build/simple_eth_tx_mp(+0x19f89) [0x55555556df89]]<br>
> 3: [build/simple_eth_tx_mp(+0x423c) [0x55555555823c]]<br>
> 2: [/usr/local/lib/librte_ethdev.so.22(+0x7cbc) [0x7ffff7eb3cbc]]<br>
> 1: [/usr/local/lib/librte_eal.so.22(rte_dump_stack+0x32) [0x7ffff7daf152]]<br>
> <br>
> I'm having all sorts of problems with this Mellanox stuff, Intel cards are<br>
> much more user friendly.<br>
> <br>
> Just to recap:<br>
> * configure on primary and transmit on primary ---> GOOD<br>
> <br>
> * configure on secondary and transmit on secondary ---> SIGSEGV<br>
> Thread 4 "lcore-worker-6" received signal SIGSEGV, Segmentation fault.<br>
> [Switching to Thread 0x7ffff4346640 (LWP 7208)]<br>
> rte_eth_tx_burst (port_id=0, queue_id=0, tx_pkts=0x7ffff4344ac0, nb_pkts=1)<br>
> at /usr/local/include/rte_ethdev.h:5650<br>
> 5650 qd = p->txq.data[queue_id];<br>
> (gdb) print p->txq<br>
> $2 = {data = 0x0, clbk = 0x7ffff7f21528 <rte_eth_devices+8296>} (data is<br>
> NULL)<br>
> <br>
> <br>
> * configure on primary and transmit on secondary ---> PORT NOT READY<br>
> <br>
> Do you know who should be notified of this problem? Should I open a bug on<br>
> DPDK bugzilla or file it to NVIDIA?<br>
> <br>
> Thx.<br>
> <br>
> <br>
> <br>
> On Thu, 1 Sep 2022 at 03:25, Stephen Hemminger <<br>
> <a href="mailto:stephen@networkplumber.org" target="_blank">stephen@networkplumber.org</a>> wrote:<br>
> <br>
> > On Wed, 31 Aug 2022 22:59:56 +0200<br>
> > Anna Tauzzi <<a href="mailto:admin@argonnetech.net" target="_blank">admin@argonnetech.net</a>> wrote:<br>
> > <br>
> > > I initialize a port with the following methods on a primary process:<br>
> > ><br>
> > > rte_dev_probe(vf)<br>
> > ><br>
> > > rte_eth_dev_configure(port_id, ... );<br>
> > ><br>
> > > rte_eth_dev_adjust_nb_rx_tx_desc(port_id, ... );<br>
> > ><br>
> > > rte_eth_rx_queue_setup(port_id, .... );<br>
> > ><br>
> > > rte_eth_tx_queue_setup(port_id, ... );<br>
> > ><br>
> > > rte_eth_dev_start(port_id ... );<br>
> > ><br>
> > ><br>
> > ><br>
> > > Then I use the rte_eth_tx_burst(port_id) in the secondary process but I get<br>
> > > this message:<br>
> > ><br>
> > > called tx_pkt_burst for not ready port 0<br>
> > ><br>
> > > Is this expected? <br>
> ><br>
> > No, it looks like a device driver bug. Which PMD?<br>
<br>
What version of rdma-core and kernel are you using?<br>
There were some bugs in earlier versions around secondary process support.<br>
They were fixed; some users are using failsafe and mlx5 on Azure with<br>
secondary processes.<br>
</blockquote></div>