<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<STYLE>
BLOCKQUOTE {
MARGIN-BOTTOM: 0px; MARGIN-TOP: 0px; MARGIN-LEFT: 2em
}
OL {
MARGIN-BOTTOM: 0px; MARGIN-TOP: 0px
}
UL {
MARGIN-BOTTOM: 0px; MARGIN-TOP: 0px
}
P {
MARGIN-BOTTOM: 0px; MARGIN-TOP: 0px
}
BODY {
FONT-SIZE: 10.5pt; FONT-FAMILY: Microsoft YaHei UI; COLOR: #000000; LINE-HEIGHT: 1.5
}
</STYLE>
<META name=GENERATOR content="MSHTML 11.00.10570.1001"></HEAD>
<BODY style="MARGIN: 10px">
<DIV>Yes, I think you are right. After adding some debug output, I can
confirm that this is most likely an initialization issue in the ixgbe driver.</DIV>
<DIV>In secondary processes, some callback functions should be initialized,
but that initialization seems to be missing.</DIV>
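<DIV>A minimal sketch of why a missing callback crashes, and of the defensive
pattern the driver already uses for hw-&gt;mac.ops.fw_recovery_mode. It assumes a
simplified, single-process model: struct hw_model, struct mac_ops_model and the
model_* functions are hypothetical stand-ins, not the ixgbe API. Calling through
a callback that was never set would crash, while checking the pointer first turns
the crash into an error code.</DIV>
<PRE>
```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's hw and mac.ops structures. */
struct hw_model;

struct mac_ops_model {
	/* Returns 1 if the link is up, 0 if it is down. */
	int (*check_link)(struct hw_model *hw);
};

struct hw_model {
	struct mac_ops_model ops; /* callbacks, filled in by model_init_ops() */
	int link_up;              /* pretend hardware state */
};

static int model_check_link(struct hw_model *hw)
{
	return hw->link_up;
}

/* Installs the callbacks; a process that skips this has no usable ops. */
static void model_init_ops(struct hw_model *hw)
{
	hw->ops.check_link = model_check_link;
}

/*
 * Guarded call in the style of "ops.fn && ops.fn(hw)": report -ENOTSUP
 * instead of calling through a NULL pointer.
 */
static int model_link_status(struct hw_model *hw)
{
	if (hw->ops.check_link == NULL)
		return -ENOTSUP;
	return hw->ops.check_link(hw);
}
```
</PRE>
<DIV>A NULL check like this only catches a pointer that was never set; a pointer
value copied from another process can be non-NULL and still invalid, which such a
check cannot detect.</DIV>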
<DIV> </DIV>
<DIV>
<DIV>I made a small change: I moved the shared-code initialization block,
including the ixgbe_init_shared_code(hw) call, so that it runs before the early
return for secondary processes.</DIV>
<DIV>This changed the behavior somewhat, but a core dump still occurs.</DIV>
<DIV>I suspect there are other issues as well, or that this modification is not
the right approach.</DIV>
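<DIV>The reordering can be modelled roughly as follows. This is a simplified
sketch under stated assumptions: struct dev_model, init_callbacks() and the two
dev_init_*() functions are hypothetical, a separate struct stands in for the
secondary process, and init_callbacks() plays the role of
ixgbe_init_shared_code(). It only shows that callbacks installed before the
secondary-process early return are available in every process, while the rest
of the setup still runs only in the primary.</DIV>
<PRE>
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; not the ixgbe data structures. */
struct dev_model {
	int (*op)(int); /* callback: must be set in every process that uses it */
	int configured; /* setup done only by the primary */
};

static int op_double(int x)
{
	return 2 * x;
}

/* Plays the role of ixgbe_init_shared_code(): installs the callbacks. */
static void init_callbacks(struct dev_model *dev)
{
	dev->op = op_double;
}

/* Original order: a secondary returns before the callbacks are installed. */
static int dev_init_original(struct dev_model *dev, bool is_secondary)
{
	if (is_secondary)
		return 0;
	init_callbacks(dev);
	dev->configured = 1;
	return 0;
}

/* Patched order: callbacks first, then the secondary-process early return. */
static int dev_init_reordered(struct dev_model *dev, bool is_secondary)
{
	init_callbacks(dev);
	if (is_secondary)
		return 0;
	dev->configured = 1;
	return 0;
}
```
</PRE>
<DIV>One caveat the model does not capture: as far as I understand, the driver's
hw structure lives in the port's private data, which DPDK places in memory shared
with the primary process, so running the shared-code init in a secondary would
also overwrite values the primary had already set.</DIV>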
<DIV> </DIV>
<DIV>
<DIV>[root@xc03-compute3 /]# /dpdk/app/dpdk-dumpcap -i 0000:18:00.0</DIV>
<DIV>mlx5_net: Cannot attach mlx5 shared data</DIV>
<DIV>mlx5_net: Unable to init PMD global data: No such file or directory</DIV>
<DIV>mlx5_common: Failed to load driver mlx5_eth</DIV>
<DIV>EAL: Requested device 0000:3b:00.0 cannot be used</DIV>
<DIV>mlx5_net: Cannot attach mlx5 shared data</DIV>
<DIV>mlx5_net: Unable to init PMD global data: No such file or directory</DIV>
<DIV>mlx5_common: Failed to load driver mlx5_eth</DIV>
<DIV>EAL: Requested device 0000:3b:00.1 cannot be used</DIV>
<DIV>File: /tmp/dpdk-dumpcap_0_0000:18:00.0_20240314091910.pcapng</DIV>
<DIV>Capturing on '0000:18:00.0'</DIV>
<DIV>Packets captured: 2 Primary process is no longer active,
exiting...</DIV>
<DIV>EAL: Fail to recv reply for request
/var/run/dpdk/rte/mp_socket:mp_pdump</DIV>
<DIV>pdump_prepare_client_request(): client request for pdump
enable/disable failed</DIV>
<DIV>Floating point exception (core dumped)</DIV></DIV>
<DIV> </DIV></DIV>
<DIV><PRE style="WHITE-SPACE: pre-wrap; WORD-SPACING: 0px; TEXT-TRANSFORM: none; FONT-WEIGHT: 400; COLOR: rgb(0,0,0); FONT-STYLE: normal; ORPHANS: 2; WIDOWS: 2; LETTER-SPACING: normal; TEXT-INDENT: 0px; font-variant-ligatures: normal; font-variant-caps: normal; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; overflow-wrap: break-word">diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index d6cf00317e77b64f9822c155115f388ae62241eb..0bf885d7eaba3689fb9b98cdcaa6a928aa787985 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1104,6 +1104,24 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
 	eth_dev->tx_pkt_prepare = &ixgbe_prep_pkts;
 
+	/* Vendor and Device ID need to be set before init of shared code */
+	hw->device_id = pci_dev->id.device_id;
+	hw->vendor_id = pci_dev->id.vendor_id;
+	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->allow_unsupported_sfp = 1;
+
+	/* Initialize the shared code (base driver) */
+#ifdef RTE_LIBRTE_IXGBE_BYPASS
+	diag = ixgbe_bypass_init_shared_code(hw);
+#else
+	diag = ixgbe_init_shared_code(hw);
+#endif /* RTE_LIBRTE_IXGBE_BYPASS */
+
+	if (diag != IXGBE_SUCCESS) {
+		PMD_INIT_LOG(ERR, "Shared code init failed: %d", diag);
+		return -EIO;
+	}
+
 	/*
 	 * For secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1135,24 +1153,6 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 
-	/* Vendor and Device ID need to be set before init of shared code */
-	hw->device_id = pci_dev->id.device_id;
-	hw->vendor_id = pci_dev->id.vendor_id;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
-	hw->allow_unsupported_sfp = 1;
-
-	/* Initialize the shared code (base driver) */
-#ifdef RTE_LIBRTE_IXGBE_BYPASS
-	diag = ixgbe_bypass_init_shared_code(hw);
-#else
-	diag = ixgbe_init_shared_code(hw);
-#endif /* RTE_LIBRTE_IXGBE_BYPASS */
-
-	if (diag != IXGBE_SUCCESS) {
-		PMD_INIT_LOG(ERR, "Shared code init failed: %d", diag);
-		return -EIO;
-	}
-
 	if (hw->mac.ops.fw_recovery_mode && hw->mac.ops.fw_recovery_mode(hw)) {
 		PMD_INIT_LOG(ERR, "\nERROR: "
 			"Firmware recovery mode detected. Limiting functionality.\n"</PRE></DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV>Additionally, I am using a debug build, but the printed call stack is still
unclear, which is quite strange.</DIV>
<DIV> </DIV>
<DIV>
<DIV> meson -Dc_args="-mno-avx512f"
-Ddisable_drivers=net/ark,net/atlantic,net/avp,net/axgbe,net/pfe,net/netvsc
-Dmax_numa_nodes=8 -Dmax_ethports=128 --buildtype=debug --optimization=0 build
</DIV>
<DIV> ninja -C build install</DIV></DIV>
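<DIV>To get a clearer call stack at the moment of the crash, one option is to
print it from a signal handler. A minimal sketch, assuming glibc (backtrace() and
backtrace_symbols_fd() from execinfo.h are glibc extensions) and that
install_crash_handler() is called early in main(); crash_handler() and
install_crash_handler() are hypothetical names, not part of dpdk-dumpcap.</DIV>
<PRE>
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Prints the call stack to stderr, then re-raises the signal so that the
 * default action (core dump) still happens. */
static void crash_handler(int sig)
{
	void *frames[64];
	int n = backtrace(frames, 64);

	backtrace_symbols_fd(frames, n, STDERR_FILENO);
	/* SA_RESETHAND restored the default disposition; re-raise to dump core. */
	raise(sig);
}

/* Installs crash_handler() for SIGFPE and SIGSEGV. Returns 0 on success. */
static int install_crash_handler(void)
{
	struct sigaction sa;
	void *warmup[1];

	/* Call backtrace() once up front: its first call may load libgcc,
	 * which is not safe to do inside a signal handler. */
	backtrace(warmup, 1);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = crash_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESETHAND;

	if (sigaction(SIGFPE, &sa, NULL) != 0)
		return -1;
	if (sigaction(SIGSEGV, &sa, NULL) != 0)
		return -1;
	return 0;
}
```
</PRE>
<DIV>Names of functions in the main executable generally only appear when it is
linked with -rdynamic (static functions are not named at all); otherwise the
printed offsets can usually be resolved with addr2line against the debug
build.</DIV>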
<DIV> </DIV>
<HR style="HEIGHT: 1px; WIDTH: 210px" align=left color=#b5c4df SIZE=1>
<DIV><SPAN>
<DIV style="FONT-SIZE: 10pt; FONT-FAMILY: verdana; MARGIN: 10px">
<DIV>junwang01@cestc.cn</DIV></DIV></SPAN></DIV>
<DIV> </DIV>
<DIV
style="BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; BORDER-BOTTOM: medium none; PADDING-BOTTOM: 0cm; PADDING-TOP: 3pt; PADDING-LEFT: 0cm; BORDER-LEFT: medium none; PADDING-RIGHT: 0cm">
<DIV
style="FONT-SIZE: 12px; FONT-FAMILY: tahoma; BACKGROUND: #efefef; COLOR: #000000; PADDING-BOTTOM: 8px; PADDING-TOP: 8px; PADDING-LEFT: 8px; PADDING-RIGHT: 8px">
<DIV><B>From:</B> <A href="mailto:stephen@networkplumber.org">Stephen
Hemminger</A></DIV>
<DIV><B>Date:</B> 2024-03-14 00:29</DIV>
<DIV><B>To:</B> <A
href="mailto:junwang01@cestc.cn">junwang01@cestc.cn</A></DIV>
<DIV><B>CC:</B> <A href="mailto:dev@dpdk.org">dev</A></DIV>
<DIV><B>Subject:</B> Re: dumpcap coredump for 82599 NIC</DIV></DIV></DIV>
<DIV>
<DIV>On Wed, 13 Mar 2024 10:00:17 +0800</DIV>
<DIV>"junwang01@cestc.cn" &lt;junwang01@cestc.cn&gt; wrote:</DIV>
<DIV> </DIV>
<DIV>> Hi, when I use dumpcap to capture packets on the 82599 network card, a
core dump occurs.</DIV>
<DIV>> The network card bound to ovs-dpdk is an 82599. When I capture packets on
other, non-82599 network cards (Mellanox CX5/CX6 or Intel E810), everything
works normally.</DIV>
<DIV>> The DPDK version I am using is 22.11.1. The call stack looks strange, so I
am asking you for help.</DIV>
<DIV>> </DIV>
<DIV>> I thought a newer DPDK version might solve it, so I upgraded to 23.11,
but the problem is the same; only the call stack is different, and even
weirder.</DIV>
<DIV>> </DIV>
<DIV>> junwang01@cestc.cn</DIV>
<DIV> </DIV>
<DIV>This is not an issue with dumpcap. The problem is in the ixgbe driver.</DIV>
<DIV>Some part of the code for checking link status is not safe to call in a
secondary process.</DIV>
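<DIV>One general way to keep a link-status query safe in a secondary process is
to let only the primary call the hardware callbacks and have secondaries read a
copy of the status kept in shared data. The sketch below is a hypothetical,
single-process model of that idea (struct shared_port_data, struct port_model
and port_link_status() are illustrative names, not DPDK or ixgbe APIs); it is not
a description of how the driver currently works.</DIV>
<PRE>
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port data visible to the primary and secondary processes. */
struct shared_port_data {
	int link_up; /* last link status written by the primary */
};

/* Hypothetical per-process view of a port. */
struct port_model {
	struct shared_port_data *shared;
	int (*hw_check_link)(void); /* only meaningful in the process that set it */
};

static int fake_hw_check_link(void)
{
	return 1; /* pretend the hardware reports link up */
}

/*
 * The primary queries the hardware through its callback and refreshes the
 * shared copy; a secondary never calls the callback and only reads the copy.
 */
static int port_link_status(struct port_model *port, bool is_primary)
{
	if (is_primary && port->hw_check_link != NULL)
		port->shared->link_up = port->hw_check_link();
	return port->shared->link_up;
}
```
</PRE>
<DIV>In this model the secondary's callback pointer is never dereferenced, so it
does not matter whether it is NULL or stale.</DIV>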
<DIV> </DIV>
<DIV>The backtrace looks a bit messed up, since the ixgbe driver should not be
calling i40e code.</DIV>
<DIV>Maybe try a debug build (so that more complete symbols are available).</DIV>
<DIV> </DIV>
<DIV> </DIV></DIV></BODY></HTML>