dpdk mlx5 driver crash in rxq_cq_decompress_v

Matan Azrad matan at nvidia.com
Mon Jul 3 05:35:14 CEST 2023


+ @Alexander Kozyrev <akozyrev at nvidia.com> to suggest.

________________________________
From: Xiaoping Yan (NSB) <xiaoping.yan at nokia-sbell.com>
Sent: Monday, July 3, 2023 4:18:22 AM
To: users at dpdk.org <users at dpdk.org>; Matan Azrad <matan at nvidia.com>; dekelp at nvidia.com <dekelp at nvidia.com>
Subject: RE: dpdk mlx5 driver crash in rxq_cq_decompress_v



Hi,



@dekelp at nvidia.com @Matan Azrad, can you kindly suggest?

Thank you.



Br, Xiaoping



From: Xiaoping Yan (NSB)
Sent: June 27, 2023 12:11
To: users at dpdk.org; 'Matan Azrad' <matan at nvidia.com>; 'dekelp at nvidia.com' <dekelp at nvidia.com>
Subject: dpdk mlx5 driver crash in rxq_cq_decompress_v



Hi,



DPDK version in use: 21.11.2

The mlx5 driver crashes in rxq_cq_decompress_v after several minutes of a traffic test.

Stack trace:

(gdb) bt
#0  0x00007ffff58612bc in _mm_storeu_si128 (__B=..., __P=<optimized out>)
    at /usr/lib/gcc/x86_64-redhat-linux/12/include/emmintrin.h:739
#1  rxq_cq_decompress_v (rxq=rxq@entry=0x2abe5592f40, cq=cq@entry=0x2abe54fdb00, elts=elts@entry=0x2abe5594638)
    at ../dpdk-21.11/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:142
#2  0x00007ffff5862c84 in rxq_burst_v (no_cq=<synthetic pointer>, err=0x7fffffffb848, pkts_n=4, pkts=<optimized out>,
    rxq=0x2abe5592f40) at ../dpdk-21.11/drivers/net/mlx5/mlx5_rxtx_vec.c:349
#3  mlx5_rx_burst_vec (dpdk_rxq=0x2abe5592f40, pkts=0x7fffffffbf80, pkts_n=32) at ../dpdk-21.11/drivers/net/mlx5/mlx5_rxtx_vec.c:393
#4  0x00005555556a0f41 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7fffffffbf80, queue_id=0, port_id=1)
    at /usr/include/rte_ethdev.h:5721
…

Attached are the error log (“Unexpected CQE error syndrome…”) and the dump file.



I found a similar bug here: https://bugs.dpdk.org/show_bug.cgi?id=334

But the fix (88c0733535d6 extend Rx completion with error handling) should already be included, as I’m using 21.11.2.

The commit below (a fix to 88c0733535d6) is also already included in my DPDK version.

commit 60b254e3923d007bcadbb8d410f95ad89a2f13fa
Author: Matan Azrad <matan at nvidia.com>
Date:   Thu Aug 11 19:51:55 2022 +0300

    net/mlx5: fix Rx queue recovery mechanism



Any suggestions?

Thank you.



Br, Xiaoping

