[dpdk-dev] [Bug 533] stack corruption in mlx5_xstats_reset when number of stats changes
bugzilla at dpdk.org
bugzilla at dpdk.org
Tue Sep 1 09:57:25 CEST 2020
https://bugs.dpdk.org/show_bug.cgi?id=533
Bug ID: 533
Summary: stack corruption in mlx5_xstats_reset when number of
stats changes
Product: DPDK
Version: 20.08
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev at dpdk.org
Reporter: ralf.hoffmann at allegro-packets.com
Target Milestone: ---
Created attachment 120
--> https://bugs.dpdk.org/attachment.cgi?id=120&action=edit
Patch to fix stack corruption
I stumbled across a crash in the mellanox driver due to stack corruption.
The function mlx5_xstats_reset() uses a dynamically sized array on the stack
based on the last number of statistic values. Then it queries the actual number
of stats. If the returned value is larger than the previous number, it will
overwrite the stack frame including the return addresses usually resulting in a
crash.
In my case the initial value of xstats_ctrl->mlx5_stats_n was zero and the
return value of mlx5_os_get_stats_n() was 24 so it overwrote 96 bytes on the
stack. The problem became visible after the update to 20.08. Apparently some
statistic code has been refactored which now triggers this problem in our code
but the actual problem in the reset function exists for a long time. We are
calling rte_eth_xstats_reset() before calling rte_eth_xstats_get() to get the
available statistics from the driver.
There are two problems in this function:
1. The array is created not based on the actual number of items, but the
previously known number of items.
2. The function mlx5_os_read_dev_counters() just gets a pointer to the array
without knowing its size making it unsafe to use.
I have attached a patch which works for me by using alloca instead. Since this
is not completely portable, another approach can be using a separate scope
block for the dynamically sized array.
If it matters, the card we used is MCX516A-CDAT
--
You are receiving this mail because:
You are the assignee for the bug.
More information about the dev
mailing list