[PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE

Tummala, Sivaprasad Sivaprasad.Tummala at amd.com
Tue Nov 18 04:40:25 CET 2025


Hi Alexander,

________________________________
From: Alexander Kozyrev <akozyrev at nvidia.com>
Sent: Tuesday, November 18, 2025 1:35 AM
To: Tummala, Sivaprasad <Sivaprasad.Tummala at amd.com>; Dariusz Sosnowski <dsosnowski at nvidia.com>; Slava Ovsiienko <viacheslavo at nvidia.com>
Cc: jerinj at marvell.com <jerinj at marvell.com>; kirankumark at marvell.com <kirankumark at marvell.com>; ndabilpuram at marvell.com <ndabilpuram at marvell.com>; yanzhirun_163 at 163.com <yanzhirun_163 at 163.com>; david.marchand at redhat.com <david.marchand at redhat.com>; ktraynor at redhat.com <ktraynor at redhat.com>; NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>; konstantin.ananyev at huawei.com <konstantin.ananyev at huawei.com>; konstantin.v.ananyev at yandex.ru <konstantin.v.ananyev at yandex.ru>; bruce.richardson at intel.com <bruce.richardson at intel.com>; maxime.coquelin at redhat.com <maxime.coquelin at redhat.com>; anatoly.burakov at intel.com <anatoly.burakov at intel.com>; aconole at redhat.com <aconole at redhat.com>; dev at dpdk.org <dev at dpdk.org>; stable at dpdk.org <stable at dpdk.org>
Subject: Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE

>>> Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
>>> Cc: akozyrev at nvidia.com
>>> Cc: stable at dpdk.org
>>>
>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala at amd.com>
>>> ---
>>>  drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
>>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
>>> index 420a03068d..2765b4b730 100644
>>> --- a/drivers/net/mlx5/mlx5_rx.c
>>> +++ b/drivers/net/mlx5/mlx5_rx.c
>>> @@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
>>>       return (value & m) == v ? -1 : 0;
>>>  }
>>>
>>> +static int
>>> +mlx5_monitor_cqe_own_callback(const uint64_t value,
>>> +             const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
>>> +{
>>> +     const uint64_t m = opaque[CLB_MSK_IDX];
>>> +     const uint64_t v = opaque[CLB_VAL_IDX];
>>> +     const uint64_t match = ((value & m) == v);
>>
>> Could you please rename the "match" variable to "sw_owned"?
>> This name would better convey the meaning of the checked condition: the
>> CQE owner bit value signifies that the CQE is SW owned.
>ACK! Will update this in v2.
>>
>>> +     const uint64_t opcode = MLX5_CQE_OPCODE(value);
>>> +     const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);
>>
>>IMO the usage of bit operations here (although logic is correct) is a bit confusing.
>>Could you rewrite it in terms of logical operations so it's easier to
>>follow? For example like this:
>>
>>        const uint64_t valid_op = opcode != MLX5_CQE_INVALID;
>>
>>        return (sw_owned && valid_op) ? -1 : 0;
>>
>>This also would properly describe in code the required condition:
>>CQE can be parsed by SW if and only if owner bit is "SW owned" and CQE
>>opcode is valid.
>ACK! Will update this in v2.
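For reference, a minimal standalone sketch of the callback with both review comments applied (the "sw_owned" rename and the logical opcode check). Everything prefixed DEMO_ or demo_ is a hypothetical stand-in and not copied from the driver: the sketch assumes the owner bit is bit 0 of the op_own byte, the opcode is its upper nibble, the invalid opcode is 0xf, and the value/mask opaque slots are 0 and 1. The second function keeps the bitwise form from the patch for comparison only. It is not the v2 patch.

```c
#include <stdint.h>

/* Assumed stand-ins for the mlx5 definitions; the values are assumptions. */
#define DEMO_CQE_OWNER_MASK 0x1u
#define DEMO_CQE_INVALID    0xfu
#define DEMO_CQE_OPCODE(op_own) (((op_own) & 0xf0u) >> 4)
#define DEMO_OPAQUE_SZ 4
enum { DEMO_CLB_VAL_IDX = 0, DEMO_CLB_MSK_IDX = 1 };

/* Logical form: -1 only if the owner bit says "SW owned" AND the opcode is valid. */
static int
demo_cqe_own_callback(const uint64_t value,
		const uint64_t opaque[DEMO_OPAQUE_SZ])
{
	const uint64_t m = opaque[DEMO_CLB_MSK_IDX];
	const uint64_t v = opaque[DEMO_CLB_VAL_IDX];
	const int sw_owned = (value & m) == v;
	const int valid_op = DEMO_CQE_OPCODE(value) != DEMO_CQE_INVALID;

	return (sw_owned && valid_op) ? -1 : 0;
}

/* Bitwise form as written in the patch, kept only for comparison. */
static int
demo_cqe_own_callback_bitwise(const uint64_t value,
		const uint64_t opaque[DEMO_OPAQUE_SZ])
{
	const uint64_t m = opaque[DEMO_CLB_MSK_IDX];
	const uint64_t v = opaque[DEMO_CLB_VAL_IDX];
	const uint64_t match = ((value & m) == v);
	const uint64_t valid_op = DEMO_CQE_OPCODE(value) ^ DEMO_CQE_INVALID;

	/* match is 0 or 1, so only bit 0 of valid_op survives the AND. */
	return -(int)(match & valid_op);
}
```

Under the assumed encoding, both forms return 0 for an invalid opcode or a HW-owned CQE. The bitwise form, however, looks only at bit 0 of (opcode ^ 0xf), so for a SW-owned CQE with an odd opcode other than 0xf (for example 0x1) it would return 0 where the logical form returns -1. The logical form does not depend on the numeric opcode values.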
>>
>>> +
>>> +     /* ownership bit is not valid for invalid opcode; CQE is HW owned */
>>> +     return -(match & valid_op);
>>> +}
>>> +
>>> int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>>>  {
>>>       struct mlx5_rxq_data *rxq = rx_queue;
>>> @@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>>>               pmc->addr = &cqe->validity_iteration_count;
>>>               pmc->opaque[CLB_VAL_IDX] = vic;
>>>               pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
>>> +             pmc->fn = mlx5_monitor_callback;
>>
>>Alex, Slava: Just to double check - in case of enhanced CQE compression
>>layout, should both CQE opcode and vic be checked?
>>Right now only vic is checked in power monitor callback for that case.
>>In Rx datapath both are checked to determine CQE ownership:
>>https://github.com/DPDK/dpdk/blob/main/drivers/common/mlx5/mlx5_common.h#L277
>
>Sorry for the late reply. I think we should check opcode in both cases.
>mlx5_monitor_callback can be updated with the opcode check for enhanced CQE compression layout,
>instead of having 2 separate callback functions. Could you please prepare a follow-up patch for that?
OK, I can extend this patch to cover the enhanced CQE compression case as well.
The new callback was added to avoid additional checks in the existing callback function.
I can rework this as needed.
>>
>>>       } else {
>>>               pmc->addr = &cqe->op_own;
>>>               pmc->opaque[CLB_VAL_IDX] = !!idx;
>>>               pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
>>> +             pmc->fn = mlx5_monitor_cqe_own_callback;
>>>       }
>>> -     pmc->fn = mlx5_monitor_callback;
>>>       pmc->size = sizeof(uint8_t);
>>>       return 0;
>>>  }
>>> --
>>> 2.43.0
>>>
>>