[Help needed] net_ice: MDD event (Malicious Driver Detection) on TX queue when using rte_eth_tx_prepare / rte_eth_tx_burst
    Bruce Richardson 
    bruce.richardson at intel.com
       
    Thu Aug 28 17:57:05 CEST 2025
    
    
  
On Wed, Aug 27, 2025 at 08:52:26AM +0800, Doraemon wrote:
>    Hello DPDK / net_ice maintainers,
> 
>    We are seeing a reproducible and concerning issue when using the
>    net_ice PMD with DPDK 22.11.2, and we would appreciate your help
>    diagnosing it.
>    Summary
>    - Environment:
>    - DPDK: 22.11.2
>    - net_ice PCI device: 8086:159b
>    - ice kernel driver: 1.12.7
>    - NIC firmware: FW 7.3.6111681 (NVM 4.30)
>    - IOVA mode: PA, VFIO enabled
>    - Multi-process socket: /var/run/dpdk/PGW/mp_socket
>    - NUMA: 2, detected lcores: 112
>    - Bonding: pmd_bond with bonded devices created (net_bonding0 on port
>    4, net_bonding1 on port 5)
>    - Driver enabled AVX2 OFFLOAD Vector Tx (log shows
>    "ice_set_tx_function(): Using AVX2 OFFLOAD Vector Tx")
>    - Problem statement:
>    - Our application calls rte_eth_tx_prepare before calling
>    rte_eth_tx_burst as part of the normal transmission path.
>    - After the application has been running for some time (not immediate),
>    the kernel/driver emits the following messages repeatedly:
>    - ice_interrupt_handler(): OICR: MDD event
>    - ice_interrupt_handler(): Malicious Driver Detection event 3 by TCLAN
>    on TX queue 1025 PF# 1
>    - We are using a single TX queue (application-level single queue) and
>    are sending only one packet per burst (burst size = 1).
>    - The sequence is: rte_eth_tx_prepare (returns) -> rte_eth_tx_burst ->
>    MDD events occur later.
>    - The events affect stability and repeat over time.
>    Relevant startup logs (excerpt)
>    EAL: Detected CPU lcores: 112
>    EAL: Detected NUMA nodes: 2
>    EAL: Selected IOVA mode 'PA'
>    EAL: VFIO support initialized
>    EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:3b:00.1 (socket
>    0)
>    ice_load_pkg_type(): Active package is: 1.3.45.0, ICE COMMS Package
>    (double VLAN mode)
>    ice_dev_init(): FW 7.3.6111681 API 1.7
>    ...
>    bond_probe(3506) - Initializing pmd_bond for net_bonding0
>    bond_probe(3592) - Create bonded device net_bonding0 on port 4 in mode
>    1 on socket 0.
>    ...
>    ice_set_tx_function(): Using AVX2 OFFLOAD Vector Tx (port 0).
>    TELEMETRY: No legacy callbacks, legacy socket not created
>    What we have tried / preliminary observations
>    - Confirmed application calls rte_eth_tx_prepare prior to
>    rte_eth_tx_burst.
>    - Confirmed single TX queue configuration and small bursts (size = 1);
>    not high-rate, not a typical high-burst/malicious pattern.
>    - The MDD log identifies "TX queue 1025";  unclear how that maps to our
>    DPDK queue numbering (we use queue 0 in the app).
>    - No obvious other DPDK errors at startup;  interface initializes
>    normally and vector TX is enabled.
>    - We suspect the driver's Malicious Driver Detection (MDD) is
>    triggering due to some descriptor/doorbell ordering or offload
>    interaction, possibly related to AVX2 Vector Tx offload.
>    Questions / requests to the maintainers
>    1.  What specifically triggers "MDD event 3 by TCLAN" in net_ice?
>    Which driver check/threshold corresponds to event type 3?
>    2.  How is the "TX queue 1025" value computed/mapped in the log?  (Is
>    it queue id + offset, VF mapping, or an internal vector id?)  We need
>    to map that log value to our DPDK queue index.
>    3.  Can the rte_eth_tx_prepare + rte_eth_tx_burst call pattern cause
>    MDD detections under any circumstances?  If so, are there recommended
>    usage patterns or ordering constraints to avoid false positives?
>    4.  Are there known firmware/driver/DPDK version combinations with
>    similar MDD behavior?  Do you recommend specific NIC firmware, kernel
>    driver, or DPDK versions as a workaround/fix?
>    5.  Any suggested workarounds we can test quickly (e.g., disable vector
>    TX offload, disable specific HW offloads, change interrupt/queue
>    bindings, or adjust doorbell behavior)?
While I've not come across this particular issue before, one immediate
suggestion is to try the latest point release of 22.11, updating from
22.11.2 to 22.11.9. Checking the diffs, I see that there were some changes
made to the ice_prep_pkts() function (the ice driver's handler for
rte_eth_tx_prepare) between those two releases. Those changes may help here.
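Separately, since the report only says that rte_eth_tx_prepare "returns",
it may be worth confirming that its return value is checked before the
burst call. The API returns the number of packets that passed its checks,
and a packet it rejects should not be handed to rte_eth_tx_burst. Below is
a minimal sketch of that pattern, not the application's actual code. To
keep it buildable without DPDK headers, the two calls are passed in as
callbacks; the names checked_tx, tx_fn, tx_stats and the demo_* stubs are
all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in type so the sketch builds without DPDK headers.
 * In a real application the two callbacks would be thin wrappers around
 * rte_eth_tx_prepare() and rte_eth_tx_burst(), and pkts would be
 * struct rte_mbuf **.
 */
typedef uint16_t (*tx_fn)(uint16_t port, uint16_t queue,
                          void **pkts, uint16_t nb_pkts);

/* Hypothetical counters, handy for correlating rejects with MDD events. */
struct tx_stats {
    uint64_t sent;      /* packets accepted by the burst call */
    uint64_t rejected;  /* bursts in which a packet failed the prepare step */
    uint64_t unsent;    /* prepared packets the burst call did not take */
};

/*
 * Prepare, then transmit only the packets that passed preparation.
 * Returns the number of packets handed to the burst callback; the caller
 * still owns (and must free or retry) everything from pkts[return value] on.
 */
uint16_t
checked_tx(tx_fn prepare, tx_fn burst, uint16_t port, uint16_t queue,
           void **pkts, uint16_t nb_pkts, struct tx_stats *st)
{
    assert(prepare != NULL && burst != NULL && st != NULL);

    uint16_t nb_ok = prepare(port, queue, pkts, nb_pkts);
    if (nb_ok < nb_pkts) {
        /*
         * pkts[nb_ok] is the first packet that failed the checks. With the
         * real API, rte_errno holds the reason at this point. The packet is
         * deliberately not passed on to the burst call.
         */
        st->rejected++;
    }

    uint16_t nb_sent = burst(port, queue, pkts, nb_ok);
    st->sent += nb_sent;
    st->unsent += (uint16_t)(nb_ok - nb_sent);
    return nb_sent;
}

/* Demo stub standing in for a PMD: rejects the packet at index 1. */
uint16_t
demo_prepare_reject_second(uint16_t port, uint16_t queue,
                           void **pkts, uint16_t nb_pkts)
{
    (void)port; (void)queue; (void)pkts;
    return nb_pkts > 1 ? 1 : nb_pkts;
}

/* Demo stub standing in for a PMD: accepts every packet it is given. */
uint16_t
demo_burst_all(uint16_t port, uint16_t queue, void **pkts, uint16_t nb_pkts)
{
    (void)port; (void)queue; (void)pkts;
    return nb_pkts;
}
```

With a burst size of 1, as in the report, a prepare result of 0 means the
single packet was rejected and nothing should be transmitted. Logging the
reason and the mbuf fields (lengths, segment count, offload flags) at that
point might show whether a malformed packet lines up with the MDD events.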
Regards,
/Bruce
    
    