[PATCH v4 16/20] net/txgbe: fix SFP module identification
Stephen Hemminger
stephen at networkplumber.org
Mon May 18 01:52:25 CEST 2026
On Mon, 11 May 2026 18:35:58 +0800
Zaiyu Wang <zaiyuwang at trustnetic.com> wrote:
> Some optical modules were not correctly recognized due to ambiguous
> classification in the original detection flow. Rework the module
> identification logic to cover all module types. Also narrow the
> I2C lock scope to avoid potential race conditions during module
> access.
>
> Fixes: ab191e6d9189 ("net/txgbe: support new SFP/QSFP modules")
> Cc: stable at dpdk.org
>
> Signed-off-by: Zaiyu Wang <zaiyuwang at trustnetic.com>
> ---
Lots of AI feedback here.
Note: not all this is required; like the request for extra release notes is AI being picky.
# Review of Patch 16/20: net/txgbe: fix SFP module identification
## Summary
This patch refactors SFP/QSFP module identification logic and adjusts I2C locking scope. The changes address module type detection and claim to fix race conditions.
---
## Errors
### 1. Resource leak in `txgbe_identify_sfp_module()` - multiple error paths missing unlock
**Error paths that acquire lock but fail to release on error:**
At line 836, `hw->mac.acquire_swfw_sync()` acquires `TXGBE_MNGSEM_SWPHY`. Several error paths after this point fail to release the lock:
```c
err = hw->phy.read_i2c_eeprom(hw, TXGBE_SFF_IDENTIFIER, &identifier);
if (err != 0) {
/* ... */
hw->mac.release_swfw_sync(hw, TXGBE_MNGSEM_SWPHY); // OK - releases
return TXGBE_ERR_SFP_NOT_PRESENT;
}
if (identifier != TXGBE_SFF_IDENTIFIER_SFP) {
hw->phy.type = txgbe_phy_sfp_unsupported;
hw->mac.release_swfw_sync(hw, TXGBE_MNGSEM_SWPHY); // OK - releases
return TXGBE_ERR_SFP_NOT_SUPPORTED;
}
err = hw->phy.read_i2c_eeprom(hw, ...);
/* Multiple read_i2c_eeprom calls follow */
```
**After line 894**, there are at least **4 read_i2c_eeprom() calls** (lines ~895-930 in the context) that can fail with `goto out;` - but `goto out;` is removed in this patch. If any of these reads fail now, the function returns immediately **without releasing the lock acquired at line 836**.
Example at line 1000:
```c
err = hw->phy.read_i2c_eeprom(hw, TXGBE_SFF_CABLE_SPEC_COMP, &cable_spec);
if (err != 0)
goto out; /* This label is REMOVED - now falls through or returns */
```
The old code had `out:` label that might have handled cleanup. The new code removes this pattern but doesn't add `hw->mac.release_swfw_sync()` calls before each early return.
**Impact:** Lock held indefinitely on I2C read failure, blocking other threads.
**Fix:** Add lock release before every early return after line 836:
```c
err = hw->phy.read_i2c_eeprom(hw, TXGBE_SFF_CABLE_SPEC_COMP, &cable_spec);
if (err != 0) {
hw->mac.release_swfw_sync(hw, TXGBE_MNGSEM_SWPHY);
goto out;
}
```
---
### 2. `txgbe_write_i2c_byte()` removes STOP condition - probable hardware protocol violation
**Line 1299 (old) vs 1299 (new):**
Old code:
```c
wr32(hw, TXGBE_I2CDATA, byte_offset | TXGBE_I2CDATA_STOP);
```
New code:
```c
wr32(hw, TXGBE_I2CDATA, byte_offset);
```
The `TXGBE_I2CDATA_STOP` flag is removed from the byte_offset write but remains only on the data write. This changes the I2C transaction sequence:
- Old: Write byte_offset with STOP, write data with STOP
- New: Write byte_offset (no STOP), write data with WRITE+STOP (via `txgbe_i2c_stop()`)
**Analysis:** I2C writes to EEPROMs typically follow: START → device_addr+W → ACK → byte_offset → ACK → data → ACK → STOP. Removing STOP from the byte_offset phase may be correct if the hardware requires a combined write transaction, **but this is a functional change to I2C timing that is not explained in the commit message** and could break module writes.
**Impact:** Could cause I2C write failures or data corruption on SFP/QSFP EEPROMs.
**Fix:** Either:
1. Verify via hardware datasheet that single STOP at end is correct, document in commit message
2. Restore `TXGBE_I2CDATA_STOP` on byte_offset write
---
### 3. Removed validation logic without replacement - SFP vendor checking deleted
**Lines 1025-1070 (old code, removed):** The old `txgbe_identify_qsfp_module()` read vendor OUI bytes, checked for Intel vendor, and enforced "allow_any_sfp" policy. This entire block is deleted with no equivalent in the new code.
**Impact:**
- Unsupported vendor modules are now silently accepted (security/quality risk)
- `hw->allow_unsupported_sfp` flag is no longer checked (user configuration ignored)
- Warning messages for untested modules removed (users not informed of risks)
**This is a functional regression unless the validation moved elsewhere** (not evident in this patch).
**Fix:** Restore vendor OUI check or document why it's safe to remove.
---
## Warnings
### 1. `txgbe_identify_qsfp_module()` GPIO config runs unconditionally
**Lines 1098-1099:**
```c
/* config GPIO before read i2c */
wr32(hw, TXGBE_GPIODATA, TXGBE_GPIOBIT_1);
```
This GPIO write now runs **even when media_type is not fiber_qsfp** (the media check is after the GPIO write at lines 1108-1111). The old code had this GPIO write inside the `if (hw->mac.type == txgbe_mac_aml40)` block, after the media check.
**Impact:** Unnecessary hardware access, potential side effects on non-QSFP ports.
**Fix:** Move GPIO config inside the `if (hw->mac.type == txgbe_mac_aml40)` block or add media_type check first.
---
### 2. Missing error propagation - `txgbe_read_i2c_sff8636()` page select failure ignored
**Line 1215:**
```c
hw->phy.write_i2c_byte(hw, TXGBE_SFF_QSFP_PAGE_SELECT,
TXGBE_I2C_EEPROM_DEV_ADDR,
page);
```
The return value from `write_i2c_byte()` is discarded. If page select fails, the subsequent read at line 1218 will read from the **wrong page**, producing incorrect module data.
**Fix:**
```c
s32 err = hw->phy.write_i2c_byte(hw, TXGBE_SFF_QSFP_PAGE_SELECT,
TXGBE_I2C_EEPROM_DEV_ADDR, page);
if (err != 0)
return err;
```
---
### 3. Implicit comparison on pointer
**Line 1123 (in `txgbe_identify_qsfp_module`):**
```c
if (err)
return -EBUSY;
```
Should be explicit:
```c
if (err != 0)
return -EBUSY;
```
---
### 4. Release notes likely required
This patch changes module detection logic and affects supported module types. API behavior changes (vendor validation removed, new module type support) typically require release notes under "Fixed Issues" or "Known Issues" if compatibility changes exist.
---
### 5. Function pointer assignments removed but functions still defined
**Lines 2912-2913 (txgbe_hw.c):**
```c
phy->read_i2c_byte_unlocked = txgbe_read_i2c_byte_unlocked;
phy->write_i2c_byte_unlocked = txgbe_write_i2c_byte_unlocked;
```
These assignments are removed, but the commit message doesn't indicate whether the `*_unlocked` functions are deleted or just unused. If the functions remain defined but unreferenced, they are dead code. If other drivers reference `phy->read_i2c_byte_unlocked` as a function pointer, this breaks them.
---
## Info
### 1. Simplified I2C locking approach is clearer
The removal of separate `*_unlocked` variants and the move to caller-managed locking (acquire once, multiple reads, release) is a good pattern that reduces lock churn.
---
### 2. New SFP type handling for 25G/40G modules
The patch adds proper detection for:
- 25G AOC modules (`txgbe_sfp_type_25g_aoc_core0/1`)
- 40G QSFP types with Ethernet compliance byte checks
This appears to address the "ambiguous classification" mentioned in the commit message.
---
### 3. Consistent use of `read_i2c_sff8636()` helper
The new `txgbe_read_i2c_sff8636()` helper centralizes page-select logic for QSFP modules, which is good for maintainability (despite the error handling issue noted above).
---
## Final Check Summary
**Errors (must fix):**
1. Lock leaks in `txgbe_identify_sfp_module()` on I2C read errors after line 836
2. `TXGBE_I2CDATA_STOP` removal in `txgbe_write_i2c_byte()` changes I2C protocol, not explained/verified
3. Vendor OUI validation removed without replacement - security/quality regression
**Warnings (should fix):**
1. GPIO config runs before media_type check
2. Page select error ignored in `txgbe_read_i2c_sff8636()`
3. Implicit comparison `if (err)`
4. Release notes likely needed for behavior changes
5. Unused function pointer assignments removed - check for dead code
More information about the stable
mailing list