[dpdk-dev] Question about unsupported transceivers

Alexander Duyck alexander.duyck at gmail.com
Mon Oct 19 17:08:33 CEST 2015


On 10/18/2015 06:06 PM, Alex Forster wrote:
> On 10/15/15, 3:53 PM, "Alexander Duyck" <alexander.duyck at gmail.com> wrote:
>
>
>>>> It looks like you are probably seeing interfaces be unbound and then
>>>> rebound.  As such you are likely pushing things outside of the array
>>>> boundary.  One solution might just be to at more ",1"s if you are only
>>>> going to be doing this kind of thing at boot up.  The upper limit for
>>>> the array is 32 entries so as long as you only are setting this up once
>>>> you could probably get away with that.
>>>>
>>>> An alternative would be to modify the definition of the parameter in
>>>> ixgbe_param.c.  If you look through the file you should fine several
>>>> likes like below:
>>>> 	struct ixgbe_option opt = {
>>>> 			.type = enable_option,
>>>> 			.name = "allow_unsupported_sfp",
>>>> 			.err  = "defaulting to Disabled",
>>>> 			.def  = OPTION_DISABLED
>>>> 		};
>>>>
>>>> If you modify the .def value to "OPTION_ENABLED", and then rebuild and
>>>> reinstall your driver you should be able have it install without any
>>>> issues.
>>>>
>>>> - Alex
>>>>
>>> Yeah, I've had roughly the same thought process since you mentioned the
>>> args array. My first idea was "maybe the driver can't fit all of my 1's"
>>> but I saw it was defined at 32. Then I decided to just patch the whole
>>> enable_unsupported_sfp option out
>>> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still
>>> failing.
>>>
>>> I've been digging a bit, and I'm failing here in ixgbe_main.c...
>>>
>>> /* reset_hw fills in the perm_addr as well */
>>> hw->phy.reset_if_overtemp = true;
>>> err = hw->mac.ops.reset_hw(hw);
>>> hw->phy.reset_if_overtemp = false;
>>> if (err == IXGBE_ERR_SFP_NOT_PRESENT) {
>>> 	err = IXGBE_SUCCESS;
>>> } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) {
>>> 	e_dev_err("failed to load because an unsupported SFP+ or QSFP "
>>> 		  "module type was detected.\n");
>>> 	e_dev_err("Reload the driver after installing a supported "
>>> 		  "module.\n");
>>> 	goto err_sw_init;
>>> } else if (err) {
>>> 	e_dev_err("HW Init failed: %d\n", err);
>>> 	goto err_sw_init;
>>> }
>>>
>>>
>>> I've attempted a hand-stacktrace and came up with the following...
>>>
>>> ixgbe_82599.c at 1016
>>>    * ixgbe_reset_hw_82599() is defined
>>>    * calls phy->ops.init() which potentially returns
>>> IXGBE_ERR_SFP_NOT_SUPPORTED
>>>
>>> ixgbe_82599.c at 102
>>>    * ixgbe_init_phy_ops_82599() is defined
>>>    * IXGBE_ERR_SFP_NOT_SUPPORTED is returned after calling
>>> phy->ops.identify()
>>>
>>> ixgbe_82599.c at 2085
>>>    * ixgbe_identify_phy_82599() is defined
>>>    * calls ixgbe_identify_module_generic()
>>>
>>> ixgbe_phy.c at 1281
>>>    * ixgbe_identify_module_generic() is defined
>>>    * calls ixgbe_identify_qsfp_module_generic()
>>>
>>> ixgbe_phy.c at 1663
>>>    * ixgbe_identify_qsfp_module_generic() is defined
>>>    * We fail somewhere before the ending call to ixgbe_get_device_caps()
>>> which does take allow_unsupported_sfp into account
>>>
>>>    * Possibility: hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
>>> &identifier) != IXGBE_SFF_IDENTIFIER_QSFP_PLUS
>>>    * Possibility: active_cable != true
>>>
>>> And then I'm over my head. Should I assume from here that the most
>>> likely
>>> explanation is a bad transceiver or bad fiber?
>>>
>>> Alex Forster
>>
>> Are you able to swap transceiver or fiber between the 4 ports that work
>> and the 4 that don't?  If you could do that then you should be able to
>> tell if the issue is following the NIC ports, or if it is an issue with
>> the external connections.  If it is issue is following the transceiver
>> or fiber then it is probably what is causing the issue.
>>
>> The other thing you could try doing is adding a printk to the spots
>> where the status is set to SFP_NOT_SUPPORTED so that you could figure
>> out exactly which spot is triggering the rejection of the module.
>>
>> - Alex
>
> I had remote hands swap fibers on the QSFP side and the issue moved to the
> first card, so I'm going to have the fibers cleaned and tested. This
> appears to be my issue.
>
> I'd like to submit a patch for ixgbe_identify_qsfp_module_generic() to
> print more helpful errors in the two cases mentioned above, so that
> hopefully nobody ever has to deal with this again. Would I be wasting my
> time, or does something like this have a likelihood of being accepted?
>
> Thank you for all of your help! I wouldn't have figured this out nearly as
> quickly without it.
>
> Alex Forster

I suspect there would be some value to such a patch, just make sure to 
explain the reason for needing it in the patch description.  My advice 
would be to put such a patch together against what is in Jeff Kirsher's 
next queue 
(https://git.kernel.org/cgit/linux/kernel/git/jkirsher/next-queue.git/) 
and then base your patches off of that.  The email lists for submitting 
patches to is intel-wired-lan at lists.osuosl.org and
netdev at vger.kernel.org.

- Alex
- Alex



More information about the dev mailing list