[dpdk-dev] Question about unsupported transceivers

Alexander Duyck alexander.duyck at gmail.com
Thu Oct 15 21:53:02 CEST 2015


On 10/15/2015 10:13 AM, Alex Forster wrote:
> On 10/15/15, 12:17 PM, "Alexander Duyck" <alexander.duyck at gmail.com> wrote:
>
>
>> On 10/15/2015 08:43 AM, Alex Forster wrote:
>>> On 10/15/15, 11:30 AM, "Alexander Duyck" <alexander.duyck at gmail.com>
>>> wrote:
>>>
>>>> On 10/15/2015 07:46 AM, Alex Forster wrote:
>>>>> On 10/13/15, 4:34 PM, "Alexander Duyck" <alexander.duyck at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> If you are using Intel's out-of-tree ixgbe driver I believe the
>>>>>> module
>>>>>> parameters are comma separated with one index per port.  So if you
>>>>>> have
>>>>>> two ports you should be passing "allow_unsupported_sfp=1,1", and for
>>>>>> 4
>>>>>> you would need four '1's.
>>>>> This seemed very promising. I compiled and installed the out of tree
>>>>> ixgbe
>>>>> driver and set the option in /etc/modprobe.d/ixgbe.conf. dmesg shows
>>>>> all
>>>>> eight "allow_unsupported_sfp enabled" messages but the last four ports
>>>>> still error out with the unsupported SFP message when running the
>>>>> tests.
>>>>>
>>>>> Before I start arbitrarily trying to patch out parts of the SFP
>>>>> verification code in ixgbe, are there any other tips I should know?
>>>> Can you send me the command you used to load the module, and the exact
>>>> number of ixgbe ports you have in the system?  With that I could then
>>>> verify that the command was entered correctly as it is possible there
>>>> could still be an issue in the way the command was entered.
>>>>
>>>> One other possibility is that when the driver loads each load counts as
>>>> an instance in the module parameter array.  So if for example you
>>>> unbind
>>>> the driver on one port and then later rebind it you will have consumed
>>>> one of the values in the array.  Do it enough times and you exceed the
>>>> bounds of the array as you entered it and it will simply use the
>>>> default
>>>> value of 0.
>>>>
>>>> Also the output of "ethtool -i <ethX>" would be useful to verify that
>>>> you have the out-of-tree driver loaded and not the in kernel.
>>>>
>>>> - Alex
>>>>
>>> Er, let me try that again.
>>>
>>> https://gist.github.com/AlexForster/f5372c5b60153d278089
>>>
>>>
>>> Alex Forster
>>>
>>>
>> It looks like you are probably seeing interfaces be unbound and then
>> rebound.  As such you are likely pushing things outside of the array
>> boundary.  One solution might just be to at more ",1"s if you are only
>> going to be doing this kind of thing at boot up.  The upper limit for
>> the array is 32 entries so as long as you only are setting this up once
>> you could probably get away with that.
>>
>> An alternative would be to modify the definition of the parameter in
>> ixgbe_param.c.  If you look through the file you should fine several
>> likes like below:
>> 	struct ixgbe_option opt = {
>> 			.type = enable_option,
>> 			.name = "allow_unsupported_sfp",
>> 			.err  = "defaulting to Disabled",
>> 			.def  = OPTION_DISABLED
>> 		};
>>
>> If you modify the .def value to "OPTION_ENABLED", and then rebuild and
>> reinstall your driver you should be able have it install without any
>> issues.
>>
>> - Alex
>>
> Yeah, I've had roughly the same thought process since you mentioned the
> args array. My first idea was "maybe the driver can't fit all of my 1's"
> but I saw it was defined at 32. Then I decided to just patch the whole
> enable_unsupported_sfp option out
> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still
> failing.
>
> I've been digging a bit, and I'm failing here in ixgbe_main.c...
>
> /* reset_hw fills in the perm_addr as well */
> hw->phy.reset_if_overtemp = true;
> err = hw->mac.ops.reset_hw(hw);
> hw->phy.reset_if_overtemp = false;
> if (err == IXGBE_ERR_SFP_NOT_PRESENT) {
> 	err = IXGBE_SUCCESS;
> } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) {
> 	e_dev_err("failed to load because an unsupported SFP+ or QSFP "
> 		  "module type was detected.\n");
> 	e_dev_err("Reload the driver after installing a supported "
> 		  "module.\n");
> 	goto err_sw_init;
> } else if (err) {
> 	e_dev_err("HW Init failed: %d\n", err);
> 	goto err_sw_init;
> }
>
>
> I've attempted a hand-stacktrace and came up with the following...
>
> ixgbe_82599.c at 1016
>   * ixgbe_reset_hw_82599() is defined
>   * calls phy->ops.init() which potentially returns
> IXGBE_ERR_SFP_NOT_SUPPORTED
>
> ixgbe_82599.c at 102
>   * ixgbe_init_phy_ops_82599() is defined
>   * IXGBE_ERR_SFP_NOT_SUPPORTED is returned after calling
> phy->ops.identify()
>
> ixgbe_82599.c at 2085
>   * ixgbe_identify_phy_82599() is defined
>   * calls ixgbe_identify_module_generic()
>
> ixgbe_phy.c at 1281
>   * ixgbe_identify_module_generic() is defined
>   * calls ixgbe_identify_qsfp_module_generic()
>
> ixgbe_phy.c at 1663
>   * ixgbe_identify_qsfp_module_generic() is defined
>   * We fail somewhere before the ending call to ixgbe_get_device_caps()
> which does take allow_unsupported_sfp into account
>
>   * Possibility: hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
> &identifier) != IXGBE_SFF_IDENTIFIER_QSFP_PLUS
>   * Possibility: active_cable != true
>
> And then I'm over my head. Should I assume from here that the most likely
> explanation is a bad transceiver or bad fiber?
>
> Alex Forster

Are you able to swap transceiver or fiber between the 4 ports that work 
and the 4 that don't?  If you could do that then you should be able to 
tell if the issue is following the NIC ports, or if it is an issue with 
the external connections.  If it is issue is following the transceiver 
or fiber then it is probably what is causing the issue.

The other thing you could try doing is adding a printk to the spots 
where the status is set to SFP_NOT_SUPPORTED so that you could figure 
out exactly which spot is triggering the rejection of the module.

- Alex


More information about the dev mailing list