[dpdk-dev] [PATCH v4 3/5] net/ice: fix max mtu size packets with vlan tag cannot be received by default

Ferruh Yigit ferruh.yigit at intel.com
Wed Oct 21 12:36:49 CEST 2020


On 10/21/2020 10:47 AM, Ananyev, Konstantin wrote:
> 
> 
>>
>> On 10/20/2020 10:07 AM, Ananyev, Konstantin wrote:
>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> testpmd initializes the default max packet length to 1518, which
>>>>>>>>>>>>>>>> doesn't include the VLAN tag size in the Ethernet overhead. When a
>>>>>>>>>>>>>>>> max-MTU-length packet with a VLAN tag is sent, the packet length
>>>>>>>>>>>>>>>> exceeds 1518, which causes the packet to be dropped directly on the
>>>>>>>>>>>>>>>> NIC hw side.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ice can support dual VLAN tags, which need 8 more bytes for the max
>>>>>>>>>>>>>>>> packet size, so configure the correct max packet size in the
>>>>>>>>>>>>>>>> dev_config ops.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size")
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.yang at intel.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>       drivers/net/ice/ice_ethdev.c | 11 +++++++++++
>>>>>>>>>>>>>>>>       1 file changed, 11 insertions(+)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index
>>>>>>>>>>>>>>>> cfd357b05..6b7098444 100644
>>>>>>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>>>>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>>>>>>>> struct ice_adapter *ad =
>>>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
>>>>>>>>>>>>>>>>       struct ice_pf *pf =
>>>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>>>>>>>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD;
>>>>>>>>>>>>>>>>       int ret;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       /* Initialize to TRUE. If any of Rx queues doesn't meet the
>>>>>>>>>>>>>>>> @@ -3157,6 +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>>>>>>>>>>>>>       if (dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
>>>>>>>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or
>>>>>>>>>>>>>>>> + * larger than total size of MTU and Ether overhead.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why we need this check?
>>>>>>>>>>>>>>> Can we just call ice_mtu_set directly
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think that without that check we can silently overwrite provided
>>>>>>>>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value.
>>>>>>>>>>>>>
>>>>>>>>>>>>> OK, I see
>>>>>>>>>>>>>
>>>>>>>>>>>>> But still have one question
>>>>>>>>>>>>> dev->data->mtu is initialized to 1518 as default, but if the
>>>>>>>>>>>>> application sets
>>>>>>>>>>>>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure,
>>>>>>>>>>>>> does that mean we will still set mtu to 1518? Is this expected?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should raise
>>>>>>>>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: 1500).
>>>>>>>>>>>
>>>>>>>>>>> Ok, this describes the problem more generally; better to replace the existing
>>>>>>>>>>> code comment and commit log with it for easier understanding.
>>>>>>>>>>> Please send a new version with the rewording.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I didn't really get this set.
>>>>>>>>>>
>>>>>>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame bigger than
>>>>>>>>>> this size is dropped.
>>>>>>>>>
>>>>>>>>> Sure, it is normal case for dropping oversize data.
>>>>>>>>>
>>>>>>>>>> Isn't this what should be, why we are trying to overwrite user configuration
>>>>>>>>>> in PMD to prevent this?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But it is a conflict when the application/user sets mtu & max_rx_pkt_len at the same time.
>>>>>>>>> This fix makes a decision when the conflict occurs.
>>>>>>>>> The MTU value comes directly from a user operation (e.g.: port config mtu 0 1500),
>>>>>>>>> so max_rx_pkt_len is resized to fit the expected MTU value if it is smaller than MTU + Ether overhead.
>>>>>>>>>
>>>>>>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev layer.
>>>>>>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'.
>>>>>>>>>> I think Qi's concern above is valid: what if the user sets 'max_rx_pkt_len' to '1000'
>>>>>>>>>> and means it? The PMD will not honor the user config.
>>>>>>>>>
>>>>>>>>> I'm not sure what behavior is expected when 'mtu' is set to '1500' and 'max_rx_pkt_len' to '1000'.
>>>>>>>>> If we keep the 'max_rx_pkt_len' value, the larger 'mtu' becomes invalid.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd?
>>>>>>>>>>
>>>>>>>>> The default 'max_rx_pkt_len' has been initialized to a generic value (1518) and the default 'mtu' is '1500' in testpmd,
>>>>>>>>> but that isn't suitable for NIC drivers whose Ether overhead is larger than 18 (e.g.: ice, i40e) if the 'mtu' value is to be preferred.
>>>>>>>>>
>>>>>>>>>> And I guess even better what we need is to tell to the application what the
>>>>>>>>>> frame overhead PMD accepts.
>>>>>>>>>> So the application can set proper 'max_rx_pkt_len' value per port for a
>>>>>>>>>> given/requested MTU value.
>>>>>>>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD
>>>>>>>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, perhaps
>>>>>>>>>> he has a solution now?
>>>>>>>>
>>>>>>>>     From my perspective the main problem here:
>>>>>>>> We have 2 different variables for nearly the same thing:
>>>>>>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len.
>>>>>>>> and 2 different API to update them: dev_mtu_set() and dev_configure().
>>>>>>>
>>>>>>> According to the API, 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled',
>>>>>>> although I'm not sure that is what all drivers do in practice.
>>>>>>
>>>>>> I think most of Intel PMDs use it unconditionally.
>>>>>>
>>>>>>>
>>>>>>>> And inside majority of Intel PMDs we don't keep these 2 variables in sync:
>>>>>>>> - mtu_set() will update both variables.
>>>>>>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu intact.
>>>>>>>>
>>>>>>>> This patch fixes this inconsistency, which I think is a good thing.
>>>>>>>> Though yes, it introduces change in behaviour.
>>>>>>>>
>>>>>>>> Let say the code:
>>>>>>>> rte_eth_dev_set_mtu(port, 1500);
>>>>>>>> dev_conf.max_rx_pkt_len = 1000;
>>>>>>>> rte_eth_dev_configure(port, 1, 1, &dev_conf);
>>>>>>>>
>>>>>>>
>>>>>>> 'rte_eth_dev_configure()' is one of the first APIs called; it is called before
>>>>>>> 'rte_eth_dev_set_mtu()'.
>>>>>>
>>>>>> Usually yes.
>>>>>> But you can still do sometimes later: dev_mtu_set(); ...; dev_stop(); dev_configure(); dev_start();
>>>>>>
>>>>>>>
>>>>>>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by default by
>>>>>>> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is.
>>>>>>
>>>>>> See above.
>>>>>> PMD doesn't know where this MTU value came from (default ethdev value or user specified value)
>>>>>> and probably it shouldn't care.
>>>>>>
>>>>>>>
>>>>>>> And later, when 'rte_eth_dev_set_mtu()' is called, both MTU and 'max_rx_pkt_len'
>>>>>>> are updated (mostly).
>>>>>>
>>>>>> Yes, in mtu_set() we update both.
>>>>>> But we don't update MTU in dev_configure(), only max_rx_pkt_len.
>>>>>> That's what this patch tries to fix (as I understand it).
>>>>>
>>>>> To be more precise - it doesn't change MTU value in dev_configure(),
>>>>> but instead doesn't allow max_rx_pkt_len to become smaller
>>>>> than MTU + OVERHEAD.
>>>>> Probably changing MTU value instead is a better choice.
>>>>>
>>>>
>>>> +1 to change mtu for this case.
>>>> And this is what happens in practice when there is no 'rte_eth_dev_set_mtu()'
>>>> call, since PMD is using ('max_rx_pkt_len' - OVERHEAD) to set MTU.
>>>
>>> Hmm, I don't see that happens within Intel PMDs.
>>> As I can read the code: if user never call mtu_set(), then MTU value is left intact.
>>>
>>
>> I was checking ice,
>> in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the device.
> 
> Yes, I am not arguing with that.
> What I am saying - dev_config() doesn't update MTU based on max_rx_pkt_len.
> While it probably should.
> 

Yes, 'dev_configure()' doesn't update 'dev->data->mtu', so 'max_rx_pkt_len' and 
'dev->data->mtu' may diverge there.

I think the best place to update 'dev->data->mtu' is where the device is actually 
updated, but to prevent the divergence above we can update 'dev->data->mtu' in the 
ethdev layer, in 'rte_eth_dev_configure()', based on 'max_rx_pkt_len'. Will that work?

The only concern I see is that if the user reads the MTU ('rte_eth_dev_get_mtu()') 
after 'rte_eth_dev_configure()' but before the device is configured, the user will 
get the wrong value. I guess that problem was already there, but changing the 
default value may make it more visible.
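
As a rough illustration of the two directions discussed in this thread, here is a toy model in C. It is only a sketch, not ethdev or ice code: 'struct sketch_port', both functions and the two overhead constants are hypothetical names. The values 18 and 26 are assumed overheads (Ethernet header + CRC, and the same plus the two VLAN tags mentioned in the commit log).

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not DPDK definitions. */
#define SKETCH_ETH_OVERHEAD  18u /* assumed: 14-byte header + 4-byte CRC */
#define SKETCH_QINQ_OVERHEAD 26u /* assumed: the above + two 4-byte VLAN tags */

/* Toy model of the two values that can diverge. */
struct sketch_port {
	uint32_t mtu;
	uint32_t max_rx_pkt_len;
};

/*
 * Direction 1: at configure time, derive MTU from the requested
 * max_rx_pkt_len so the two values stay in sync.
 * Returns 0 on success, -1 if the frame cannot hold the overhead.
 */
int
sketch_configure_sync_mtu(struct sketch_port *p, uint32_t max_rx_pkt_len,
			  uint32_t overhead)
{
	if (max_rx_pkt_len <= overhead)
		return -1;
	p->max_rx_pkt_len = max_rx_pkt_len;
	p->mtu = max_rx_pkt_len - overhead;
	return 0;
}

/*
 * Direction 2 (what the patch under review does, as I read it): keep MTU
 * and grow max_rx_pkt_len when it cannot hold MTU + overhead.
 */
void
sketch_configure_grow_frame(struct sketch_port *p, uint32_t max_rx_pkt_len,
			    uint32_t overhead)
{
	uint32_t frame_size = p->mtu + overhead;

	p->max_rx_pkt_len = max_rx_pkt_len;
	if (frame_size > p->max_rx_pkt_len)
		p->max_rx_pkt_len = frame_size;
}
```

With the example from this thread (MTU 1500, then 'max_rx_pkt_len' 1000 at configure time, overhead 18), the first function gives mtu=982 and the second gives max_rx_pkt_len=1518. With the testpmd default of 1518 and a 26-byte overhead, the first function gives mtu=1492, which is why deriving MTU from 'max_rx_pkt_len' alone still doesn't let the port receive 1500-byte payloads.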

>>
>>>> But this won't solve the problem Steve is trying to solve.
>>>
>>> You mean we still need to update test-pmd code to calculate max_rx_pkt_len
>>> properly for default mtu value?
>>>
>>
>> Yes.
>> Because the target of this set is to be able to receive packets with payload size
>> 'RTE_ETHER_MTU'; if MTU is updated according to the provided 'max_rx_pkt_len',
>> the device still won't be able to receive those packets.
> 
> Agree.
> 
>>
>>>>>>>
>>>>>>>
>>>>>>>> Before the patch will result:
>>>>>>>> mtu==1500, max_rx_pkt_len=1000;  //out of sync looks wrong to me
>>>>>>>>
>>>>>>>> After the patch:
>>>>>>>> mtu=1500, max_rx_pkt_len=1518; // in sync, change in behaviour.
>>>>>>>>
>>>>>>>> If you think we need to preserve current behaviour,
>>>>>>>> then I suppose the easiest thing would be to change dev_config() code
>>>>>>>> to update mtu value based on max_rx_pkt_len.
>>>>>>>> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...}
>>>>>>>> So the code snippet above will result:
>>>>>>>> mtu=982,max_rx_pkt_len=1000;
>>>>>>>>
>>>>>>>
>>>>>>> 'max_rx_pkt_len' has been an annoyance for a long time, what do you think about just
>>>>>>> dropping it?
>>>>>>>
>>>>>>> By default device will be up with default MTU (1500), later
>>>>>>> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting at all.
>>>>>>>
>>>>>>> Will this work?
>>>>>>
>>>>>> I think it might, but that's a big change, probably too risky at that stage...
>>>>>>
>>>>
>>>> Definitely, I was thinking of 21.11. Let me send a deprecation notice and see
>>>> what happens.
>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And for short term, for above Intel PMDs, there must be a place this
>>>>>>> 'max_rx_pkt_len' value taken into account (mostly 'start()' dev_ops), that
>>>>>>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME set,
>>>>>>> otherwise use the 'MTU' value.
>>>>>>
>>>>>> Even if we'll use max_rx_pkt_len only when if JUMBO_FRAME is set,
>>>>>> I think we still need to keep max_rx_pkt_len and MTU values in sync.
>>>>>>
>>>>>>>
>>>>>>> Without 'start()' updated the current logic won't work after stop & start anyway.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And why this same thing can't happen to other PMDs? If this is a problem for
>>>>>>>>>> all PMDs, we should solve in other level, not for only some PMDs.
>>>>>>>>>>
>>>>>>>>> No, all PMDs have the same issue. Another proposal:
>>>>>>>>>      - rte_ethdev does the resize of 'max_rx_pkt_len' in one place, in rte_eth_dev_configure();
>>>>>>>>>      - provide a uniform API for fetching the NIC's supported Ether overhead size.
>>>>>>>>> Is it feasible?
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link
>>>>>>>>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to
>>>>>>>>>>>> satisfy mtu requirement.
>>>>>>>>>>>>
>>>>>>>>>>>>> Should we just call ice_mtu_set(dev, dev_conf.rxmode.max_rx_pkt_len)
>>>>>>>>>>>>> here?
>>>>>>>>>>>> ice_mtu_set(dev, mtu) will add the ether overhead to
>>>>>>>>>>>> frame_size/max_rx_pkt_len, so we need to pass the mtu value as the 2nd
>>>>>>>>>>>> parameter, not the max_rx_pkt_len.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And please remove above comment, since ether overhead is already
>>>>>>>>>>>>>> considered in ice_mtu_set.
>>>>>>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also
>>>>>>>>>>>> needs to be considered in the condition that decides whether
>>>>>>>>>>>> ice_mtu_set needs to be invoked.
>>>>>>>>>>>> So perhaps this comment should remain before the if() condition.
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +ret = ice_mtu_set(dev, dev->data->mtu);
>>>>>>>>>>>>>>>> +if (ret != 0)
>>>>>>>>>>>>>>>> +return ret;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>       ret = ice_init_rss(pf);
>>>>>>>>>>>>>>>>       if (ret) {
>>>>>>>>>>>>>>>>       PMD_DRV_LOG(ERR, "Failed to enable rss for PF");
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>
>>>
> 


