[dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length

Ferruh Yigit ferruh.yigit at intel.com
Mon Jan 25 13:22:03 CET 2021


On 1/19/2021 8:46 AM, oulijun wrote:
> 
> 
> 在 2021/1/18 18:42, Ferruh Yigit 写道:
>> On 1/15/2021 10:44 AM, oulijun wrote:
>>> Hi Steve
>>> This is a very good job! But I have some question and suggestions.
>>> Please check it.
>>>
>>> 在 2021/1/14 17:45, Steve Yang 写道:
>>>> Ethdev is using default Ethernet overhead to decide if provided
>>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>>> and limits it to MAX if it is.
>>>>
>>>> Since the application/driver used Ethernet overhead is different than
>>>> the ethdev one, check result is wrong.
>>>>
>>>> If the driver is using Ethernet overhead bigger than the default one,
>>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>>> less than the intended one, causing some packets to be dropped.
>>>>
>>>> Like,
>>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>>> driver  -> MTU = 1518 - 22 = 1496
>>>> Packets with size 1497-1500 are dropped although intention is to be able
>>>> to send/receive them.
>>>>
>>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>>> instead of default one.
>>>>
>>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>>
>>>> Signed-off-by: Steve Yang <stevex.yang at intel.com>
>>>> ---
>>>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>>>> index 17ddacc78d..19ca4c4512 100644
>>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>       struct rte_eth_dev *dev;
>>>>       struct rte_eth_dev_info dev_info;
>>>>       struct rte_eth_conf orig_conf;
>>>> +    uint16_t overhead_len;
>>>>       int diag;
>>>>       int ret;
>>>> +    uint16_t old_mtu;
>>>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>           memcpy(&dev->data->dev_conf, dev_conf,
>>>>                  sizeof(dev->data->dev_conf));
>>>> +    /* Backup mtu for rollback */
>>>> +    old_mtu = dev->data->mtu;
>>>> +
>>>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>>       if (ret != 0)
>>>>           goto rollback;
>>>> +    /* Get the real Ethernet overhead length */
>>>> +    if (dev_info.max_mtu != UINT16_MAX &&
>>>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>>>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>>>> +    else
>>>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>>>> +
>>> The Ethernet frame header length supported by each NIC may be different, 
>>> which cause the difference of allowed packets size and drop you say.
>>> The diference of Ethernet frame header length have an impact for the maximum 
>>> Ethernet frame length, which is a boundary value
>>> of enabling jumbo frame through the ' max_rx_pkt_len '.
>>> However, we need to do the same thing you do above to get the overhead_len 
>>> every time, which will
>>> cause a lot of duplicate code in the framework and app. For examples, parsing 
>>> and processing for '--max-pkt-len=xxx' parameter,
>>>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying here 
>>> dev_configure API.
>>> It's a little redundant and troublesome.
>>>
>>> Maybe, it is necessary for driver to directly report the supported maximum 
>>> Ethernet frame length by rte_dev_info_get API.
>>> As following:
>>> struct rte_eth_dev_info {
>>>      xxxx
>>>      /**
>>>       * The maximum Ethernet frame length supported by each
>>>       * driver varies with the Ethernet header length.
>>>       */
>>>      uint16_t eth_max_len;
>>>      xxxx
>>> }
>>>
>>> int
>>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>>> {
>>>      xxx
>>>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>>>      dev_info->max_mtu = UINT16_MAX;
>>>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>>>      xxx
>>> }
>>>
>>> And then:
>>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
>>> {
>>>      xxx
>>>      info->eth_max_len = xxx_ETHER_MAX _LEN;
>>>      xxx
>>> }
>>> What do you think?
>>
>> Hi Lijun,
>>
>> The 'max_mtu' has been added in the past to solve exact same problem, perhaps 
>> it would be better to add something like 'eth_max_len' at that time.
>>
>> But as Steve mentioned we are planning to switch to more MTU based 
>> configuration, which should remove the need of the field you mentioned.
>>
>>>>       /* If number of queues specified by application for both Rx and Tx is
>>>>        * zero, use driver preferred values. This cannot be done individually
>>>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>               goto rollback;
>>>>           }
>>>>       } else {
>>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>>               /* Use default value */
>>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>>> -                            RTE_ETHER_MAX_LEN;
>>>> +                        RTE_ETHER_MTU + overhead_len;
>>>> +    }
>>>> +
>>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>>> +                overhead_len;
>>>>       }
>>> Now that we update mtu here when jumbo frame enabled. It is necessary to 
>>> check validity of max_rx_pkt_len.
>>> As following:
>>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>>>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + overhead_len) {
>>
>> It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + 
>> overhead_len" is a valid value or not, the API says:
>> "max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"
>>
>> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, 
>> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'
>>
> The MTU setting entry does need to be adjusted. I agree! But I think this place 
> should be modified, and it's probably more reasonable.
> 
> Generally speaking, max_rx_pkt_len will be also updated with 'mtu + 
> overhead_len' also when application sets mtu > 1500, which means JUMBO_FRAME is 
> enabled.
> Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > 1500 + 
> overhead_len.
> Besides, having following check for "rxmode.offloads & 
> DEV_RX_OFFLOAD_JUMBO_FRAME == 0":
> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
> nb_rx_q, uint16_t nb_tx_q,
>                goto rollback;
>            }
>        } else {
>          uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>          if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>              pktlen > RTE_ETHER_MTU + overhead_len)
>                /* Use default value */
>                dev->data->dev_conf.rxmode.max_rx_pkt_len =
>                          RTE_ETHER_MTU + overhead_len;
> }
> One of check condition for the branch is "pktlen > RTE_ETHER_MTU + overhead_len ".
> 
> In view of the above two aspects, when application set 
> DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be greater than " 
> RTE_ETHER_MTU + overhead_len "
> 

Hi Lijun,

Most probably intention was like you said, but that is not implemented or 
documented that way.

The API doc only says 'max_rx_pkt_len' is valid only when 'JUMBO_FRAME' is set, 
but it doesn't say what is the valid range.

And when 'JUMBO_FRAME' is set, the check in the 'rte_eth_dev_configure()' is 
same from the beggining of the project, and it is between 'RTE_ETHER_MIN_LEN' to 
'dev_info.max_rx_pktlen', and changing it will be the behaviour change.

So I suggest lets not add this new config restriction, but it is application's 
responsiblity to set/unset the 'JUMBO_FRAME' flag based on the packet size, and 
we can implemet this in the testpmd.

> Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary.
> 
>>>              xxxx
>>>          }
>>> } else {
>>>          xxx
>>> }
>>>
>>> If it does not thing above, which will cause an unreasonable case.
>>> Like,
>>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
>>> dev_conf->rxmode.max_rx_pkt_len = 1500
>>> overhead_len = 26
>>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 26 = 
>>> 1474
>>>
>>
>> This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 1500', 
>> the requeted MTU is 1474,
>> if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' as 
>> "1500 + overhead".
>>
>>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu > 1500.
>>
>> Overall the 'max_rx_pkt_len' setting is a little messy, it has been tried to 
>> fix in last release but we reverted them because of unexpected side affects.
>>
>> Current long term plan is,
>> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to work on 
>> MTU values. (instead one working on frame lenght and other on MTU)
>> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request by 
>> requested MTU value check
>> - App decide the jumbo frame capability from the 'max_mtu' check
>> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'
>>
>>>>       /*
>>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>>>> nb_rx_q, uint16_t nb_tx_q,
>>>>       eth_dev_tx_queue_config(dev, 0);
>>>>   rollback:
>>>>       memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
>>>> +    if (old_mtu != dev->data->mtu)
>>>> +        dev->data->mtu = old_mtu;
>>>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
>>>>       return ret;
>>>>
>>
>> .
>>



More information about the dev mailing list