[dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length

oulijun oulijun at huawei.com
Tue Jan 19 09:46:56 CET 2021



在 2021/1/18 18:42, Ferruh Yigit 写道:
> On 1/15/2021 10:44 AM, oulijun wrote:
>> Hi Steve
>> This is a very good job! But I have some question and suggestions.
>> Please check it.
>>
>> 在 2021/1/14 17:45, Steve Yang 写道:
>>> Ethdev is using default Ethernet overhead to decide if provided
>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>> and limits it to MAX if it is.
>>>
>>> Since the application/driver used Ethernet overhead is different than
>>> the ethdev one, check result is wrong.
>>>
>>> If the driver is using Ethernet overhead bigger than the default one,
>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>> less than the intended one, causing some packets to be dropped.
>>>
>>> Like,
>>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>> driver  -> MTU = 1518 - 22 = 1496
>>> Packets with size 1497-1500 are dropped although intention is to be able
>>> to send/receive them.
>>>
>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>> instead of default one.
>>>
>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>
>>> Signed-off-by: Steve Yang <stevex.yang at intel.com>
>>> ---
>>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c 
>>> b/lib/librte_ethdev/rte_ethdev.c
>>> index 17ddacc78d..19ca4c4512 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>       struct rte_eth_dev *dev;
>>>       struct rte_eth_dev_info dev_info;
>>>       struct rte_eth_conf orig_conf;
>>> +    uint16_t overhead_len;
>>>       int diag;
>>>       int ret;
>>> +    uint16_t old_mtu;
>>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>           memcpy(&dev->data->dev_conf, dev_conf,
>>>                  sizeof(dev->data->dev_conf));
>>> +    /* Backup mtu for rollback */
>>> +    old_mtu = dev->data->mtu;
>>> +
>>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>>       if (ret != 0)
>>>           goto rollback;
>>> +    /* Get the real Ethernet overhead length */
>>> +    if (dev_info.max_mtu != UINT16_MAX &&
>>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>>> +    else
>>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>>> +
>> The Ethernet frame header length supported by each NIC may be 
>> different, which cause the difference of allowed packets size and drop 
>> you say.
>> The diference of Ethernet frame header length have an impact for the 
>> maximum Ethernet frame length, which is a boundary value
>> of enabling jumbo frame through the ' max_rx_pkt_len '.
>> However, we need to do the same thing you do above to get the 
>> overhead_len every time, which will
>> cause a lot of duplicate code in the framework and app. For examples, 
>> parsing and processing for '--max-pkt-len=xxx' parameter,
>>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying 
>> here dev_configure API.
>> It's a little redundant and troublesome.
>>
>> Maybe, it is necessary for driver to directly report the supported 
>> maximum Ethernet frame length by rte_dev_info_get API.
>> As following:
>> struct rte_eth_dev_info {
>>      xxxx
>>      /**
>>       * The maximum Ethernet frame length supported by each
>>       * driver varies with the Ethernet header length.
>>       */
>>      uint16_t eth_max_len;
>>      xxxx
>> }
>>
>> int
>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>> {
>>      xxx
>>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>>      dev_info->max_mtu = UINT16_MAX;
>>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>>      xxx
>> }
>>
>> And then:
>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct 
>> rte_eth_dev_info *info)
>> {
>>      xxx
>>      info->eth_max_len = xxx_ETHER_MAX _LEN;
>>      xxx
>> }
>> What do you think?
> 
> Hi Lijun,
> 
> The 'max_mtu' has been added in the past to solve exact same problem, 
> perhaps it would be better to add something like 'eth_max_len' at that 
> time.
> 
> But as Steve mentioned we are planning to switch to more MTU based 
> configuration, which should remove the need of the field you mentioned.
> 
>>>       /* If number of queues specified by application for both Rx and 
>>> Tx is
>>>        * zero, use driver preferred values. This cannot be done 
>>> individually
>>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>               goto rollback;
>>>           }
>>>       } else {
>>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>>               /* Use default value */
>>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>> -                            RTE_ETHER_MAX_LEN;
>>> +                        RTE_ETHER_MTU + overhead_len;
>>> +    }
>>> +
>>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>> +                overhead_len;
>>>       }
>> Now that we update mtu here when jumbo frame enabled. It is necessary 
>> to check validity of max_rx_pkt_len.
>> As following:
>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + 
>> overhead_len) {
> 
> It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + 
> overhead_len" is a valid value or not, the API says:
> "max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"
> 
> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, 
> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'
> 
The MTU setting entry does need to be adjusted. I agree! But I think 
this place should be modified, and it's probably more reasonable.

Generally speaking, max_rx_pkt_len will be also updated with 'mtu + 
overhead_len' also when application sets mtu > 1500, which means 
JUMBO_FRAME is enabled.
Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len > 
1500 + overhead_len.
Besides, having following check for "rxmode.offloads & 
DEV_RX_OFFLOAD_JUMBO_FRAME == 0":
@@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
nb_rx_q, uint16_t nb_tx_q,
               goto rollback;
           }
       } else {
         uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
         if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
             pktlen > RTE_ETHER_MTU + overhead_len)
               /* Use default value */
               dev->data->dev_conf.rxmode.max_rx_pkt_len =
                         RTE_ETHER_MTU + overhead_len;
}
One of check condition for the branch is "pktlen > RTE_ETHER_MTU + 
overhead_len ".

In view of the above two aspects, when application set 
DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be 
greater than " RTE_ETHER_MTU + overhead_len "

Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary.

>>              xxxx
>>          }
>> } else {
>>          xxx
>> }
>>
>> If it does not thing above, which will cause an unreasonable case.
>> Like,
>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
>> dev_conf->rxmode.max_rx_pkt_len = 1500
>> overhead_len = 26
>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 
>> - 26 = 1474
>>
> 
> This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 
> 1500', the requeted MTU is 1474,
> if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' 
> as "1500 + overhead".
> 
>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu 
>> > 1500.
> 
> Overall the 'max_rx_pkt_len' setting is a little messy, it has been 
> tried to fix in last release but we reverted them because of unexpected 
> side affects.
> 
> Current long term plan is,
> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to 
> work on MTU values. (instead one working on frame lenght and other on MTU)
> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request 
> by requested MTU value check
> - App decide the jumbo frame capability from the 'max_mtu' check
> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'
> 
>>>       /*
>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, 
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>       eth_dev_tx_queue_config(dev, 0);
>>>   rollback:
>>>       memcpy(&dev->data->dev_conf, &orig_conf, 
>>> sizeof(dev->data->dev_conf));
>>> +    if (old_mtu != dev->data->mtu)
>>> +        dev->data->mtu = old_mtu;
>>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, 
>>> ret);
>>>       return ret;
>>>
> 
> .
> 


More information about the dev mailing list