[dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length

Ferruh Yigit ferruh.yigit at intel.com
Mon Jan 18 11:42:03 CET 2021


On 1/15/2021 10:44 AM, oulijun wrote:
> Hi Steve
> This is a very good job! But I have some question and suggestions.
> Please check it.
> 
> 在 2021/1/14 17:45, Steve Yang 写道:
>> Ethdev is using default Ethernet overhead to decide if provided
>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>> and limits it to MAX if it is.
>>
>> Since the application/driver used Ethernet overhead is different than
>> the ethdev one, check result is wrong.
>>
>> If the driver is using Ethernet overhead bigger than the default one,
>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>> correct Ethernet overhead is used to convert back, the resulting MTU is
>> less than the intended one, causing some packets to be dropped.
>>
>> Like,
>> app     -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>> ethdev  -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>> driver  -> MTU = 1518 - 22 = 1496
>> Packets with size 1497-1500 are dropped although intention is to be able
>> to send/receive them.
>>
>> The fix is to make ethdev use the correct Ethernet overhead for port,
>> instead of default one.
>>
>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>
>> Signed-off-by: Steve Yang <stevex.yang at intel.com>
>> ---
>>   lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>   1 file changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>> index 17ddacc78d..19ca4c4512 100644
>> --- a/lib/librte_ethdev/rte_ethdev.c
>> +++ b/lib/librte_ethdev/rte_ethdev.c
>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>       struct rte_eth_dev *dev;
>>       struct rte_eth_dev_info dev_info;
>>       struct rte_eth_conf orig_conf;
>> +    uint16_t overhead_len;
>>       int diag;
>>       int ret;
>> +    uint16_t old_mtu;
>>       RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>           memcpy(&dev->data->dev_conf, dev_conf,
>>                  sizeof(dev->data->dev_conf));
>> +    /* Backup mtu for rollback */
>> +    old_mtu = dev->data->mtu;
>> +
>>       ret = rte_eth_dev_info_get(port_id, &dev_info);
>>       if (ret != 0)
>>           goto rollback;
>> +    /* Get the real Ethernet overhead length */
>> +    if (dev_info.max_mtu != UINT16_MAX &&
>> +        dev_info.max_rx_pktlen > dev_info.max_mtu)
>> +        overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>> +    else
>> +        overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>> +
> The Ethernet frame header length supported by each NIC may be different, which 
> cause the difference of allowed packets size and drop you say.
> The diference of Ethernet frame header length have an impact for the maximum 
> Ethernet frame length, which is a boundary value
> of enabling jumbo frame through the ' max_rx_pkt_len '.
> However, we need to do the same thing you do above to get the overhead_len every 
> time, which will
> cause a lot of duplicate code in the framework and app. For examples, parsing 
> and processing for '--max-pkt-len=xxx' parameter,
>   and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying here 
> dev_configure API.
> It's a little redundant and troublesome.
> 
> Maybe, it is necessary for driver to directly report the supported maximum 
> Ethernet frame length by rte_dev_info_get API.
> As following:
> struct rte_eth_dev_info {
>      xxxx
>      /**
>       * The maximum Ethernet frame length supported by each
>       * driver varies with the Ethernet header length.
>       */
>      uint16_t eth_max_len;
>      xxxx
> }
> 
> int
> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
> {
>      xxx
>      dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>      dev_info->max_mtu = UINT16_MAX;
>      dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>      xxx
> }
> 
> And then:
> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
> {
>      xxx
>      info->eth_max_len = xxx_ETHER_MAX _LEN;
>      xxx
> }
> What do you think?

Hi Lijun,

The 'max_mtu' has been added in the past to solve exact same problem, perhaps it 
would be better to add something like 'eth_max_len' at that time.

But as Steve mentioned we are planning to switch to more MTU based 
configuration, which should remove the need of the field you mentioned.

>>       /* If number of queues specified by application for both Rx and Tx is
>>        * zero, use driver preferred values. This cannot be done individually
>>        * as it is valid for either Tx or Rx (but not both) to be zero.
>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>               goto rollback;
>>           }
>>       } else {
>> -        if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>> -            dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>> +        uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>> +        if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>> +            pktlen > RTE_ETHER_MTU + overhead_len)
>>               /* Use default value */
>>               dev->data->dev_conf.rxmode.max_rx_pkt_len =
>> -                            RTE_ETHER_MAX_LEN;
>> +                        RTE_ETHER_MTU + overhead_len;
>> +    }
>> +
>> +    /* Scale the MTU size to adapt max_rx_pkt_len */
>> +    if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>> +        dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>> +                overhead_len;
>>       }
> Now that we update mtu here when jumbo frame enabled. It is necessary to check 
> validity of max_rx_pkt_len.
> As following:
> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>      if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>          dev_conf->rxmode.max_rx_pkt_len <=  RTE_ETHER_MTU + overhead_len) {

It is not clear from API that if "max_rx_pkt_len <=  RTE_ETHER_MTU + 
overhead_len" is a valid value or not, the API says:
"max_rx_pkt_len;  /**< Only used if JUMBO_FRAME enabled. */"

But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set, 
'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'

>              xxxx
>          }
> } else {
>          xxx
> }
> 
> If it does not thing above, which will cause an unreasonable case.
> Like,
> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
> dev_conf->rxmode.max_rx_pkt_len = 1500
> overhead_len = 26
> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 26 = 1474
> 

This looks correct, if the applicatin is requesting 'max_rx_pkt_len = 1500', the 
requeted MTU is 1474,
if application wants an MTU as 1500, it should set the 'max_rx_pkt_len' as "1500 
+ overhead".

> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu > 1500.

Overall the 'max_rx_pkt_len' setting is a little messy, it has been tried to fix 
in last release but we reverted them because of unexpected side affects.

Current long term plan is,
- Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to work on MTU 
values. (instead one working on frame lenght and other on MTU)
- Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request by 
requested MTU value check
- App decide the jumbo frame capability from the 'max_mtu' check
- App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'

>>       /*
>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t 
>> nb_rx_q, uint16_t nb_tx_q,
>>       eth_dev_tx_queue_config(dev, 0);
>>   rollback:
>>       memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
>> +    if (old_mtu != dev->data->mtu)
>> +        dev->data->mtu = old_mtu;
>>       rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
>>       return ret;
>>



More information about the dev mailing list