[dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length
Ferruh Yigit
ferruh.yigit at intel.com
Mon Jan 18 11:42:03 CET 2021
On 1/15/2021 10:44 AM, oulijun wrote:
> Hi Steve
> This is very good work! But I have some questions and suggestions.
> Please check them.
>
> On 2021/1/14 17:45, Steve Yang wrote:
>> Ethdev is using default Ethernet overhead to decide if provided
>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>> and limits it to MAX if it is.
>>
>> Since the Ethernet overhead used by the application/driver is different
>> from the one ethdev uses, the check result is wrong.
>>
>> If the driver is using Ethernet overhead bigger than the default one,
>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>> correct Ethernet overhead is used to convert back, the resulting MTU is
>> less than the intended one, causing some packets to be dropped.
>>
>> Like,
>> app -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>> ethdev -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>> driver -> MTU = 1518 - 22 = 1496
>> Packets with size 1497-1500 are dropped although intention is to be able
>> to send/receive them.
>>
>> The fix is to make ethdev use the correct Ethernet overhead for port,
>> instead of default one.
>>
>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>
>> Signed-off-by: Steve Yang <stevex.yang at intel.com>
>> ---
>> lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>> 1 file changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>> index 17ddacc78d..19ca4c4512 100644
>> --- a/lib/librte_ethdev/rte_ethdev.c
>> +++ b/lib/librte_ethdev/rte_ethdev.c
>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>> struct rte_eth_dev *dev;
>> struct rte_eth_dev_info dev_info;
>> struct rte_eth_conf orig_conf;
>> + uint16_t overhead_len;
>> int diag;
>> int ret;
>> + uint16_t old_mtu;
>> RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>> memcpy(&dev->data->dev_conf, dev_conf,
>> sizeof(dev->data->dev_conf));
>> + /* Backup mtu for rollback */
>> + old_mtu = dev->data->mtu;
>> +
>> ret = rte_eth_dev_info_get(port_id, &dev_info);
>> if (ret != 0)
>> goto rollback;
>> + /* Get the real Ethernet overhead length */
>> + if (dev_info.max_mtu != UINT16_MAX &&
>> + dev_info.max_rx_pktlen > dev_info.max_mtu)
>> + overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>> + else
>> + overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>> +
> The Ethernet frame header length supported by each NIC may differ, which
> causes the difference in allowed packet sizes and the drops you describe.
> The difference in Ethernet frame header length affects the maximum
> Ethernet frame length, which is the boundary value for enabling jumbo
> frames through 'max_rx_pkt_len'.
> However, with this approach we need to derive overhead_len the same way
> every time, which causes a lot of duplicated code in the framework and in
> applications. For example, the parsing and processing of the
> '--max-pkt-len=xxx' parameter, "cmd_config_max_pkt_len_parsed" in testpmd,
> and the dev_configure API modified here.
> It's a little redundant and troublesome.
>
> Maybe it is necessary for the driver to directly report the maximum
> supported Ethernet frame length via the rte_eth_dev_info_get API.
> As following:
> struct rte_eth_dev_info {
> xxxx
> /**
> * The maximum Ethernet frame length supported by each
> * driver varies with the Ethernet header length.
> */
> uint16_t eth_max_len;
> xxxx
> }
>
> int
> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
> {
> xxx
> dev_info->min_mtu = RTE_ETHER_MIN_MTU;
> dev_info->max_mtu = UINT16_MAX;
> dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
> xxx
> }
>
> And then:
> xxx_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
> {
> xxx
> info->eth_max_len = xxx_ETHER_MAX_LEN;
> xxx
> }
> What do you think?
Hi Lijun,
The 'max_mtu' field was added in the past to solve this exact problem; perhaps
it would have been better to add something like 'eth_max_len' at that time.
But as Steve mentioned, we are planning to switch to a more MTU-based
configuration, which should remove the need for the field you mention.
>> /* If number of queues specified by application for both Rx and Tx is
>> * zero, use driver preferred values. This cannot be done individually
>> * as it is valid for either Tx or Rx (but not both) to be zero.
>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>> goto rollback;
>> }
>> } else {
>> - if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>> - dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>> + uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>> + if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>> + pktlen > RTE_ETHER_MTU + overhead_len)
>> /* Use default value */
>> dev->data->dev_conf.rxmode.max_rx_pkt_len =
>> - RTE_ETHER_MAX_LEN;
>> + RTE_ETHER_MTU + overhead_len;
>> + }
>> +
>> + /* Scale the MTU size to adapt max_rx_pkt_len */
>> + if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>> + dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>> + overhead_len;
>> }
> Now that we update the MTU here when jumbo frame is enabled, it is
> necessary to check the validity of max_rx_pkt_len.
> As following:
> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
> if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
> dev_conf->rxmode.max_rx_pkt_len <= RTE_ETHER_MTU + overhead_len) {
It is not clear from the API whether "max_rx_pkt_len <= RTE_ETHER_MTU +
overhead_len" is a valid value or not. The API only says:
"max_rx_pkt_len; /**< Only used if JUMBO_FRAME enabled. */"
but it doesn't say that 'max_rx_pkt_len' must be larger than 'RTE_ETHER_MTU'
when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set.
> xxxx
> }
> } else {
> xxx
> }
>
> Without the check above, an unreasonable case can occur.
> Like,
> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
> dev_conf->rxmode.max_rx_pkt_len = 1500
> overhead_len = 26
> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500 - 26 = 1474
>
This looks correct: if the application requests 'max_rx_pkt_len = 1500', the
requested MTU is 1474.
If the application wants an MTU of 1500, it should set 'max_rx_pkt_len' to
"1500 + overhead".
> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set in rxmode.offloads when mtu > 1500.
Overall the 'max_rx_pkt_len' handling is a little messy; we tried to fix it in
the last release but reverted the changes because of unexpected side effects.
The current long-term plan is:
- Switch both 'rte_eth_dev_configure()' and 'rte_eth_dev_set_mtu()' to work on
MTU values (instead of one working on frame length and the other on MTU)
- Remove 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame requests from the
requested MTU value
- Let the app determine jumbo frame capability from the 'max_mtu' check
- Let the app calculate the Ethernet overhead using 'max_mtu' and
'max_rx_pktlen'
>> /*
>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>> eth_dev_tx_queue_config(dev, 0);
>> rollback:
>> memcpy(&dev->data->dev_conf, &orig_conf, sizeof(dev->data->dev_conf));
>> + if (old_mtu != dev->data->mtu)
>> + dev->data->mtu = old_mtu;
>> rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, ret);
>> return ret;
>>