[dpdk-dev] [PATCH v3 01/22] ethdev: fix MTU size exceeds max rx packet length
oulijun
oulijun at huawei.com
Tue Jan 19 09:46:56 CET 2021
在 2021/1/18 18:42, Ferruh Yigit 写道:
> On 1/15/2021 10:44 AM, oulijun wrote:
>> Hi Steve
>> This is a very good job! But I have some question and suggestions.
>> Please check it.
>>
>> 在 2021/1/14 17:45, Steve Yang 写道:
>>> Ethdev is using default Ethernet overhead to decide if provided
>>> 'max_rx_pkt_len' value is bigger than max (non jumbo) MTU value,
>>> and limits it to MAX if it is.
>>>
>>> Since the application/driver used Ethernet overhead is different than
>>> the ethdev one, check result is wrong.
>>>
>>> If the driver is using Ethernet overhead bigger than the default one,
>>> the provided 'max_rx_pkt_len' is trimmed down, and in the driver when
>>> correct Ethernet overhead is used to convert back, the resulting MTU is
>>> less than the intended one, causing some packets to be dropped.
>>>
>>> Like,
>>> app -> max_rx_pkt_len = 1500/*mtu*/ + 22/*overhead*/ = 1522
>>> ethdev -> 1522 > 1518/*MAX*/; max_rx_pkt_len = 1518
>>> driver -> MTU = 1518 - 22 = 1496
>>> Packets with size 1497-1500 are dropped although intention is to be able
>>> to send/receive them.
>>>
>>> The fix is to make ethdev use the correct Ethernet overhead for port,
>>> instead of default one.
>>>
>>> Fixes: 59d0ecdbf0e1 ("ethdev: MTU accessors")
>>>
>>> Signed-off-by: Steve Yang <stevex.yang at intel.com>
>>> ---
>>> lib/librte_ethdev/rte_ethdev.c | 27 ++++++++++++++++++++++++---
>>> 1 file changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c
>>> b/lib/librte_ethdev/rte_ethdev.c
>>> index 17ddacc78d..19ca4c4512 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -1292,8 +1292,10 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>> struct rte_eth_dev *dev;
>>> struct rte_eth_dev_info dev_info;
>>> struct rte_eth_conf orig_conf;
>>> + uint16_t overhead_len;
>>> int diag;
>>> int ret;
>>> + uint16_t old_mtu;
>>> RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>> @@ -1319,10 +1321,20 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>> memcpy(&dev->data->dev_conf, dev_conf,
>>> sizeof(dev->data->dev_conf));
>>> + /* Backup mtu for rollback */
>>> + old_mtu = dev->data->mtu;
>>> +
>>> ret = rte_eth_dev_info_get(port_id, &dev_info);
>>> if (ret != 0)
>>> goto rollback;
>>> + /* Get the real Ethernet overhead length */
>>> + if (dev_info.max_mtu != UINT16_MAX &&
>>> + dev_info.max_rx_pktlen > dev_info.max_mtu)
>>> + overhead_len = dev_info.max_rx_pktlen - dev_info.max_mtu;
>>> + else
>>> + overhead_len = RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>>> +
>> The Ethernet frame header length supported by each NIC may be
>> different, which cause the difference of allowed packets size and drop
>> you say.
>> The diference of Ethernet frame header length have an impact for the
>> maximum Ethernet frame length, which is a boundary value
>> of enabling jumbo frame through the ' max_rx_pkt_len '.
>> However, we need to do the same thing you do above to get the
>> overhead_len every time, which will
>> cause a lot of duplicate code in the framework and app. For examples,
>> parsing and processing for '--max-pkt-len=xxx' parameter,
>> and " cmd_config_max_pkt_len_parsed " in testpmd, and here modifying
>> here dev_configure API.
>> It's a little redundant and troublesome.
>>
>> Maybe, it is necessary for driver to directly report the supported
>> maximum Ethernet frame length by rte_dev_info_get API.
>> As following:
>> struct rte_eth_dev_info {
>> xxxx
>> /**
>> * The maximum Ethernet frame length supported by each
>> * driver varies with the Ethernet header length.
>> */
>> uint16_t eth_max_len;
>> xxxx
>> }
>>
>> int
>> rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>> {
>> xxx
>> dev_info->min_mtu = RTE_ETHER_MIN_MTU;
>> dev_info->max_mtu = UINT16_MAX;
>> dev_info->eth_max_len = RTE_ETHER_MAX_LEN;
>> xxx
>> }
>>
>> And then:
>> xxx_devv_infos_get(struct rte_eth_dev *eth_dev, struct
>> rte_eth_dev_info *info)
>> {
>> xxx
>> info->eth_max_len = xxx_ETHER_MAX _LEN;
>> xxx
>> }
>> What do you think?
>
> Hi Lijun,
>
> The 'max_mtu' has been added in the past to solve exact same problem,
> perhaps it would be better to add something like 'eth_max_len' at that
> time.
>
> But as Steve mentioned we are planning to switch to more MTU based
> configuration, which should remove the need of the field you mentioned.
>
>>> /* If number of queues specified by application for both Rx and
>>> Tx is
>>> * zero, use driver preferred values. This cannot be done
>>> individually
>>> * as it is valid for either Tx or Rx (but not both) to be zero.
>>> @@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>> goto rollback;
>>> }
>>> } else {
>>> - if (dev_conf->rxmode.max_rx_pkt_len < RTE_ETHER_MIN_LEN ||
>>> - dev_conf->rxmode.max_rx_pkt_len > RTE_ETHER_MAX_LEN)
>>> + uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
>>> + if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
>>> + pktlen > RTE_ETHER_MTU + overhead_len)
>>> /* Use default value */
>>> dev->data->dev_conf.rxmode.max_rx_pkt_len =
>>> - RTE_ETHER_MAX_LEN;
>>> + RTE_ETHER_MTU + overhead_len;
>>> + }
>>> +
>>> + /* Scale the MTU size to adapt max_rx_pkt_len */
>>> + if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>>> + dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
>>> + overhead_len;
>>> }
>> Now that we update mtu here when jumbo frame enabled. It is necessary
>> to check validity of max_rx_pkt_len.
>> As following:
>> if (dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
>> if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
>> dev_conf->rxmode.max_rx_pkt_len <= RTE_ETHER_MTU +
>> overhead_len) {
>
> It is not clear from API that if "max_rx_pkt_len <= RTE_ETHER_MTU +
> overhead_len" is a valid value or not, the API says:
> "max_rx_pkt_len; /**< Only used if JUMBO_FRAME enabled. */"
>
> But it doesn't say that when 'DEV_RX_OFFLOAD_JUMBO_FRAME' is set,
> 'max_rx_pkt_len' should be less then 'RTE_ETHER_MTU'
>
The MTU setting entry does need to be adjusted. I agree! But I think
this place should be modified, and it's probably more reasonable.
Generally speaking, max_rx_pkt_len will be also updated with 'mtu +
overhead_len' also when application sets mtu > 1500, which means
JUMBO_FRAME is enabled.
Namely, JUMBO_FRAME is enabled based on mtu > 1500 or max_rx_pkt_len >
1500 + overhead_len.
Besides, having following check for "rxmode.offloads &
DEV_RX_OFFLOAD_JUMBO_FRAME == 0":
@@ -1410,11 +1422,18 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t
nb_rx_q, uint16_t nb_tx_q,
goto rollback;
}
} else {
uint16_t pktlen = dev_conf->rxmode.max_rx_pkt_len;
if (pktlen < RTE_ETHER_MIN_MTU + overhead_len ||
pktlen > RTE_ETHER_MTU + overhead_len)
/* Use default value */
dev->data->dev_conf.rxmode.max_rx_pkt_len =
RTE_ETHER_MTU + overhead_len;
}
One of check condition for the branch is "pktlen > RTE_ETHER_MTU +
overhead_len ".
In view of the above two aspects, when application set
DEV_RX_OFFLOAD_JUMBO_FRAME in offload, the max_rx_pkt_len must be
greater than " RTE_ETHER_MTU + overhead_len "
Since DEV_RX_OFFLOAD_JUMBO_FRAME is not removed, so the check is necessary.
>> xxxx
>> }
>> } else {
>> xxx
>> }
>>
>> If it does not thing above, which will cause an unreasonable case.
>> Like,
>> dev_conf->rxmode.offloads & DEV_RX_OFFLOAD_JUMBO_FRAME == 1
>> dev_conf->rxmode.max_rx_pkt_len = 1500
>> overhead_len = 26
>> dev->data->mtu = dev_conf->rxmode.max_rx_pkt_len - overhead_len = 1500
>> - 26 = 1474
>>
>
> This looks correct, if the applicatin is requesting 'max_rx_pkt_len =
> 1500', the requeted MTU is 1474,
> if application wants an MTU as 1500, it should set the 'max_rx_pkt_len'
> as "1500 + overhead".
>
>> In fact, DEV_RX_OFFLOAD_JUMBO_FRAME is set to rxmode.offloads when mtu
>> > 1500.
>
> Overall the 'max_rx_pkt_len' setting is a little messy, it has been
> tried to fix in last release but we reverted them because of unexpected
> side affects.
>
> Current long term plan is,
> - Switch both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' to
> work on MTU values. (instead one working on frame lenght and other on MTU)
> - Remove the 'DEV_RX_OFFLOAD_JUMBO_FRAME' and decide jumbo frame request
> by requested MTU value check
> - App decide the jumbo frame capability from the 'max_mtu' check
> - App calculate Ethernet overhead using 'max_mtu' and 'max_rx_pktlen'
>
>>> /*
>>> @@ -1549,6 +1568,8 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>> eth_dev_tx_queue_config(dev, 0);
>>> rollback:
>>> memcpy(&dev->data->dev_conf, &orig_conf,
>>> sizeof(dev->data->dev_conf));
>>> + if (old_mtu != dev->data->mtu)
>>> + dev->data->mtu = old_mtu;
>>> rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf,
>>> ret);
>>> return ret;
>>>
>
> .
>
More information about the dev
mailing list