[dpdk-dev] [RFC] ethdev: add min/max MTU to device info
Nithin Kumar Dabilpuram
ndabilpuram at marvell.com
Mon Jun 24 15:18:53 CEST 2019
Since Ian's change didn't include MTU interpretation issue, we want to see if we can send out a patch for this discussion to conclude as we have a similar problem.
Currently two ways are as below.
#1 Konstantin's solution of adding new api rte_ethdev_mtu_to_frame_size(devid, mtu), that would dynamically provide the new max mtu.
#2 Can we have the value of max_mtu field in rte_eth_dev_info be dynamic i.e based on device configure to indicate that new Tx offloads would need more space for L2.
For this we can update max_mtu field documentation without abi change.
I also see a problem with above two solutions that is not present with Linux like approach. It is that currently MTU configuration both effects Rx frame size and Tx frame size.
If we change the MTU interpretation based on Tx offloads, then for Rx, there is a confusion of what will be its max frame size accepted.
From: dev <dev-bounces at dpdk.org> On Behalf Of Ian Stokes
Sent: Wednesday, February 20, 2019 9:32 PM
To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Morten Brørup <mb at smartsharesystems.com>; Stephen Hemminger <stephen at networkplumber.org>
Cc: dev at dpdk.org; Mcnamara, John <john.mcnamara at intel.com>; Kovacevic, Marko <marko.kovacevic at intel.com>; Thomas Monjalon <thomas at monjalon.net>; Yigit, Ferruh <ferruh.yigit at intel.com>; Andrew Rybchenko <arybchenko at solarflare.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info
On 2/7/2019 12:00 PM, Ananyev, Konstantin wrote:
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev,
>>>>> From: dev on behalf of Stephen Hemminger
>>>>> On Wed, 6 Feb 2019 14:05:34 +0100
>>>>> Morten Brørup <mb at smartsharesystems.com> wrote:
>>>>>> Good work, Stephen.
>>>>>> It should also be documented how PMDs should interpret this MTU.
>>>>>> Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522
>>> bytes incl. header and CRC, and should be allowed with an Ethernet
>>>> MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN
>>> for this, but that's as far as it goes...
>>>>>> But how about frames with even larger headers, e.g. 4 MPLS labels
>>> makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534
>>>> bytes... is such a frame acceptable with an MTU of 1500 bytes?
>>>>> No. According to standard practice in Linux and FreeBSD, only the
>>> first VLAN tag is free.
>>>>> After that any other headers count against MTU.
>>>> Thank you for the insights. Just to clarify:
>>>> 1 VLAN tag is allowed for free.
>>>> But on order to support two VLAN tags, the MTU must be increased by
>>> the size of one VLAN tag, because the first VLAN tag is free?
>>>> Or must the MTU be increased by the size of two VLAN tags, because
>>> only the special case of exactly one VLAN tag is free?
>>> Can we introduce new function at ehtdev API to query PMD frame size
>>> based on MTU?
>>> Something like: rte_ethdev_mtu_to_frame_size(uint32_t mtu);
>>> Provide default behavior and allow PMD to overwrite it?
>> This assumes that the Layer 2 headers are fixed size. If you add e.g. an MPLS stack to the packet, the number of MPLS labels can vary, and
>> thus the size of the Layer 2 headers varies with each packet.
> If all this extra stuff (MPLS labels, etc.) is added by SW - then yes, it will be SW (upper layer) to take account about such overhead
> and it would be accounted by PMD frame-size caluclation.
> If these fields would be added by HW/PMD - then it would be PMD responsibility to take these extras into account when calculating
> frame size.
>> It is a problem if Layer 3/4 offload features make assumptions about the Layer 3/4 MTUs based on the Layer 2 MTU without considering the
>> size of the actual Ethernet headers of each packet, but simply assume that the Ethernet header size is fixed.
>> It might currently be calculated correctly for untagged or single VLAN tagged packets (assuming the VLAN tag is not already part of the
>> packet data, but is set in the mbuf for the NIC to add).
> That could be a default behavior for that function.
with a view to help progress this I've posted an RFC series based on
Stephens work and some of the PMD drivers
Any feedback welcome
More information about the dev