[dpdk-dev] questions about new offload ethdev api

Andrew Rybchenko arybchenko at solarflare.com
Mon Dec 16 09:39:05 CET 2019


On 12/10/19 9:07 PM, Ferruh Yigit wrote:
> On 1/23/2018 2:34 PM, Shahaf Shuler wrote:
>> Tuesday, January 23, 2018 3:53 PM, Olivier Matz:
> 
> <...>
> 
>>>
>>> 2/ meaning of rxmode.jumbo_frame, rxmode.enable_scatter, 
>>> rxmode.max_rx_pkt_len
>>>
>>> While it's not related to the new API, it is probably a good opportunity
>>> to clarify the meaning of these flags. I'm not able to find good
>>> documentation about them.
>>>
>>> Here is my understanding, the configuration only depends on:
>>> - the maximum rx frame length
>>> - the amount of data available in a mbuf (minus headroom)
>>>
>>> Flags to set in rxmode (example):
>>> +---------------+----------------+----------------+-----------------+
>>> |               |mbuf_data_len=1K|mbuf_data_len=2K|mbuf_data_len=16K|
>>> +---------------+----------------+----------------+-----------------+
>>> |max_rx_len=1500|enable_scatter  |                |                 |
>>> +---------------+----------------+----------------+-----------------+
>>> |max_rx_len=9000|enable_scatter, |enable_scatter, |jumbo_frame      |
>>> |               |jumbo_frame     |jumbo_frame     |                 |
>>> +---------------+----------------+----------------+-----------------+
>>>
>>> If this table is correct, the jumbo_frame flag would be equivalent to
>>> checking whether max_rx_pkt_len is above a threshold.
>>>
>>> And enable_scatter could be deduced from the mbuf size of the given rxq 
>>> (which is a bit harder but maybe doable).
>>
>> I'm glad you raised this subject. We had a lot of discussion on it internally
>> in Mellanox.
>>
>> I fully agree. All the application needs is to specify the maximum packet size
>> it wants to receive.
>>
>> I also think the lack of documentation is causing PMDs to use those flags
>> wrongly. For example, some PMDs set the jumbo_frame flag internally without
>> it being set by the application.
>>
>> I would like to add one more item: MTU. What is the relation (if any)
>> between setting the MTU and max_rx_len? I know MTU stands for Maximum
>> Transmission Unit, however at least in Linux it is the same for both send
>> and receive.
>>
>>
> 
> (Resurrecting the thread after two years, I will reply again with the latest
> understanding.)
> 
> Thanks Olivier for the above summary and table; unfortunately the usage is
> still not consistent between PMDs. According to my understanding:
> 
> 'max_rx_pkt_len' is a user configuration value to limit the size of the packet
> that is shared with the host, but it doesn't limit the size of the packet that
> the NIC receives.

Also, the comment in lib/librte_ethdev/rte_ethdev.h says that the
rxmode field is used if (and I think only if) JUMBO_FRAME is
enabled. So, if the user wants to set it at device configure
stage, the device *must* support the JUMBO_FRAME offload, which
means that the driver code handles rxmode.max_rx_pkt_len and
either accepts it and configures the HW appropriately or returns
an error if the specified value is wrong. Basically, that is what
the jumbo frame feature definition in features.rst says. The user
has max_rx_pktlen in dev_info to find out the maximum supported
value for max_rx_pkt_len.
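To make that contract concrete, below is a minimal configure-time
sketch from the application side (the helper name, port/queue
parameters and the 9000-byte value are purely illustrative; it
assumes the current ethdev API with DEV_RX_OFFLOAD_JUMBO_FRAME and
rxmode.max_rx_pkt_len):

#include <errno.h>
#include <rte_ethdev.h>

/* Hypothetical helper: configure a port for a given max Rx frame length. */
static int
configure_max_frame(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq,
                    uint32_t frame_len)
{
        struct rte_eth_dev_info dev_info;
        struct rte_eth_conf conf = { 0 };
        int ret;

        ret = rte_eth_dev_info_get(port_id, &dev_info);
        if (ret != 0)
                return ret;

        /* dev_info.max_rx_pktlen is the maximum the device can accept. */
        if (frame_len > dev_info.max_rx_pktlen)
                return -EINVAL;

        if (frame_len > RTE_ETHER_MAX_LEN) {
                /* JUMBO_FRAME is the promise that the PMD will look at
                 * rxmode.max_rx_pkt_len and reject values it cannot apply. */
                if ((dev_info.rx_offload_capa & DEV_RX_OFFLOAD_JUMBO_FRAME) == 0)
                        return -ENOTSUP;
                conf.rxmode.offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
        }
        conf.rxmode.max_rx_pkt_len = frame_len;

        return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
}

e.g. configure_max_frame(0, 1, 1, 9000) before the queue setup calls.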

> For example, if the mbuf size of the mempool used by a queue is 1024 bytes, we
> don't want packets bigger than the buffer size, but if the NIC supports it, it
> is possible to receive a 6000-byte packet, split the data into multiple
> buffers, and use multi-segment packets to represent it.
> So what we need is the NIC's ability to limit the size of the data shared with
> the host, plus scattered Rx support (device + driver).

It requires the RX_SCATTER offload to be enabled, and it must be
controlled by the user only (not the PMD), since it basically
means that the application is ready to handle multi-segment
packets (i.e. has code which takes a look at the number of
segments, next pointers, etc.). Moreover, the application may
disable the MULTI_SEGS Tx offload (and drivers may then ignore
the number of segments and next pointer as well).
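For illustration, this is roughly what "ready to handle
multi-segment packets" means on the Rx side (a sketch only:
process() is a hypothetical consumer, the burst size is arbitrary,
and DEV_RX_OFFLOAD_SCATTER is assumed to have been set in
rxmode.offloads at configure time):

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical per-segment consumer, just for illustration. */
static void
process(const void *data, uint16_t len)
{
        (void)data;
        (void)len;
        /* Application-specific processing would go here. */
}

static void
rx_loop(uint16_t port_id, uint16_t queue_id)
{
        struct rte_mbuf *pkts[32];
        uint16_t nb, i;

        nb = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
        for (i = 0; i < nb; i++) {
                struct rte_mbuf *seg = pkts[i];

                /* pkt_len covers the whole chain, data_len one segment. */
                while (seg != NULL) {
                        process(rte_pktmbuf_mtod(seg, void *),
                                rte_pktmbuf_data_len(seg));
                        seg = seg->next;
                }
                rte_pktmbuf_free(pkts[i]);
        }
}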

> But MTU limits the size of the packet that NIC receives.

Personally I've never treated it this way. For me the only
difference between max_rx_pkt_len and MTU is:
 - max_rx_pkt_len is the entire packet with all L2 headers and
   even the FCS (basically everything which could be provided
   to the application in the mbuf)
 - MTU does not cover L2 (nor VLANs; I'm not sure about MPLS)
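Under that interpretation the relation is simple arithmetic; a
minimal sketch using the rte_ether.h constants (ignoring VLAN/QinQ
tags, which would add 4 bytes each):

#include <stdint.h>
#include <rte_ether.h>

/* Frame length seen by max_rx_pkt_len for a given MTU:
 * MTU + Ethernet header (14) + FCS (4). */
static inline uint32_t
mtu_to_frame_len(uint16_t mtu)
{
        return (uint32_t)mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
}

E.g. mtu_to_frame_len(1500) == 1518 and mtu_to_frame_len(9000) == 9018.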

> Assuming the above are correct :),
> 
> Using the mbuf data size as 'max_rx_pkt_len' without asking the user is an
> option, but perhaps the user has a different reason to limit the packet size,
> so I think it is better to keep it as a separate config option.
> 
> I think the PMD itself enabling the "jumbo frame" offload is not too bad, and
> acceptable, since providing a large MTU already implies it.

Yes

> But I am not sure about the PMD enabling scattered Rx; the application may want
> to force receiving single-segment mbufs, and for that case the PMD enabling
> this config on its own looks like a problem.

Yes

> But the user really needs this when a packet doesn't fit into the mbuf, so
> providing an MTU larger than 'max_rx_pkt_len' _may_ imply enabling scattered
> Rx; I assume this is the logic in some PMDs, which looks acceptable.

I don't think so. IMO auto-enabling Rx scatter from the PMD is a
breakage of the contract between the application and the driver.
As stated above, the application may simply not be ready to
handle multi-segment packets correctly.

I think that providing an MTU larger than 'max_rx_pkt_len' is simply a
change of max_rx_pkt_len = (MTU plus space for L2+).
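In other words, a hypothetical PMD mtu_set callback following that
interpretation would just recompute and store the frame limit (a
sketch only, not taken from any existing driver; a real callback
would also program the HW limit and may reject out-of-range values):

#include <rte_ethdev_driver.h>
#include <rte_ether.h>

static int
example_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
        uint32_t frame_len = (uint32_t)mtu + RTE_ETHER_HDR_LEN +
                             RTE_ETHER_CRC_LEN;

        /* ... program the HW Rx frame length limit to frame_len ... */

        /* An MTU change is just a change of max_rx_pkt_len; Rx scatter
         * is NOT silently enabled here. */
        dev->data->dev_conf.rxmode.max_rx_pkt_len = frame_len;

        return 0;
}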

> And the PMD behavior should be as follows for the mentioned configs:
> 
> 1) Don't change the user-provided 'max_rx_pkt_len' value

I have no strong opinion. However, it is important to clarify
which understanding of max_rx_pkt_len vs MTU is the right one.

> 2) If jumbo frame is not enabled, don't limit the size of packets to the host
> (I think this is based on the assumption that the mbuf size will always be >
> 1514)

I think that JUMBO_FRAME is not relevant here. It is just a
promise to take a look at max_rx_pkt_len on configure or
start stage.

> 3) When the user requests to set the MTU bigger than ETH_MAX, the PMD enables
> jumbo frame support (if it is not already enabled by the user and is supported
> by the HW). If the HW doesn't support it, of course it should fail.

I'm not sure which ETH_MAX is mentioned above.
#define ETH_MAX_MTU 0xFFFFU /* 65535, same as IP_MAX_MTU */
or do you mean
#define ETH_FRAME_LEN 1514 /* Max. octets in frame sans FCS */
or even
#define ETH_DATA_LEN 1500 /* Max. octets in payload */

We should be careful when we talk about Ethernet lengths and
MTU.
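For reference, the DPDK-side counterparts from rte_ether.h that the
discussion above maps to (a tiny self-check, values as I read them
in the current tree):

#include <stdio.h>
#include <rte_ether.h>

/* RTE_ETHER_MTU (1500)     ~ ETH_DATA_LEN: max payload
 * RTE_ETHER_MAX_LEN (1518) ~ ETH_FRAME_LEN + FCS: full frame incl. CRC */
int main(void)
{
        printf("mtu=%d max_frame=%d l2_hdr=%d fcs=%d\n",
               RTE_ETHER_MTU, RTE_ETHER_MAX_LEN,
               RTE_ETHER_HDR_LEN, RTE_ETHER_CRC_LEN);
        return 0;
}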

> 4) When the user requests to set an MTU bigger than 'max_rx_pkt_len'

I think the second parameter to consider here is not
max_rx_pkt_len, but the amount of space for data in a single
mbuf (for all Rx queues).
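For completeness, that per-mbuf data space is (a minimal sketch;
'mp' is whatever mempool the Rx queues were set up with):

#include <rte_mbuf.h>

/* Usable data space per mbuf: room size minus the reserved headroom.
 * If max_rx_pkt_len (or MTU + L2 overhead) exceeds this, the packet
 * can only be delivered as a multi-segment chain. */
static inline uint16_t
mbuf_data_space(struct rte_mempool *mp)
{
        return rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
}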

> 4a) if "scattered Rx" is enabled, configure the MTU and limit the packet size
> to the host to 'max_rx_pkt_len'

Yes and no. IMO configure the MTU and bump max_rx_pkt_len.

> 4b) if "scattered Rx" is not enabled but the HW supports it, enable "scattered
> Rx" in the PMD, configure the MTU and limit the packet size to the host to
> 'max_rx_pkt_len'

No, I think it is wrong to enable Rx scatter from PMD.

> 4c) if "scattered Rx" is not enabled and not supported by HW, fail MTU set.

Yes, regardless of HW support.

> 4d) if the HW doesn't support limiting the packet size to the host, but the
> requested MTU is bigger than 'max_rx_pkt_len', it should fail.

I would rephrase it as the impossibility to disable Rx scatter.
If so, it must be the driver's responsibility to drop scattered
packets if the Rx scatter offload is not enabled.

> Btw, I am aware that some PMDs have a larger MTU by default and can't limit
> the packet size to the host to the 'max_rx_pkt_len' value. I don't know what
> to do in that case; fail in configure? Or at least make sure the configured
> mempool's mbuf size is big enough?

See above.

Thanks for reminder about the topic.

