[dpdk-dev] [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
Andrew Rybchenko
andrew.rybchenko at oktetlabs.ru
Mon Sep 13 09:22:14 CEST 2021
On 9/13/21 9:56 AM, Xu, Rosen wrote:
> Hi,
>
>> -----Original Message-----
>> From: Anoob Joseph <anoobj at marvell.com>
>> Sent: Wednesday, September 08, 2021 18:30
>> To: Yigit, Ferruh <ferruh.yigit at intel.com>; Xu, Rosen <rosen.xu at intel.com>;
>> Andrew Rybchenko <arybchenko at solarflare.com>
>> Cc: Nicolau, Radu <radu.nicolau at intel.com>; Doherty, Declan
>> <declan.doherty at intel.com>; hemant.agrawal at nxp.com;
>> matan at nvidia.com; Ananyev, Konstantin <konstantin.ananyev at intel.com>;
>> thomas at monjalon.net; Ankur Dwivedi <adwivedi at marvell.com>;
>> andrew.rybchenko at oktetlabs.ru; Akhil Goyal <gakhil at marvell.com>;
>> dev at dpdk.org
>> Subject: RE: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
>>
>> Hi Ferruh, Rosen, Andrew,
>>
>> Please see inline.
>>
>> Thanks,
>> Anoob
>>
>>> Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
>>>
>>> External Email
>>>
>>> ----------------------------------------------------------------------
>>> On 8/23/2021 11:02 AM, Akhil Goyal wrote:
>>>> Reassembly is a costly operation if it is done in software, however,
>>>> if it is offloaded to HW, it can considerably save application cycles.
>>>> The operation becomes even more costlier if IP fragmants are
>>>> encrypted.
>>>>
>>>> To resolve above two issues, a new offload
>>> DEV_RX_OFFLOAD_REASSEMBLY
>>>> is introduced in ethdev for devices which can attempt reassembly of
>>>> packets in hardware.
>>>> rte_eth_dev_info is added with the reassembly capabilities which a
>>>> device can support.
>>>> Now, if IP fragments are encrypted, reassembly can also be attempted
>>>> while doing inline IPsec processing.
>>>> This is controlled by a flag in rte_security_ipsec_sa_options to
>>>> enable reassembly of encrypted IP fragments in the inline path.
>>>>
>>>> The resulting reassembled packet would be a typical segmented mbuf
>>>> in case of success.
>>>>
>>>> And if reassembly of fragments is failed or is incomplete (if
>>>> fragments do not come before the reass_timeout), the mbuf is updated
>>>> with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned
>>> as
>>>> is. Now application may decide the fate of the packet to wait more
>>>> for fragments to come or drop.
>>>>
>>>> Signed-off-by: Akhil Goyal <gakhil at marvell.com>
>>>> ---
>>>> lib/ethdev/rte_ethdev.c | 1 +
>>>> lib/ethdev/rte_ethdev.h | 18 +++++++++++++++++-
>>>> lib/mbuf/rte_mbuf_core.h | 3 ++-
>>>> lib/security/rte_security.h | 10 ++++++++++
>>>> 4 files changed, 30 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>>> 9d95cd11e1..1ab3a093cf 100644
>>>> --- a/lib/ethdev/rte_ethdev.c
>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>> @@ -119,6 +119,7 @@ static const struct {
>>>> RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
>>>> RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
>>>> RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
>>>> + RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
>>>> RTE_RX_OFFLOAD_BIT2STR(SCATTER),
>>>> RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
>>>> RTE_RX_OFFLOAD_BIT2STR(SECURITY),
>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>>> d2b27c351f..e89a4dc1eb 100644
>>>> --- a/lib/ethdev/rte_ethdev.h
>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>> @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
>>>> #define DEV_RX_OFFLOAD_VLAN_FILTER 0x00000200
>>>> #define DEV_RX_OFFLOAD_VLAN_EXTEND 0x00000400
>>>> #define DEV_RX_OFFLOAD_JUMBO_FRAME 0x00000800
>>>> +#define DEV_RX_OFFLOAD_REASSEMBLY 0x00001000
>>>
>>> previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been
>> long
>>> that offload has been removed, but not sure if it cause any problem to
>>> re- use it.
>>>
>>>> #define DEV_RX_OFFLOAD_SCATTER 0x00002000
>>>> /**
>>>> * Timestamp is set by the driver in
>>> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
>>>> @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
>>>> */
>>>> #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
>>> (UINT16_MAX)
>>>>
>>>> +/**
>>>> + * Reassembly capabilities that a device can support.
>>>> + * The device which can support reassembly offload should set
>>>> + * DEV_RX_OFFLOAD_REASSEMBLY
>>>> + */
>>>> +struct rte_eth_reass_capa {
>>>> + /** Maximum time in ns that a fragment can wait for further
>>> fragments */
>>>> + uint64_t reass_timeout;
>>>> + /** Maximum number of fragments that device can reassemble */
>>>> + uint16_t max_frags;
>>>> + /** Reserved for future capabilities */
>>>> + uint16_t reserved[3];
>>>> +};
>>>> +
>>>
>>> I wonder if there is any other hardware around supports reassembly
>>> offload, it would be good to get more feedback on the capabilities list.
>>>
>>>> /**
>>>> * Ethernet device associated switch information
>>>> */
>>>> @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
>>>> * embedded managed interconnect/switch.
>>>> */
>>>> struct rte_eth_switch_info switch_info;
>>>> + /* Reassembly capabilities of a device for reassembly offload */
>>>> + struct rte_eth_reass_capa reass_capa;
>>>>
>>>> - uint64_t reserved_64s[2]; /**< Reserved for future fields */
>>>
>>> Reserved fields were added to be able to update the struct without
>>> breaking the ABI, so that a critical change doesn't have to wait until
>>> next ABI break release.
>>> Since this is ABI break release, we can keep the reserved field and
>>> add the new struct. Or this can be an opportunity to get rid of the reserved
>> field.
>>>
>>> Personally I have no objection to get rid of the reserved field, but
>>> better to agree on this explicitly.
>>>
>>>> void *reserved_ptrs[2]; /**< Reserved for future fields */
>>>> };
>>>>
>>>> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
>>>> index
>>>> bb38d7f581..cea25c87f7 100644
>>>> --- a/lib/mbuf/rte_mbuf_core.h
>>>> +++ b/lib/mbuf/rte_mbuf_core.h
>>>> @@ -200,10 +200,11 @@ extern "C" {
>>>> #define PKT_RX_OUTER_L4_CKSUM_BAD (1ULL << 21)
>>>> #define PKT_RX_OUTER_L4_CKSUM_GOOD (1ULL << 22)
>>>> #define PKT_RX_OUTER_L4_CKSUM_INVALID ((1ULL << 21) | (1ULL
>>> << 22))
>>>> +#define PKT_RX_REASSEMBLY_INCOMPLETE (1ULL << 23)
>>>>
>>>
>>> Similar comment with Andrew's, what is the expectation from
>>> application if this flag exists? Can we drop it to simplify the logic in the
>> application?
>>
>> [Anoob] There can be few cases where hardware/NIC attempts inline
>> reassembly but it fails to complete it
>>
>> 1. Number of fragments is larger than what is supported by the hardware 2.
>> Hardware reassembly resources are exhausted (due to limited reassembly
>> contexts etc) 3. Reassembly errors such as overlapping fragments 4. Wait
>> time exhausted (or reassembly timeout)
>>
>> In such cases, application would be required to retrieve the original
>> fragments so that it can attempt reassembly in software. The incomplete flag
>> is useful for 2 purposes basically, 1. Application would need to retrieve the
>> time the fragment has already spend in hardware reassembly so that
>> software reassembly attempt can compensate for it. Otherwise, reassembly
>> timeout across hardware + software will not be accurate
Could you clarify how application will find out the time spent
in HW.
>> 2. Retrieve original
>> fragments. With this proposal, an incomplete reassembly would result in a
>> chained mbuf but the segments need not be consecutive. To explain bit more,
>>
>> Suppose we have a packet that is fragmented into 3 fragments, and fragment
>> 3 & fragment 1 arrives in that order. Fragment 2 didn't arrive and hardware
>> ultimately pushes it. In that case, application would be receiving a
>> chained/segmented mbuf with fragment 1 & fragment 3 chained.
>>
>> Now, this chained mbuf can't be treated like a regular chained mbuf. Each
>> fragment would have its IP hdr and there are fragments missing in between.
>> The only thing application is expected to do is, retrieve fragments, push it to
>> s/w reassembly.
It sounds like it conflicts with SCATTER and BUFFER_SPLIT
offloads which allow to return chained mbuf's. Don't know
if it is good or bad, but anyway it must be documented.
>
> What you mentioned is error identification. But actually a negotiation about max frame size is needed before datagrams tx/rx.
It sounds like it is OK for informational purposes, but
right now I don't understand how it could be used by the
application. Application still has to support reassembly
in SW regardless of the information.
>>>
>>>> /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
>>>>
>>>> -#define PKT_FIRST_FREE (1ULL << 23)
>>>> +#define PKT_FIRST_FREE (1ULL << 24)
>>>> #define PKT_LAST_FREE (1ULL << 40)
>>>>
>>>> /* add new TX flags here, don't forget to update PKT_LAST_FREE */
>>>> diff --git a/lib/security/rte_security.h
>>>> b/lib/security/rte_security.h index 88d31de0a6..364eeb5cd4 100644
>>>> --- a/lib/security/rte_security.h
>>>> +++ b/lib/security/rte_security.h
>>>> @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
>>>> * * 0: Disable per session security statistics collection for this SA.
>>>> */
>>>> uint32_t stats : 1;
>>>> +
>>>> + /** Enable reassembly on incoming packets.
>>>> + *
>>>> + * * 1: Enable driver to try reassembly of encrypted IP packets for
>>>> + * this SA, if supported by the driver. This feature will work
>>>> + * only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
>>>> + * inline ethernet device.
>>>> + * * 0: Disable reassembly of packets (default).
>>>> + */
>>>> + uint32_t reass_en : 1;
>>>> };
>>>>
>>>> /** IPSec security association direction */
>>>>
>
More information about the dev
mailing list