[dpdk-dev] [PATCH v2 1/2] ethdev: add packet integrity checks
    Ori Kam 
    orika at nvidia.com
       
    Tue Apr 13 12:21:49 CEST 2021
    
    
  
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit at intel.com>
> Sent: Tuesday, April 13, 2021 11:30 AM
> 
> On 4/13/2021 9:18 AM, Ori Kam wrote:
> > Hi Ferruh,
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit at intel.com>
> >> Subject: Re: [PATCH v2 1/2] ethdev: add packet integrity checks
> >>
> >> On 4/13/2021 8:12 AM, Ori Kam wrote:
> >>> Hi Ferruh,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit at intel.com>
> >>>> Subject: Re: [PATCH v2 1/2] ethdev: add packet integrity checks
> >>>>
> >>>> On 4/12/2021 8:26 PM, Ori Kam wrote:
> >>>>> Hi Ferruh,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Ferruh Yigit <ferruh.yigit at intel.com>
> >>>>>> Subject: Re: [PATCH v2 1/2] ethdev: add packet integrity checks
> >>>>>>
> >>>>>> On 4/11/2021 6:34 PM, Gregory Etelson wrote:
> >>>>>>> From: Ori Kam <orika at nvidia.com>
> >>>>>>>
> >>>>>>> Currently, DPDK application can offload the checksum check,
> >>>>>>> and report it in the mbuf.
> >>>>>>>
> >>>>>>> However, as more and more applications are offloading some or all
> >>>>>>> logic and action to the HW, there is a need to check the packet
> >>>>>>> integrity so the right decision can be taken.
> >>>>>>>
> >>>>>>> The application logic can be positive meaning if the packet is
> >>>>>>> valid jump / do  actions, or negative if packet is not valid
> >>>>>>> jump to SW / do actions (like drop)  a, and add default flow
> >>>>>>> (match all in low priority) that will direct the miss packet
> >>>>>>> to the miss path.
> >>>>>>>
> >>>>>>> Since currenlty rte_flow works in positive way the assumtion is
> >>>>>>> that the postive way will be the common way in this case also.
> >>>>>>>
> >>>>>>> When thinking what is the best API to implement such feature,
> >>>>>>> we need to considure the following (in no specific order):
> >>>>>>> 1. API breakage.
> >>>>>>> 2. Simplicity.
> >>>>>>> 3. Performance.
> >>>>>>> 4. HW capabilities.
> >>>>>>> 5. rte_flow limitation.
> >>>>>>> 6. Flexability.
> >>>>>>>
> >>>>>>> First option: Add integrity flags to each of the items.
> >>>>>>> For example add checksum_ok to ipv4 item.
> >>>>>>>
> >>>>>>> Pros:
> >>>>>>> 1. No new rte_flow item.
> >>>>>>> 2. Simple in the way that on each item the app can see
> >>>>>>> what checks are available.
> >>>>>>>
> >>>>>>> Cons:
> >>>>>>> 1. API breakage.
> >>>>>>> 2. increase number of flows, since app can't add global rule and
> >>>>>>>        must have dedicated flow for each of the flow combinations, for
> >>>> example
> >>>>>>>        matching on icmp traffic or UDP/TCP  traffic with IPv4 / IPv6 will
> >>>>>>>        result in 5 flows.
> >>>>>>>
> >>>>>>> Second option: dedicated item
> >>>>>>>
> >>>>>>> Pros:
> >>>>>>> 1. No API breakage, and there will be no for some time due to having
> >>>>>>>        extra space. (by using bits)
> >>>>>>> 2. Just one flow to support the icmp or UDP/TCP traffic with IPv4 /
> >>>>>>>        IPv6.
> >>>>>>> 3. Simplicity application can just look at one place to see all possible
> >>>>>>>        checks.
> >>>>>>> 4. Allow future support for more tests.
> >>>>>>>
> >>>>>>> Cons:
> >>>>>>> 1. New item, that holds number of fields from different items.
> >>>>>>>
> >>>>>>> For starter the following bits are suggested:
> >>>>>>> 1. packet_ok - means that all HW checks depending on packet layer
> have
> >>>>>>>        passed. This may mean that in some HW such flow should be
> splited
> >> to
> >>>>>>>        number of flows or fail.
> >>>>>>> 2. l2_ok - all check flor layer 2 have passed.
> >>>>>>> 3. l3_ok - all check flor layer 2 have passed. If packet doens't have
> >>>>>>>        l3 layer this check shoudl fail.
> >>>>>>> 4. l4_ok - all check flor layer 2 have passed. If packet doesn't
> >>>>>>>        have l4 layer this check should fail.
> >>>>>>> 5. l2_crc_ok - the layer 2 crc is O.K. it is possible that the crc will
> >>>>>>>        be O.K. but the l3_ok will be 0. it is not possible that l2_crc_ok will
> >>>>>>>        be 0 and the l3_ok will be 0.
> >>>>>>> 6. ipv4_csum_ok - IPv4 checksum is O.K.
> >>>>>>> 7. l4_csum_ok - layer 4 checksum is O.K.
> >>>>>>> 8. l3_len_OK - check that the reported layer 3 len is smaller than the
> >>>>>>>        packet len.
> >>>>>>>
> >>>>>>> Example of usage:
> >>>>>>> 1. check packets from all possible layers for integrity.
> >>>>>>>        flow create integrity spec packet_ok = 1 mask packet_ok = 1 .....
> >>>>>>>
> >>>>>>> 2. Check only packet with layer 4 (UDP / TCP)
> >>>>>>>        flow create integrity spec l3_ok = 1, l4_ok = 1 mask l3_ok = 1
> l4_ok =
> >> 1
> >>>>>>>
> >>>>>>
> >>>>>> Hi Ori,
> >>>>>>
> >>>>>> Is the intention of the API just filtering, like apply some action to the
> >>>>>> packets based on their integration status. Like drop packets their l2_crc
> >>>>>> checksum failed? Here configuration is done by existing offload APIs.
> >>>>>>
> >>>>>> Or is the intention to configure the integration check on NIC, like to say
> >>>>>> enable layer 2 checks, and do the action based on integration check
> >> status.
> >>>>>>
> >>>>> If I understand your question the first case is the one that this patch is
> >>>> targeting.
> >>>>> meaning based on those bits route/apply actions to the packet while still
> in
> >>>> the
> >>>>> HW.
> >>>>>
> >>>>> This is not design to enable the queue status bits.
> >>>>> In the use case suggestion by this patch, just like you said the app
> >>>>> can decide to drop the packet before arriving to the queue, application
> >> may
> >>>> also
> >>>>> use the mark + queue action to mark to the SW what is the issue with
> this
> >>>> packet.
> >>>>>
> >>>>> I'm not sure I understand your comment about "here configuration is
> done
> >> by
> >>>> existing
> >>>>> offload API" do you mean like the drop /  jump to table / any other
> rte_flow
> >>>> action?
> >>>>>
> >>>>>
> >>>>
> >>>> I am asking because difference between device configuration and packet
> >>>> filtering
> >>>> seems getting more blurred in the flow API.
> >>>>
> >>>> Currently L4 checksum offload is requested by application via setting
> >>>> 'DEV_RX_OFFLOAD_TCP_CKSUM' (UDP/SCTP/...) offload flags. This is the
> >> way
> >>>> to
> >>>> configure HW.
> >>>>
> >>>> Is the intention of this patch doing packet filtering after device configured
> >>>> with above offload API?
> >>>>
> >>>> Or is the intention HW to be configured via flow API, like if "l4_ok = 1" is
> set
> >>>> in the rule, will it enable L4 checks first and do the filtering later?
> >>>>
> >>>> If not what is the expected behavior when integration checks are not
> >> enabled
> >>>> when the rule is created?
> >>>>
> >>>
> >>> Let me try to explain it in a different way:
> >>> When application enables 'DEV_RX_OFFLOAD_TCP_CKSUM' (UDP/SCTP/...)
> >> offload flags
> >>> It only means that the HW will report the checksum in the mbuf.
> >>
> >> It is not only for reporting in mbuf, it also configures the HW to enable the
> >> relevant checks. This is for device configuration.
> >>
> >> Are these checks always enabled in mlx devices?
> >>
> > Yes, they are always on.
> > The RX offload just enables the RX burst to update the mbuf.
> > In any case if HW needs to be enabled, then
> > the PMD can enable the HW when first flow is inserted and just not
> > copy the value the mbuf.
> >
> 
> This was my initial question, if the new rule is just for filtering or
> "configuring device + filtering".
> 
> So, two questions:
> 1) Do we want to duplicate the way to configure the HW checks
> 2) Do we want want to use flow API as a device configuration interface, (as said
> before I am feeling we are going that path more and more)
> 
> Since the HW is always on in your case that is not a big problem for you but (1)
> can cause trouble for other vendors, for the cases like if HW check enabled via
> flow API, later it is disabled by clearing the offload flag, what will be the
> status of the flow rule?
> 
1. from my point of view it is not duplication since if the RX offload is enabled then there is also
setting of the metadata in the RX.
While if the RX offload is not enabled then even if the HW is working there is no need to copy
the result to the mbuf fields.
2. see my response above.
I think that since rte_flow is working with the HW, at any case creating flow may need to enable some HW.
I think at any case when offloading using queues the device must be stopped (at least in some
HW) and if the device is stopped in any case all flows are removed so I don't see 
the dependency between the rx queue offload and the rte_flow one.
like I said even if the HW is enabled it doesn't mean the mbuf should be updated
(the mbuf should be updated only if the queue offload was enabled. )
> >>> Lets call this mode RX queue offload.
> >>>
> >>> Now I'm introducing rte_flow offload,
> >>> This means that the application can create rte_flow that matches those
> >>> bits and based on that take different actions that are defined by the
> rte_flow.
> >>> Example for such a flow
> >>> Flow create 0 ingress pattern integrity spec packet_ok = 0 mask packet_ok
> =
> >> 1 / end actions count / drop / end
> >>>
> >>> Offloading such a flow will result that all invalid packets will be counted
> and
> >> dropped, in the HW
> >>> so even if the RX queue offload was enabled, no invalid packets will arrive
> to
> >> the application.
> >>>
> >>> In other case lets assume that the application wants all valid packets to
> jump
> >> to the next group,
> >>> and all the reset of the packets will go to SW. (we can assume that later in
> >> the pipeline also valid packets
> >>> will arrive to the application)
> >>> Flow create 0 ingress pattern integrity spec packet_ok = 1 mask packet_ok
> =
> >> 1 / end actions  jump group 1 / end
> >>> Flow create 0 priority 1 pattern eth / end actions rss / end
> >>>
> >>> In this case if the application enabled the RX offload then if the application
> >> will receive invalid packet
> >>> also the flag in the mbuf will be set.
> >>>
> >>
> >> If application not enabled the Rx offload that means HW checks are not
> >> enabled,
> >> so HW can't know if a packet is OK or not, right.
> >> In that case, for above rule, I expect none of the packets to match, so none
> >> will jump to next group. Are we in the same page here?
> >>
> >> Or do you expect above rule configure the HW to enable the relevant HW
> >> checks first?
> >>
> > If this is required by HW then yes.
> > please see my answer above.
> >
> >>> As you can see those two offloads mode are complementary to each other
> >> and one doesn't force the other one in any
> >>> way.
> >>>
> >>> I hope this is clearer.
> >>>
> >>>
> >>>>>>
> >>>>>>> Signed-off-by: Ori Kam <orika at nvidia.com>
> >>>>>>> ---
> >>>>>>> v2: fix compilation error
> >>>>>>> ---
> >>>>>>>      doc/guides/prog_guide/rte_flow.rst | 19 ++++++++++++
> >>>>>>>      lib/librte_ethdev/rte_flow.h       | 47
> >>>> ++++++++++++++++++++++++++++++
> >>>>>>>      2 files changed, 66 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/doc/guides/prog_guide/rte_flow.rst
> >>>>>> b/doc/guides/prog_guide/rte_flow.rst
> >>>>>>> index e1b93ecedf..87ef591405 100644
> >>>>>>> --- a/doc/guides/prog_guide/rte_flow.rst
> >>>>>>> +++ b/doc/guides/prog_guide/rte_flow.rst
> >>>>>>> @@ -1398,6 +1398,25 @@ Matches a eCPRI header.
> >>>>>>>      - ``hdr``: eCPRI header definition (``rte_ecpri.h``).
> >>>>>>>      - Default ``mask`` matches nothing, for all eCPRI messages.
> >>>>>>>
> >>>>>>> +Item: ``PACKET_INTEGRITY_CHECKS``
> >>>>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>>> +
> >>>>>>> +Matches packet integrity.
> >>>>>>> +
> >>>>>>> +- ``level``: the encapsulation level that should be checked. level 0
> >> means
> >>>> the
> >>>>>>> +  default PMD mode (Can be inner most / outermost). value of 1
> means
> >>>>>> outermost
> >>>>>>> +  and higher value means inner header. See also RSS level.
> >>>>>>> +- ``packet_ok``: All HW packet integrity checks have passed based on
> >> the
> >>>>>> max
> >>>>>>> +  layer of the packet.
> >>>>>>> +  layer of the packet.
> >>>>>>> +- ``l2_ok``: all layer 2 HW integrity checks passed.
> >>>>>>> +- ``l3_ok``: all layer 3 HW integrity checks passed.
> >>>>>>> +- ``l4_ok``: all layer 3 HW integrity checks passed.
> >>>>>>
> >>>>>> s/layer 3/ layer 4/
> >>>>>>
> >>>>> Will fix.
> >>>>>
> >>>>>>> +- ``l2_crc_ok``: layer 2 crc check passed.
> >>>>>>> +- ``ipv4_csum_ok``: ipv4 checksum check passed.
> >>>>>>> +- ``l4_csum_ok``: layer 4 checksum check passed.
> >>>>>>> +- ``l3_len_ok``: the layer 3 len is smaller than the packet len.
> >>>>>>> +
> >>>>>>>      Actions
> >>>>>>>      ~~~~~~~
> >>>>>>>
> >>>>>>> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> >>>>>>> index 6cc57136ac..77471af2c4 100644
> >>>>>>> --- a/lib/librte_ethdev/rte_flow.h
> >>>>>>> +++ b/lib/librte_ethdev/rte_flow.h
> >>>>>>> @@ -551,6 +551,15 @@ enum rte_flow_item_type {
> >>>>>>>      	 * See struct rte_flow_item_geneve_opt
> >>>>>>>      	 */
> >>>>>>>      	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> >>>>>>> +
> >>>>>>> +	/**
> >>>>>>> +	 * [META]
> >>>>>>> +	 *
> >>>>>>> +	 * Matches on packet integrity.
> >>>>>>> +	 *
> >>>>>>> +	 * See struct rte_flow_item_integrity.
> >>>>>>> +	 */
> >>>>>>> +	RTE_FLOW_ITEM_TYPE_INTEGRITY,
> >>>>>>>      };
> >>>>>>>
> >>>>>>>      /**
> >>>>>>> @@ -1685,6 +1694,44 @@ rte_flow_item_geneve_opt_mask = {
> >>>>>>>      };
> >>>>>>>      #endif
> >>>>>>>
> >>>>>>> +__extension__
> >>>>>>> +struct rte_flow_item_integrity {
> >>>>>>> +	uint32_t level;
> >>>>>>> +	/**< Packet encapsulation level the item should apply to.
> >>>>>>> +	 * @see rte_flow_action_rss
> >>>>>>> +	 */
> >>>>>>> +	union {
> >>>>>>> +		struct {
> >>>>>>> +			uint64_t packet_ok:1;
> >>>>>>> +			/** The packet is valid after passing all HW
> >> checks. */
> >>>>>>> +			uint64_t l2_ok:1;
> >>>>>>> +			/**< L2 layer is valid after passing all HW
> >> checks. */
> >>>>>>> +			uint64_t l3_ok:1;
> >>>>>>> +			/**< L3 layer is valid after passing all HW
> >> checks. */
> >>>>>>> +			uint64_t l4_ok:1;
> >>>>>>> +			/**< L4 layer is valid after passing all HW
> >> checks. */
> >>>>>>> +			uint64_t l2_crc_ok:1;
> >>>>>>> +			/**< L2 layer checksum is valid. */
> >>>>>>> +			uint64_t ipv4_csum_ok:1;
> >>>>>>> +			/**< L3 layer checksum is valid. */
> >>>>>>> +			uint64_t l4_csum_ok:1;
> >>>>>>> +			/**< L4 layer checksum is valid. */
> >>>>>>> +			uint64_t l3_len_ok:1;
> >>>>>>> +			/**< The l3 len is smaller than the packet len.
> >> */
> >>>>>>
> >>>>>> packet len?
> >>>>>>
> >>>>> Do you mean replace the l3_len_ok with packet len?
> >>>>
> >>>> no, I was trying to ask what is "packet len" here? frame length, or mbuf
> >> buffer
> >>>> length, or something else?
> >>>>
> >>> Frame length.
> >>>
> >>>>> My only issue is that the check is comparing the l3 len to the packet len.
> >>>>>
> >>>>> If you still think it is better to call it packet len, I'm also O.K with it.
> >>>>>
> >>>>>>> +			uint64_t reserved:56;
> >>>>>>> +		};
> >>>>>>> +		uint64_t  value;
> >>>>>>> +	};
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +#ifndef __cplusplus
> >>>>>>> +static const struct rte_flow_item_integrity
> >>>>>>> +rte_flow_item_integrity_mask = {
> >>>>>>> +	.level = 0,
> >>>>>>> +	.value = 0,
> >>>>>>> +};
> >>>>>>> +#endif
> >>>>>>> +
> >>>>>>>      /**
> >>>>>>>       * Matching pattern item definition.
> >>>>>>>       *
> >>>>>>>
> >>>>>
> >>>
> >
    
    
More information about the dev
mailing list