[dpdk-dev] [EXT] Re: [PATCH 1/3] mbuf: add Tx offloads for packet marking

Olivier Matz olivier.matz at 6wind.com
Thu May 14 22:29:31 CEST 2020


Hi Nithin,

On Tue, May 05, 2020 at 11:49:20AM +0530, Nithin Dabilpuram wrote:
> On Mon, May 04, 2020 at 02:27:35PM +0200, Olivier Matz wrote:
> > On Mon, May 04, 2020 at 03:34:57PM +0530, Nithin Dabilpuram wrote:
> > > On Mon, May 04, 2020 at 11:16:40AM +0200, Olivier Matz wrote:
> > > > On Mon, May 04, 2020 at 01:57:06PM +0530, Nithin Dabilpuram wrote:
> > > > > Hi Olivier,
> > > > > 
> > > > > On Mon, May 04, 2020 at 10:06:34AM +0200, Olivier Matz wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Fri, May 01, 2020 at 04:48:21PM +0530, Jerin Jacob wrote:
> > > > > > > On Fri, Apr 17, 2020 at 12:53 PM Nithin Dabilpuram
> > > > > > > <nithind1988 at gmail.com> wrote:
> > > > > > > >
> > > > > > > > From: Nithin Dabilpuram <ndabilpuram at marvell.com>
> > > > > > > >
> > > > > > > > Introduce PKT_TX_MARK_IP_DSCP, PKT_TX_MARK_IP_ECN
> > > > > > > > and PKT_TX_MARK_VLAN_DEI Tx offload flags to support
> > > > > > > > packet marking.
> > > > > > > >
> > > > > > > > > When the packet marking feature in Traffic Manager is enabled,
> > > > > > > > > the application has to use the three new flags to indicate
> > > > > > > > > to the PMD whether packet marking needs to be enabled on the
> > > > > > > > > specific mbuf or not. By setting the three flags, it is
> > > > > > > > assumed by PMD that application has already verified the
> > > > > > > > applicability of marking on that specific packet and
> > > > > > > > PMD need not perform further checks as per RFC.
> > > > > > > >
> > > > > > > > Signed-off-by: Krzysztof Kanas <kkanas at marvell.com>
> > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram at marvell.com>
> > > > > > > 
> > > > > > > > None of the ethdev TM driver implementations has supported packet
> > > > > > > > marking so far.
> > > > > > > > rte_tm and rte_mbuf maintainers (Cristian, Olivier), could you review this patch?
> > > > > > 
> > > > > > As you know, the number of mbuf flags is limited (only 18 bits are
> > > > > > remaining), so I think we should use them with care, i.e. for features
> > > > > > that are generic enough.
> > > > > 
> > > > > I agree, but I believe this is one of the basic flags needed like other 
> > > > > Tx checksum offload flags (like PKT_TX_IP_CKSUM, PKT_TX_IPV4, etc) which 
> > > > > are needed to identify on which packets HW should/can apply packet marking.
> > > > 
> > > > PKT_TX_IP_CKSUM tells the hardware to offload the checksum
> > > > calculation. This is pretty straightforward and there is no other
> > > > dependency than the offload feature advertised by the PMD.
> > > > 
> > > > > I'm sorry, I don't have a lot of experience with rte_tm.h, so it's
> > > > > difficult for me to have a global view of what is done, for instance, when
> > > > > PKT_TX_MARK_VLAN_DEI is set, and what happens when it is not set.
> > > > 
> > > > Can you confirm that my understanding below is correct? (or correct me
> > > > where I'm wrong)
> > > > 
> > > > Before your patch:
> > > > - the application enables the port and traffic manager on it
> > > > - the application calls rte_tm_mark_vlan_dei() to select which traffic
> > > >   class must be marked
> > > > - when a packet is transmitted, the traffic class is determined by the
> > > >   hardware, and if the hardware recognizes a VLAN packet, the VLAN DEI
> > > >   bit is set depending on traffic class
> > > > 
> > > > The problem is for packets that cannot be recognized by the hardware,
> > > > correct?
> > > 
> > > Yes. Octeontx2 HW always depends on application knowledge instead of walking 
> > > through all the layers of packet data in Tx to identify what packet it is 
> > > and where the l2, l3, l4 headers start for performance reasons. 
> > > 
> > > I believe there are other hardware too that have the same expectation
> > > and hence we have a need for PKT_TX_IPv4, PKT_TX_IPv6 kind of flags.
> > > 
> > > Hence we want to make use of mbuf:tx_offload field and PKT_TX_* flags 
> > > for identifying the packet and knowing what are its l2,l3,l4 offsets.
> > 
> > The objective is to give an indication to the hardware that the packet has:
> > - an 802.1q header at offset X for PKT_TX_MARK_VLAN_DEI
> > - an IP/IPv6 header at offset X for PKT_TX_MARK_IP_DSCP
> > - an IP/IPv6 header at offset X and a TCP/SCTP header at offset Y for
> >   PKT_TX_MARK_IP_ECN
> > 
> > > Just to be sure I'm getting the point, would it also work with flags
> > > like this:
> > 
> > - an 802.1q header at offset X for PKT_TX_HAS_VLAN
> > - an IP/IPv6 header at offset X for PKT_TX_IPv4 or PKT_TX_IPv6
> > - a TCP/SCTP header at offset Y for PKT_TX_TCP/PKT_TX_SCTP (implies
> >   PKT_TX_IPv4 or PKT_TX_IPv6)
> > 
> > > The underlying question is: do we need the flags to only describe the
> > > content of the packet or do the flags also indicate that an action has to
> > > be done?
> 
> If we don't have a specific action based flag, then in future it might collide
> with other functionality and we will not be able to choose that specific
> offload. All the existing features are having specific flags, like TSO,
> CSUM.
> 
> > RFC wise, even when marking is enabled and the packet is coloured, not all packets
> > can be marked.
> > For example, when IP DSCP marking (RFC 2597) is enabled, marking is defined
> > only for the 12 code points below, out of 64 code points (6 bits of DSCP).
> 
>                   Class 1    Class 2    Class 3    Class 4    
>                  +----------+----------+----------+----------+
> Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
> Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
> High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
>                  +----------+----------+----------+----------+
> 
> All other combinations of DSCP value can be used for some other purposes
> and hence packets with those values shouldn't be marked.
> Similar is the case with IP ECN marking for TCP/SCTP(RFC 3168).
> 
> > Having the PMD or HW check whether the packet falls in the said class before
> > marking will impact performance. Since the application actually fills those values
> > in the packet, it is easier for the application to say.
> 
> > 
> > > > > So your patch is a way to force the hardware to set the
> > > > > VLAN DEI on packets that are not recognized as VLAN packets?
> > > > > 
> > > > > How is the traffic class of the packet determined?
> > > 
> > > Packet is coloured based on Single Rate[1] or Dual Rate[2] Shaping result
> > > and packet color determines traffic class. The exact behavior of 
> > > packet color to traffic class mapping is mentioned in TM spec based on
> > > few other RFC's.
> > > 
> > > > [1] https://tools.ietf.org/html/rfc2697
> > > > [2] https://tools.ietf.org/html/rfc2698
> > 
> > OK, so the traffic class does not depend on the packet type?
> Yes it doesn't. But where to update the traffic class is specific to packet
> type like DEI bit in VLAN or ECN field in IPv4/IPv6 or DSCP field in IPv4/IPv6.
> Also ECN marking is only valid for TCP/SCTP packets.
> 
> > 
> > 
> > > > > > From what I understand, this feature is bound to octeontx2, so using a
> > > > > > mbuf dynamic flag would make more sense here. There are some examples in
> > > > > > dpdk repository, just grep for "dynflag".
> > > > > 
> > > > > This is not octeontx2 specific flag but any "packet marking feature" enabled
> > > > > PMD would need these flags to identify on which packets marking needs to be 
> > > > > done. This is the first PMD that supports packet marking feature and
> > > > > hence it was not exposed earlier.
> > > > > 
> > > > > > For example, to mark VLAN DEI, the PMD cannot always assume that there is a preexisting
> > > > > > VLAN header from Byte 12, as there is no guarantee that the Ethernet header
> > > > > > always starts at Byte 0 (custom headers before the Ethernet header).
> > > > > 
> > > > > > 
> > > > > > Also, I think that the feature availability should be advertised through
> > > > > > an ethdev offload, so an application can know at initialization time
> > > > > > that these flags can be used.
> > > > > 
> > > > > > Feature availability is already part of the TM spec in rte_tm.h:
> > > > > struct rte_tm_capabilities:mark_vlan_dei_supported
> > > > > struct rte_tm_capabilities:mark_ip_ecn_[sctp|tcp]_supported
> > > > > struct rte_tm_capabilities:mark_ip_dscp_supported
> > > > 
> > > > Does this mean that any driver advertising this existing feature flag
> > > > has to support the new mbuf flags too? Shouldn't we have a specific
> > > > feature for it?
> > > 
> > > Yes, I thought PMD's need to support both.
> > > I'm fine adding specific feature flag for the offload flags alone
> > > if you insist or if there are other PMD's which don't need the offload flags
> > > for packet marking. I was not able to find out about other PMD's as
> > > none of the existing PMD's support packet marking.
> > 
> > Do you suggest that the behavior of the traffic manager marking should
> > be:
> > 
> > a- the hardware tries to recognize tx packets, and mark them
> >    accordingly. What packets are recognized depend on hardware.
> > b- if the mbuf has a specific flag, it helps the PMD and hardware to
> >    recognize packets, so it can mark packets.
> > 
> > For an application, a- is difficult to apprehend as it will be dependent
> > on hardware.
> > 
> > Or do you suggest that packets should only be marked if there is a mbuf
> > flag? (only b-)
> Yes, I believe b- is the right thing.
> 
> > 
> > Do you confirm that there is no support at all for this feature today?
> > I mean, what was the usage of rte_tm_mark_vlan_dei() these last 3 years?
> 
> Yes, it was not implemented/used. Because of such reasons, rte_tm.h is
> supposed to be experimental but was mistakenly marked stable. 
> You can see related discussion in below threads about marking rte_tm.h 
> experimental again in v20.11.
> https://mails.dpdk.org/archives/dev/2020-April/164970.html
> https://mails.dpdk.org/archives/dev/2020-May/166221.html

Thank you for the explanations. I also think b- is a better choice.

I don't see any better approach than having a mbuf flag. However, I'm
still not fully convinced that a dynamic flag won't do the job. Taking
3 additional flags (among the 18 remaining) for this feature also means that
we have 3 fewer flags available as dynamic flags for all applications, even
for applications that will not use this feature.
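
To make the arithmetic explicit, here is a small standalone sketch. The
macro and function names are illustrative only; the bit positions are the
PKT_FIRST_FREE and PKT_LAST_FREE values from the patch quoted below, and the
range is assumed to be inclusive, which matches the count of 18 given above.

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_FREE_BIT     23 /* PKT_FIRST_FREE is (1ULL << 23) */
#define LAST_FREE_BIT_OLD  40 /* PKT_LAST_FREE before the patch */
#define LAST_FREE_BIT_NEW  37 /* PKT_LAST_FREE after the patch */

/* Number of flag bits in the inclusive range [first, last]. */
static inline unsigned int free_flag_bits(unsigned int first,
					  unsigned int last)
{
	assert(first <= last && last < 64);
	return last - first + 1;
}

/* Mask covering every bit in [first, last], e.g. to check for overlap. */
static inline uint64_t free_flag_mask(unsigned int first, unsigned int last)
{
	assert(first <= last && last < 64);

	uint64_t upto_last = (last == 63) ?
		UINT64_MAX : ((1ULL << (last + 1)) - 1);

	return upto_last & ~((1ULL << first) - 1);
}
```

With these values, free_flag_bits(FIRST_FREE_BIT, LAST_FREE_BIT_OLD) is 18
and free_flag_bits(FIRST_FREE_BIT, LAST_FREE_BIT_NEW) is 15: bits 38 to 40
would no longer be available for dynamic flags.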

Would it be a problem to use a dynamic flag in this case?

Thanks,
Olivier


> 
> Thanks
> Nithin
> 
> > 
> > Thanks,
> > Olivier
> > 
> > 
> > > 
> > > > 
> > > > Please also see few comments below.
> > > > 
> > > > > > > > ---
> > > > > > > >  doc/guides/nics/features.rst    | 14 ++++++++++++++
> > > > > > > >  lib/librte_mbuf/rte_mbuf.c      |  6 ++++++
> > > > > > > >  lib/librte_mbuf/rte_mbuf_core.h | 36 ++++++++++++++++++++++++++++++++++--
> > > > > > > >  3 files changed, 54 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > index edd21c4..bc978fb 100644
> > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > @@ -913,6 +913,20 @@ Supports to get Rx/Tx packet burst mode information.
> > > > > > > >  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
> > > > > > > >  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``.
> > > > > > > >
> > > > > > > > +.. _nic_features_traffic_manager_packet_marking_offload:
> > > > > > > > +
> > > > > > > > +Traffic Manager Packet marking offload
> > > > > > > > +--------------------------------------
> > > > > > > > +
> > > > > > > > +Supports enabling a packet marking offload specific mbuf.
> > > > > > > > +
> > > > > > > > +* **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_MARK_IP_DSCP``,
> > > > > > > > +  ``mbuf.ol_flags:PKT_TX_MARK_IP_ECN``, ``mbuf.ol_flags:PKT_TX_MARK_VLAN_DEI``,
> > > > > > > > +  ``mbuf.ol_flags:PKT_TX_IPV4``, ``mbuf.ol_flags:PKT_TX_IPV6``.
> > > > > > > > +* **[uses]     mbuf**: ``mbuf.l2_len``.
> > > > > > > > +* **[related] API**: ``rte_tm_mark_ip_dscp()``, ``rte_tm_mark_ip_ecn()``,
> > > > > > > > +  ``rte_tm_mark_vlan_dei()``.
> > > > > > > > +
> > > > > > > >  .. _nic_features_other:
> > > > > > > >
> > > > > > > >  Other dev ops not represented by a Feature
> > > > > > > > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > index cd5794d..5c6896d 100644
> > > > > > > > --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > @@ -880,6 +880,9 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
> > > > > > > >         case PKT_TX_SEC_OFFLOAD: return "PKT_TX_SEC_OFFLOAD";
> > > > > > > >         case PKT_TX_UDP_SEG: return "PKT_TX_UDP_SEG";
> > > > > > > >         case PKT_TX_OUTER_UDP_CKSUM: return "PKT_TX_OUTER_UDP_CKSUM";
> > > > > > > > +       case PKT_TX_MARK_VLAN_DEI: return "PKT_TX_MARK_VLAN_DEI";
> > > > > > > > +       case PKT_TX_MARK_IP_DSCP: return "PKT_TX_MARK_IP_DSCP";
> > > > > > > > +       case PKT_TX_MARK_IP_ECN: return "PKT_TX_MARK_IP_ECN";
> > > > > > > >         default: return NULL;
> > > > > > > >         }
> > > > > > > >  }
> > > > > > > > @@ -916,6 +919,9 @@ rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
> > > > > > > >                 { PKT_TX_SEC_OFFLOAD, PKT_TX_SEC_OFFLOAD, NULL },
> > > > > > > >                 { PKT_TX_UDP_SEG, PKT_TX_UDP_SEG, NULL },
> > > > > > > >                 { PKT_TX_OUTER_UDP_CKSUM, PKT_TX_OUTER_UDP_CKSUM, NULL },
> > > > > > > > +               { PKT_TX_MARK_VLAN_DEI, PKT_TX_MARK_VLAN_DEI, NULL },
> > > > > > > > +               { PKT_TX_MARK_IP_DSCP, PKT_TX_MARK_IP_DSCP, NULL },
> > > > > > > > +               { PKT_TX_MARK_IP_ECN, PKT_TX_MARK_IP_ECN, NULL },
> > > > > > > >         };
> > > > > > > >         const char *name;
> > > > > > > >         unsigned int i;
> > > > > > > > diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > > > > index b9a59c8..d9f1290 100644
> > > > > > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > > > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > > > > > @@ -187,11 +187,40 @@ extern "C" {
> > > > > > > >  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> > > > > > > >
> > > > > > > >  #define PKT_FIRST_FREE (1ULL << 23)
> > > > > > > > -#define PKT_LAST_FREE (1ULL << 40)
> > > > > > > > +#define PKT_LAST_FREE (1ULL << 37)
> > > > > > > >
> > > > > > > >  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> > > > > > > >
> > > > > > > >  /**
> > > > > > > > > + * Packet marking offload flags. These flags indicate what kind
> > > > > > > > > + * of packet marking needs to be applied on a given mbuf when
> > > > > > > > > + * appropriate Traffic Manager configuration is in place.
> > > > > > > > > + * When user sets these flags on a mbuf, below assumptions are made
> > > > > > > > + * 1) When PKT_TX_MARK_VLAN_DEI is set,
> > > > > > > > + * a) PMD assumes pkt to be a 802.1q packet.
> > > > 
> > > > What does that imply?
> > > 
> > > I meant by setting the flag, a packet has VLAN header adhering to IEEE 802.1Q spec.
> > > 
> > > > 
> > > > > > > > + * b) Application should also set mbuf.l2_len where 802.1Q header is
> > > > > > > > + *    at (mbuf.l2_len - 6) offset.
> > > > 
> > > > Why mbuf.l2_len - 6 ?
> > > > The L2 header, when a VLAN header is present, will be
> > > {custom header 'X' Bytes}:{Ethernet SRC+DST (12B)}:{VLAN Header (4B)}:{Ether Type (2B)}
> > > l2_len = X + 12 + 4 + 2
> > > So, VLAN header starts at (l2_len - 6) bytes.
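
For reference, the offset computation above can be written out as a small
standalone sketch. The names are hypothetical, the frame is assumed to carry
a single 802.1Q tag, and the DEI bit is taken as bit 12 of the 16-bit TCI
(stored in network byte order after the 2-byte TPID). This only illustrates
the arithmetic in software; it is not a description of what the hardware does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_ADDRS_LEN 12     /* destination + source MAC */
#define VLAN_TAG_LEN     4     /* TPID (2B) + TCI (2B) */
#define ETHER_TYPE_LEN   2
#define VLAN_TCI_DEI     0x1000 /* DEI is bit 12 of the TCI */

/* l2_len for a single-tagged frame preceded by custom_len bytes. */
static inline size_t vlan_l2_len(size_t custom_len)
{
	return custom_len + ETHER_ADDRS_LEN + VLAN_TAG_LEN + ETHER_TYPE_LEN;
}

/* Offset of the 802.1Q tag as described above: l2_len - 6. */
static inline size_t vlan_tag_offset(size_t l2_len)
{
	assert(l2_len >= ETHER_ADDRS_LEN + VLAN_TAG_LEN + ETHER_TYPE_LEN);
	return l2_len - (VLAN_TAG_LEN + ETHER_TYPE_LEN);
}

/* Set DEI in a raw frame; the TCI follows the 2-byte TPID. */
static inline void frame_set_vlan_dei(uint8_t *frame, size_t l2_len)
{
	uint8_t *tci = frame + vlan_tag_offset(l2_len) + 2;

	/* High byte of the TCI holds PCP (3 bits), DEI (1 bit), VID[11:8]. */
	tci[0] |= (uint8_t)(VLAN_TCI_DEI >> 8);
}
```

With no custom header, vlan_tag_offset(vlan_l2_len(0)) is 12, the usual
position of the tag; with an 8-byte custom header it moves to 20.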
> > > 
> > > > 
> > > > > > > > + * 2) When PKT_TX_MARK_IP_DSCP is set,
> > > > > > > > + * a) Application should also set either PKT_TX_IPV4 or PKT_TX_IPV6
> > > > > > > > + *    to indicate whether if it is IPv4 packet or IPv6 packet
> > > > > > > > + *    for DSCP marking. It should also set PKT_TX_IP_CKSUM if it is
> > > > > > > > + *    IPv4 pkt.
> > > > > > > > + * b) Application should also set mbuf.l2_len that indicates
> > > > > > > > + *    start offset of L3 header.
> > > > > > > > + * 3) When PKT_TX_MARK_IP_ECN is set,
> > > > > > > > + * a) Application should also set either PKT_TX_IPV4 or PKT_TX_IPV6.
> > > > > > > > + *    It should also set PKT_TX_IP_CKSUM if it is IPv4 pkt.
> > > > > > > > + * b) PMD will assume pkt L4 protocol is either TCP or SCTP and
> > > > > > > > + *    ECN is set to 2'b01 or 2'b10 as per RFC 3168 and hence HW
> > > > > > > > + *    can mark the packet for a configured color.
> > > > > > > > + * c) Application should also set mbuf.l2_len that indicates
> > > > > > > > + *    start offset of L3 header.
> > > > > > > > + */
> > > > > > > > +#define PKT_TX_MARK_VLAN_DEI           (1ULL << 38)
> > > > > > > > +#define PKT_TX_MARK_IP_DSCP            (1ULL << 39)
> > > > > > > > +#define PKT_TX_MARK_IP_ECN             (1ULL << 40)
> > > > 
> > > > We should have one comment per define.
> > > Ack, will fix in V2.
> > > 
> > > > 
> > > > 
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > >   * Outer UDP checksum offload flag. This flag is used for enabling
> > > > > > > >   * outer UDP checksum in PMD. To use outer UDP checksum, the user needs to
> > > > > > > >   * 1) Enable the following in mbuf,
> > > > > > > > @@ -384,7 +413,10 @@ extern "C" {
> > > > > > > >                 PKT_TX_MACSEC |          \
> > > > > > > >                 PKT_TX_SEC_OFFLOAD |     \
> > > > > > > >                 PKT_TX_UDP_SEG |         \
> > > > > > > > -               PKT_TX_OUTER_UDP_CKSUM)
> > > > > > > > +               PKT_TX_OUTER_UDP_CKSUM | \
> > > > > > > > +               PKT_TX_MARK_VLAN_DEI |   \
> > > > > > > > +               PKT_TX_MARK_IP_DSCP |    \
> > > > > > > > +               PKT_TX_MARK_IP_ECN)
> > > > > > > >
> > > > > > > >  /**
> > > > > > > >   * Mbuf having an external buffer attached. shinfo in mbuf must be filled.
> > > > > > > > --
> > > > > > > > 2.8.4
> > > > > > > >

