[dpdk-dev] [PATCH v2 1/2] ethdev: introduce conntrack flow action and item

Bing Zhao bingz at nvidia.com
Fri Apr 16 20:18:05 CEST 2021


Hi Thomas,
Thanks for your comments. Almost all the comments are addressed.
PSB.

> -----Original Message-----
> From: Thomas Monjalon <thomas at monjalon.net>
> Sent: Friday, April 16, 2021 6:50 PM
> To: Bing Zhao <bingz at nvidia.com>
> Cc: Ori Kam <orika at nvidia.com>; ferruh.yigit at intel.com;
> andrew.rybchenko at oktetlabs.ru; dev at dpdk.org;
> ajit.khaparde at broadcom.com; jerinj at marvell.com; humin29 at huawei.com;
> rosen.xu at intel.com; hemant.agrawal at nxp.com
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: introduce conntrack
> flow action and item
> 
> External email: Use caution opening links or attachments
> 
> 
> 15/04/2021 18:41, Bing Zhao:
> > This commit introduced the conntrack action and item.
> >
> > Usually the HW offloading is stateless. For some stateful
> offloading
> > like a TCP connection, HW module will help provide the ability of
> a
> > full offloading w/o SW participation after the connection was
> > established.
> >
> > The basic usage is that in the first flow the application should
> add
> > the conntrack action and in the following flow(s) the application
> > should use the conntrack item to match on the result.
> 
> You probably mean "flow rule", not "traffic flow".
> Please make it clear to avoid confusion.

Done

> 
> > A TCP connection has two directions traffic. To set a conntrack
> action
> > context correctly, information from packets of both directions are
> > required.
> >
> > The conntrack action should be created on one port and supply the
> peer
> > port as a parameter to the action. After context creating, it
> could
> > only be used between the ports (dual-port mode) or a single port.
> The
> > application should modify the action via the API
> > "action_handle_update" only when before using it to create a flow
> with
> > opposite direction. This will help the driver to recognize the
> > direction of the flow to be created, especially in single port
> mode.
> > The traffic from both directions will go through the same port if
> the
> > application works as an "forwarding engine" but not a end point.
> > There is no need to call the update interface if the subsequent
> flows
> > have nothing to be changed.
> 
> I am not sure this is a feature description for the commit log or an
> usage explanation for the doc.
> In any case, please distinguish "ethdev port" and "TCP port"
> to avoid confusion.

Changed, thanks.

> 
> > Query will be supported via action_ctx_query interface, about the
> > current packets information and connection status. Tha fields
> query
> > capabilities depends on the HW.
> >
> > For the packets received during the conntrack setup, it is
> suggested
> > to re-inject the packets in order to take full advantage of the
> 
> What do you mean by "full advantage"?
> It is counter-intuitive to re-inject for offloading.
> Does it improve the performance?

No, it is not for the performance but for the functionality correctness. Before the CT established, some data+ack packets may already be received by the SW, and the application will use the initial information to setup a conntrack. It may result into some error checking for the following packets. By re-injecting the packets already received by SW before the established CT, it will make the HW have all the packets information and check the following packets correctly.

> 
> > conntrack. Only the valid packets should pass the conntrack,
> packets
> > with invalid TCP information, like out of window, or with invalid
> > header, like malformed, should not pass.
> >
> > Naming and definition:
> 
> You mean naming is inspired from Linux?

The naming and critical fields definition. The original idea are from the paper listed below (correct me if I was wrong), and there is some well-known definition in this area, to my understanding, it would be better to follow it.

> 
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fel
> ix
> >
> ir.bootlin.com%2Flinux%2Flatest%2Fsource%2Finclude%2Fuapi%2Flinux%2F
> ne
> >
> tfilter%2Fnf_conntrack_tcp.h&data=04%7C01%7Cbingz%40nvidia.com%7
> Ca
> >
> d68b128653e4428da6e08d900c56373%7C43083d15727340c1b7db39efd9ccc17a%7
> C0
> > %7C1%7C637541670056627642%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
> MDAi
> >
> LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=kCn9
> kF
> > bi7yWrd7A94zFibvQEB97phXXUudSA%2BhAueTU%3D&reserved=0
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fel
> ix
> >
> ir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fnet%2Fnetfilter%2Fnf_conn
> tr
> >
> ack_proto_tcp.c&data=04%7C01%7Cbingz%40nvidia.com%7Cad68b128653e
> 44
> >
> 28da6e08d900c56373%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C1%7C6375
> 41
> >
> 670056627642%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2l
> uM
> >
> zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ajs9NtCaEpG2Kfnjy
> t5
> > X8uwOFo2HyfMdWZbx%2BHkbvX8%3D&reserved=0
> >
> > Other reference:
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww
> w.
> >
> usenix.org%2Flegacy%2Fevents%2Fsec01%2Finvitedtalks%2Frooij.pdf&
> da
> >
> ta=04%7C01%7Cbingz%40nvidia.com%7Cad68b128653e4428da6e08d900c56373%7
> C4
> >
> 3083d15727340c1b7db39efd9ccc17a%7C0%7C1%7C637541670056627642%7CUnkno
> wn
> > %7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWw
> iLCJ
> >
> XVCI6Mn0%3D%7C1000&sdata=987HVU%2FefoJ40%2B6aM0Q1RVxcGH5nVJS4bzy
> 4A
> > 4ZoYSE%3D&reserved=0
> >
> > Signed-off-by: Bing Zhao <bingz at nvidia.com>
> [...]
> > +     /**
> > +      * [META]
> > +      *
> > +      * Matches conntrack state.
> > +      *
> > +      * See struct rte_flow_item_conntrack.
> 
> Please use @see for hyperlink in doxygen.
> 
> > +      */
> > +     RTE_FLOW_ITEM_TYPE_CONNTRACK,
> >  };
> [...]
> > +/**
> > + * The packet is with valid state after conntrack checking.
> 
> "is with valid state" looks strange.
> I propose "The packet is valid after conntrack checking."

Done

> 
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0)
> 
> Please use RTE_BIT32().

Done

> 
> > +/**
> > + * The state of the connection was changed.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1)
> > +/**
> > + * Error is detected on this packet for this connection and
> > + * an invalid state is set.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2)
> 
> "INVAL" is strange. Can we add the missing 2 characters?
> RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVALID
> 
> On a related note, do we really need the word FLAG?
> And it is conflicting with the prefix in enum
> rte_flow_conntrack_tcp_last_index I think
> RTE_FLOW_CONNTRACK_PKT_STATE_ is a good prefix, long enough.
> 

Done

> > +/**
> > + * The HW connection tracking module is disabled.
> > + * It can be due to application command or an invalid state.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3)
> 
> This one does not have PKT in its name.
> And it is limiting to HW, while the driver could implement conntrack
> in SW.
> I propose RTE_FLOW_CONNTRACK_PKT_DISABLED
> 

Done

> > +/**
> > + * The packet contains some bad field(s) and cannot continue
> > + * with the conntrack module checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4)
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> > + *
> > + * Matches the state of a packet after it passed the connection
> > +tracking
> > + * examination. The state is a bit mask of one
> > +RTE_FLOW_CONNTRACK_FLAG*
> 
> s/bit mask/bitmap/ ?

Done

> 
> RTE_FLOW_CONNTRACK_PKT_STATE_*
> otherwise it is messed with rte_flow_conntrack_tcp_last_index
> 
> > + * or a reasonable combination of these bits.
> > + */
> > +struct rte_flow_item_conntrack {
> > +     uint32_t flags;
> > +};
> [...]
> > +
> > +     /**
> > +      * [META]
> > +      *
> > +      * Enable tracking a TCP connection state.
> > +      *
> > +      * Send packet to HW connection tracking module for
> examination.
> 
> Not necessarily HW.
> No packet is sent.
> I think you can remove this sentence completely.
> 

Done

> > +      *
> > +      * See struct rte_flow_action_conntrack.
> 
> @see
> 
> > +      */
> > +     RTE_FLOW_ACTION_TYPE_CONNTRACK,
> >  };
> >
> >  /**
> > @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp {
> >   */
> >  struct rte_flow_action_handle;
> >
> > +/**
> > + * The state of a TCP connection.
> > + */
> > +enum rte_flow_conntrack_state {
> > +     /**< SYN-ACK packet was seen. */
> > +     RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> > +     /**< 3-way handshark was done. */
> 
> s/handshark/handshake/
> 

Done

> > +     RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> > +     /**< First FIN packet was received to close the connection.
> */
> > +     RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> > +     /**< First FIN was ACKed. */
> > +     RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> > +     /**< Second FIN was received, waiting for the last ACK. */
> > +     RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> > +     /**< Second FIN was ACKed, connection was closed. */
> > +     RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> > +};
> > +
> > +/**
> > + * The last passed TCP packet flags of a connection.
> > + */
> > +enum rte_flow_conntrack_tcp_last_index {
> > +     RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> > +     RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag.
> */
> > +     RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK
> flag. */
> > +     RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag.
> */
> > +     RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag.
> */
> > +     RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag.
> */
> > +};
> 
> Please use RTE_BIT32().
> 

Done

> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * Configuration parameters for each direction of a TCP
> connection.
> > + */
> > +struct rte_flow_tcp_dir_param {
> > +     uint32_t scale:4; /**< TCP window scaling factor, 0xF to
> disable. */
> > +     uint32_t close_initiated:1; /**< The FIN was sent by this
> direction. */
> > +     /**< An ACK packet has been received by this side. */
> 
> Move all comments on their own line before the struct member.
> Comment should then start with /**
> 

All done, BTW, I see in the current code, the format "/**<" is used in a lot of parts.

> > +     uint32_t last_ack_seen:1;
> > +     /**< If set, indicates that there is unacked data of the
> > + connection. */
> 
> not sure what means "unacked data of the connection"

Updated the description, it means some packets were sent but not all of them are ACKed.

> 
> > +     uint32_t data_unacked:1;
> > +     /**< Maximal value of sequence + payload length over sent
> > +      * packets (next ACK from the opposite direction).
> > +      */
> > +     uint32_t sent_end;
> > +     /**< Maximal value of (ACK + window size) over received
> packet + length
> > +      * over sent packet (maximal sequence could be sent).
> > +      */
> > +     uint32_t reply_end;
> > +     /**< Maximal value of actual window size over sent packets.
> */
> > +     uint32_t max_win;
> > +     /**< Maximal value of ACK over sent packets. */
> > +     uint32_t max_ack;
> 
> Not sure about the word "over" in above definitions.

Changed to "in"

> 
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Configuration and initial state for the connection tracking
> module.
> > + * This structure could be used for both setting and query.
> > + */
> > +struct rte_flow_action_conntrack {
> > +     uint16_t peer_port; /**< The peer port number, can be the
> same port. */
> > +     /**< Direction of this connection when creating a flow, the
> value only
> > +      * affects the subsequent flows creation.
> > +      */
> 
> As for rte_flow_tcp_dir_param, better to move comments before, on
> their own line.
> 
> > +     uint32_t is_original_dir:1;
> > +     /**< Enable / disable the conntrack HW module. When disabled,
> the
> > +      * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> > +      * In this state the HW will act as passthrough.
> > +      * It only affects this conntrack object in the HW without
> any effect
> > +      * to the other objects.
> > +      */
> > +     uint32_t enable:1;
> > +     /**< At least one ack was seen, after the connection was
> established. */
> > +     uint32_t live_connection:1;
> > +     /**< Enable selective ACK on this connection. */
> > +     uint32_t selective_ack:1;
> > +     /**< A challenge ack has passed. */
> > +     uint32_t challenge_ack_passed:1;
> > +     /**< 1: The last packet is seen that comes from the original
> direction.
> > +      * 0: From the reply direction.
> > +      */
> > +     uint32_t last_direction:1;
> > +     /**< No TCP check will be done except the state change. */
> > +     uint32_t liberal_mode:1;
> > +     /**< The current state of the connection. */
> > +     enum rte_flow_conntrack_state state;
> > +     /**< Scaling factor for maximal allowed ACK window. */
> > +     uint8_t max_ack_window;
> > +     /**< Maximal allowed number of retransmission times. */
> > +     uint8_t retransmission_limit;
> > +     /**< TCP parameters of the original direction. */
> > +     struct rte_flow_tcp_dir_param original_dir;
> > +     /**< TCP parameters of the reply direction. */
> > +     struct rte_flow_tcp_dir_param reply_dir;
> > +     /**< The window value of the last packet passed this
> conntrack. */
> > +     uint16_t last_window;
> > +     enum rte_flow_conntrack_tcp_last_index last_index;
> > +     /**< The sequence of the last packet passed this conntrack.
> */
> > +     uint32_t last_seq;
> > +     /**< The acknowledgement of the last packet passed this
> conntrack. */
> > +     uint32_t last_ack;
> > +     /**< The total value ACK + payload length of the last packet
> passed
> > +      * this conntrack.
> > +      */
> > +     uint32_t last_end;
> > +};
> > +
> > +/**
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Wrapper structure for the context update interface.
> > + * Ports cannot support updating, and the only valid solution is
> to
> > + * destroy the old context and create a new one instead.
> > + */
> > +struct rte_flow_modify_conntrack {
> > +     /**< New connection tracking parameters to be updated. */
> > +     struct rte_flow_action_conntrack new_ct;
> > +     uint32_t direction:1; /**< The direction field will be
> updated. */
> > +     /**< All the other fields except direction will be updated.
> */
> > +     uint32_t state:1;
> > +     uint32_t reserved:30; /**< Reserved bits for the future
> usage.
> > +*/ };
> 
> 

Thanks



More information about the dev mailing list