[dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library

Adrien Mazarguil adrien.mazarguil at 6wind.com
Mon May 22 11:13:23 CEST 2017

Previous message: [dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library
Next message: [dpdk-dev] Occasional instability in RSS Hashes/Queues from X540 NIC

On Fri, May 19, 2017 at 12:11:53PM +0200, Thomas Monjalon wrote:
> 19/05/2017 11:11, Gaëtan Rivet:
> > On Fri, May 19, 2017 at 08:57:01AM +0000, Ananyev, Konstantin wrote:
> > > From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > >> 18/05/2017 13:33, Ferruh Yigit:
> > >> > On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
> > >> > > The other is the expression of flows through a shared syntax. Using
> > >> > > flags to propose presets can be simpler, but will probably not be flexible
> > >> > > enough. rte_flow_items are a first-class citizen in DPDK and are
> > >> > > already a data type that can express flows with flexibility. As
> > >> > > mentioned, they are however missing a few elements to fully cover IPFIX
> > >> > > meters, but nothing that cannot be added I think.
> > >> > >
> > >> > > So I was probably not clear enough, but I was thinking about
> > >> > > supporting rte_flow_items in rte_flow_classify as the possible key
> > >> > > applications would use to configure their measurements. This should not
> > >> > > require rte_flow supports from the PMDs they would be using, only
> > >> > > rte_flow_item parsing from the rte_flow_classify library.
> > >> > >
> > >> > > Otherwise, DPDK will probably end up with two competing flow
> > >> > > representations. Additionally, it may be interesting for applications
> > >> > > to bind these data directly to rte_flow actions once the
> > >> > > classification has been analyzed.
> > >> >
> > >> > Thanks for the clarification, I see now what you and Konstantin are proposing.
> > >> >
> > >> > And yes it makes sense to use rte_flow to define flows in the library, I
> > >> > will update the RFC.
> > >>
> > >> Does it mean that rte_flow.h must be moved from ethdev to this
> > >> new flow library? Or will it depend on ethdev?
> > 
> > Even outside of lib/librte_ether, wouldn't rte_flow stay dependent on
> > rte_ether?
> > 
> > >
> > >Just a thought: probably move rte_flow.h to  lib/librte_net?
> > >Konstantin
> > 
> > If we are to move rte_flow, why not lib/librte_flow?
> 
> There are 3 different things:
> 1/ rte_flow.h for flow description
> 2/ rte_flow API in ethdev for HW offloading
> 3/ SW flow table (this new lib)
> 
> 2 and 3 will depend on 1.
> I think moving rte_flow.h in librte_net is a good idea.

If I had to choose, it would be librte_flow over librte_net because rte_flow
is not necessarily about matching protocol headers (e.g. you can match
meta properties like physical ports or the fact that traffic comes from a
specific VF).

However, I am not sure a separate library is actually necessary. I think the
requirements can be addressed by rte_flow (in its current directory)
directly.

One assumption is that the COUNT action as currently described by rte_flow
satisfies the counter requirements of this proposal; new actions could be
added later to return other flow-related properties. In short, there is no
need to return info from individual packets, only from the flows themselves.

If the above is true, then as pointed out earlier by Gaëtan this proposal
can be summarized as a software implementation for rte_flow_query() and
related actions.

To determine if a packet is part of a given flow in software and update the
associated flow counters, it must be parsed and compared against patterns of
all existing rte_flow rules until one of them matches. For accurate results,
this must be done on all TX/RX traffic.
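
To make the per-packet cost concrete, here is a minimal sketch of that
lookup loop. Everything in it is an assumption for illustration: struct
sw_pkt, struct sw_rule and sw_classify() are hypothetical names, the
"pattern" is reduced to three masked fields instead of a real list of
rte_flow items, and the packet is assumed to be parsed already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed packet fields (not an rte_mbuf). */
struct sw_pkt {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t dst_port;
	uint16_t len;
};

/* Hypothetical rule: spec/mask pairs plus COUNT-like counters. */
struct sw_rule {
	uint32_t src_ip, src_mask;
	uint32_t dst_ip, dst_mask;
	uint16_t dst_port, dst_port_mask;
	uint64_t hits;
	uint64_t bytes;
};

/* A field matches when it equals the spec on every bit set in the mask. */
static int
sw_rule_match(const struct sw_rule *r, const struct sw_pkt *p)
{
	return ((p->src_ip ^ r->src_ip) & r->src_mask) == 0 &&
	       ((p->dst_ip ^ r->dst_ip) & r->dst_mask) == 0 &&
	       ((p->dst_port ^ r->dst_port) & r->dst_port_mask) == 0;
}

/*
 * Walk the rules in order, update the counters of the first one that
 * matches and return its index, or -1 when no rule matches.
 */
static int
sw_classify(struct sw_rule *rules, size_t n, const struct sw_pkt *p)
{
	size_t i;

	assert(rules != NULL || n == 0);
	assert(p != NULL);
	for (i = 0; i < n; i++) {
		if (sw_rule_match(&rules[i], p)) {
			rules[i].hits++;
			rules[i].bytes += p->len;
			return (int)i;
		}
	}
	return -1;
}
```

The linear walk is O(number of rules) per packet on top of the parsing
itself; a real implementation would presumably use a hash or ACL-style
table, but the work still has to happen for every packet.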

RFCv1 does so by automatically hooking burst functions while RFCv2 does so
by expecting the application to call rte_flow_classify_stats_get().

One issue I would like to raise before going on is the CPU cost of doing all
this. Parsing all incoming/outgoing traffic without missing any, looking up
related flows and updating counters in software seems like a performance
killer. Applications will likely request assistance from HW to minimize this
cost as much as possible (e.g. using the rte_flow MARK action if COUNT is
not supported directly).

Assuming a flow is identified by HW, parsing it once again in software with
the proposed API to update the related stats seems counterproductive; a
hybrid HW/SW solution with the SW part automatically used as a fallback when
hardware is not capable enough would be better and easier to use.
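
A rough sketch of what such a hybrid path could look like on the RX side,
under stated assumptions: struct rx_pkt, MARK_NONE, account_pkt() and
sw_parse_stub() are hypothetical names, the mark is assumed to carry a flow
index directly, and the software parser is a stub that only stands in for
the expensive part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MARK_NONE UINT32_MAX /* hypothetical "HW did not tag this packet" */

struct flow_stats {
	uint64_t hits;
	uint64_t bytes;
};

/* Hypothetical RX descriptor: mark is the flow index set by HW, if any. */
struct rx_pkt {
	uint32_t mark;
	uint16_t len;
};

/* Software lookup callback: returns a flow index or MARK_NONE. */
typedef uint32_t (*sw_lookup_t)(const struct rx_pkt *pkt);

/* Stand-in for a full software parse + match; always reports flow 1. */
static uint32_t
sw_parse_stub(const struct rx_pkt *pkt)
{
	(void)pkt;
	return 1;
}

/*
 * Use the HW mark when present and only fall back to the software lookup
 * otherwise. *sw_lookups counts how often the slow path was taken.
 * Returns 1 if the packet was accounted to a flow, 0 otherwise.
 */
static int
account_pkt(struct flow_stats *stats, size_t n_flows,
	    const struct rx_pkt *pkt, sw_lookup_t sw_lookup,
	    unsigned int *sw_lookups)
{
	uint32_t id;

	assert(stats != NULL && pkt != NULL && sw_lookups != NULL);
	id = pkt->mark;
	if (id == MARK_NONE && sw_lookup != NULL) {
		(*sw_lookups)++;
		id = sw_lookup(pkt);
	}
	if (id == MARK_NONE || id >= n_flows)
		return 0;
	stats[id].hits++;
	stats[id].bytes += pkt->len;
	return 1;
}
```

With this shape, packets already identified by HW cost one array update,
and the parser only runs for traffic the hardware could not classify.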

The topic of software fallbacks for rte_flow was brought up some time ago
(can't find the exact thread). The intent was to expose a common set of
features between PMDs so that applications do not have to implement their
own fallbacks. They would request it on a rule basis by setting a kind of
"sw_fallback" bit in flow rule attributes (struct rte_flow_attr). This bit
would be checked by PMDs and/or the rte_flow_* wrapper functions after the
underlying PMD refuses to validate/create a rule.
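
As a sketch of the control flow only: the "sw_fallback" bit does not exist
in struct rte_flow_attr today, and struct flow_attr, flow_create() and the
two stand-in PMD callbacks below are hypothetical names rather than the
rte_flow API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attributes mirroring the idea of a bit in rte_flow_attr. */
struct flow_attr {
	unsigned int ingress:1;
	unsigned int sw_fallback:1; /* proposed bit, not an existing field */
};

enum flow_backend { FLOW_NONE = 0, FLOW_HW, FLOW_SW };

struct flow_handle {
	enum flow_backend backend;
};

/* PMD callback: 0 on success, negative when the rule is not supported. */
typedef int (*hw_create_t)(const struct flow_attr *attr);

/* Stand-in PMDs for illustration: one accepts any rule, one refuses. */
static int
hw_accept(const struct flow_attr *attr)
{
	(void)attr;
	return 0;
}

static int
hw_reject(const struct flow_attr *attr)
{
	(void)attr;
	return -1;
}

/*
 * Wrapper logic: try the PMD first; if it refuses and the application
 * asked for it, create the rule in software instead.
 */
static int
flow_create(const struct flow_attr *attr, hw_create_t hw_create,
	    struct flow_handle *out)
{
	assert(attr != NULL && out != NULL);
	out->backend = FLOW_NONE;
	if (hw_create != NULL && hw_create(attr) == 0) {
		out->backend = FLOW_HW;
		return 0;
	}
	if (attr->sw_fallback) {
		/* A real implementation would set up its SW path here. */
		out->backend = FLOW_SW;
		return 0;
	}
	return -1;
}
```

The application code is the same in both cases; only the backend recorded
in the handle differs, which is what would let it benefit from HW support
when a PMD gains it later.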

Basically I think rte_flow_classify could be entirely implemented in
rte_flow through this "sw_fallback" approach in order for applications to
automatically benefit from HW acceleration when PMDs can handle it. It then
makes sense for the underlying implementation to use RX/TX hooks if
necessary (as in RFCv1). These hooks would be removed by destroying the
related flow rule(s).

This would also open the door to a full SW implementation for rte_flow given
that once the packet parser is there, most actions can be implemented rather
easily (well, that's still a lot of work.)

The bottom line is that I'm not against a separate SW implementation not
tied to a port_id for rte_flow_classify, but I would like to hear the
community's thoughts about the above first.

-- 
Adrien Mazarguil
6WIND
