[dpdk-dev] [RFC] Generic flow director/filtering/classification API

Adrien Mazarguil adrien.mazarguil at 6wind.com
Wed Jul 13 22:03:27 CEST 2016


On Mon, Jul 11, 2016 at 10:42:36AM +0000, Chandran, Sugesh wrote:
> Hi Adrien,
> 
> Thank you for your response,
> Please see my comments inline.

Hi Sugesh,

Sorry for the delay, please see my answers inline as well.

[...]
> > > > Flow director
> > > > -------------
> > > >
> > > > Flow director (FDIR) is the name of the most capable filter type, which
> > > > covers most features offered by others. As such, it is the most
> > widespread
> > > > in PMDs that support filtering (i.e. all of them besides **e1000**).
> > > >
> > > > It is also the only type that allows an arbitrary 32-bit value provided by
> > > > applications to be attached to a filter and returned with matching packets
> > > > instead of relying on the destination queue to recognize flows.
> > > >
> > > > Unfortunately, even FDIR requires applications to be aware of low-level
> > > > capabilities and limitations (most of which come directly from **ixgbe**
> > and
> > > > **i40e**):
> > > >
> > > > - Bitmasks are set globally per device (port?), not per filter.
> > > [Sugesh] This means the application cannot define filters that match on
> > > arbitrarily different offsets?
> > > If that's the case, I assume the application has to program the bitmask in
> > > advance. Otherwise how does the API framework deduce this bitmask information
> > > from the rules? It's not very clear to me how the application passes down the
> > > bitmask information for multiple filters on the same port.
> > 
> > This is my understanding of how flow director currently works, perhaps
> > someone more familiar with it can answer this question better than I could.
> > 
> > Let me take an example: if a particular device can only handle a single IPv4
> > mask common to all flow rules (say only to match destination addresses),
> > updating that mask to also match the source address affects all defined and
> > future flow rules simultaneously.
> > 
> > That is how FDIR currently works and I think it is wrong, as it penalizes
> > devices that do support individual bit-masks per rule, and is a little
> > awkward from an application point of view.
> > 
> > What I suggest for the new API instead is the ability to specify one
> > bit-mask per rule, and let the PMD deal with HW limitations by automatically
> > configuring global bitmasks from the first added rule, then refusing to add
> > subsequent rules if they specify a conflicting bit-mask. Existing rules
> > remain unaffected that way, and applications do not have to be extra
> > cautious.
> > 
> [Sugesh] The issue with that approach is that the hardware simply discards the
> rule when it is a superset of the first one, even though the hardware is capable
> of handling it. How is it guaranteed that the first rule will set the bitmask for
> all the subsequent rules?

Just to clarify, the API only says that new rules cannot affect existing
ones (which I think makes sense from a user's perspective), so as long as
the PMD does whatever is needed to make all rules work together, there
should not be any problem with this approach.

Even if the PMD has to temporarily remove an existing rule and reconfigure
global masks in order to add subsequent rules, it is fine as long as packets
aren't misdirected in the meantime (they may be dropped if there is no other
choice).
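
To illustrate the contract from the application side, here is a rough sketch;
the prototypes below are stand-ins for the draft creation function and pattern
definitions, not final signatures:

 #include <stdint.h>

 /* Stand-ins for the draft flow creation entry point. */
 struct rte_flow;
 struct rte_flow_pattern;
 struct rte_flow_actions;
 struct rte_flow *rte_flow_create(uint8_t port_id,
                                  const struct rte_flow_pattern *pattern,
                                  const struct rte_flow_actions *actions);

 static void
 example(uint8_t port,
         const struct rte_flow_pattern *ipv4_dst_only, /* mask: dst addr only */
         const struct rte_flow_pattern *ipv4_src_dst,  /* mask: src + dst addr */
         const struct rte_flow_actions *actions)
 {
     /* The first rule implicitly sets the global IPv4 mask on FDIR-like HW. */
     struct rte_flow *a = rte_flow_create(port, ipv4_dst_only, actions);

     /*
      * The second rule requests a different mask. The PMD either makes both
      * rules work (possibly by reprogramming the HW behind the scenes) or
      * fails this call; in both cases rule "a" keeps matching as before.
      */
     struct rte_flow *b = rte_flow_create(port, ipv4_src_dst, actions);

     (void)a;
     (void)b;
 }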

> How about having a CLASSIFIER_TYPE for the classifier? Every port can have a
> set of supported flow types (e.g. L3_TYPE, L4_TYPE, L4_TYPE_8BYTE_FLEX,
> L4_TYPE_16BYTE_FLEX) based on the underlying FDIR support. The application can
> query this and set the type accordingly while initializing the port. This way
> the first rule need not set all the bits that may be needed in future rules.

Again from a user's POV, I think doing so would add unwanted HW-specific
complexity. 

However this concern can be handled through a different approach. Let's say
a user creates a pattern that only specifies an IP header with a given
bit-mask.

In FDIR language this translates to:

- Set global mask for IPv4 accordingly, remaining global masks all zeroed
  (assumed default value).

- Create an IPv4 flow.

From now on, all rules specifying an IPv4 header must have this exact
bitmask (implicitly or explicitly), otherwise they cannot be created,
i.e. the global bitmask for IPv4 becomes immutable.

Now the user creates a TCPv4 rule (as long as it uses the same IPv4 mask). To
handle this, FDIR would:

- Keep global immutable mask for IPv4 unchanged, set global TCP mask
  according to the flow rule.

- Create a TCPv4 flow.

From this point on, like IPv4, subsequent TCP rules must have this exact
bitmask, and so on, as the global TCP bitmask becomes immutable in turn.

Basically, only protocol bit-masks affected by existing flow rules are
immutable, others can be changed later. Global flow masks for protocols
become mutable again when no existing flow rule uses them.
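
Roughly, the PMD-side bookkeeping could look like the sketch below (all names
here are hypothetical PMD internals, not part of the proposed API):

 #include <errno.h>
 #include <stdint.h>
 #include <string.h>

 /* Hypothetical per-protocol global mask state kept by the PMD. */
 struct fdir_global_mask {
     uint8_t mask[16];    /* protocol-specific mask bytes */
     unsigned int refcnt; /* number of flow rules relying on this mask */
 };

 /*
  * Called for each protocol layer of a new flow rule. Returns 0 on success,
  * -EEXIST if the requested mask conflicts with a mask that existing rules
  * already depend on.
  */
 static int
 fdir_mask_acquire(struct fdir_global_mask *gm, const uint8_t *mask, size_t len)
 {
     if (gm->refcnt == 0) {
         /* No rule uses this protocol yet: the mask is still mutable. */
         memcpy(gm->mask, mask, len);
         gm->refcnt = 1;
         return 0;
     }
     /* Mask is immutable: the new rule must use the exact same one. */
     if (memcmp(gm->mask, mask, len) != 0)
         return -EEXIST;
     ++gm->refcnt;
     return 0;
 }

 /*
  * Called when a flow rule is destroyed; the protocol mask becomes mutable
  * again once no rule references it.
  */
 static void
 fdir_mask_release(struct fdir_global_mask *gm)
 {
     if (gm->refcnt > 0)
         --gm->refcnt;
 }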

Does this look fine to you?

[...]
> > > > +--------------------------+
> > > > | Copy to queue 8          |
> > > > +==========+===============+
> > > > | PASSTHRU |               |
> > > > +----------+-----------+---+
> > > > | QUEUE    | ``queue`` | 8 |
> > > > +----------+-----------+---+
> > > >
> > > > ``ID``
> > > > ^^^^^^
> > > >
> > > > Attaches a 32 bit value to packets.
> > > >
> > > > +----------------------------------------------+
> > > > | ID                                           |
> > > > +========+=====================================+
> > > > | ``id`` | 32 bit value to return with packets |
> > > > +--------+-------------------------------------+
> > > >
> > > [Sugesh] I assume the application has to program the flow
> > > with a unique ID and matching packets are stamped with this ID
> > > when reporting to the software. The uniqueness of ID is NOT
> > > guaranteed by the API framework. Correct me if I am wrong here.
> > 
> > You are right, if the way I wrote it is not clear enough, I'm open to
> > suggestions to improve it.
> [Sugesh] I guess it's fine and would like to confirm the same. Perhaps
> it would be nice to mention that the IDs are application defined.

OK, I will make it clearer.

> > > [Sugesh] Is it a limitation to use only a 32-bit ID? Is it possible to have a
> > > 64-bit ID? So that the application can use the control plane flow pointer
> > > itself as an ID. Does it make sense?
> > 
> > I've specified a 32-bit ID for now because this is what FDIR supports and
> > also what existing devices can report today AFAIK (i40e and mlx5).
> > 
> > We could use 64 bits for future-proofing in a separate action like "ID64"
> > when at least one device supports it.
> > 
> > To PMD maintainers: please comment if you know devices that support
> > tagging
> > matching packets with more than 32 bits of user-provided data!
> [Sugesh] I guess the flow director ID is 64 bit, the XL710 datasheet says so.
> And in the 'rte_mbuf' structure the 64-bit FDIR ID is shared with the RSS hash. This can be
> a software driver limitation that exposes only 32 bits, possibly because of cache
> alignment issues. Since the hardware can support 64 bits, I feel it makes sense
> to support 64 bits as well.

I agree we need 64-bit support, but then we also need a solution for devices
that only support 32 bits. Possible methods I can think of:

- A separate "ID64" action (or an "ID32" one, perhaps with a better name).

- A single ID action with an unlimited number of bytes to return with
  packets (would actually be a string). PMDs can then refuse to create flow
  rules requesting an unsupported number of bytes. Devices supporting fewer
  than 32 bits are also included this way without the need for yet another
  action.
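
For the second option, the action configuration could hypothetically look like
this (names are placeholders, nothing below is part of the current draft):

 #include <stdint.h>

 /*
  * Hypothetical configuration for a variable-length ID action: the PMD
  * refuses to create the rule if it cannot return "size" bytes with
  * matching packets (e.g. a 32-bit-only device rejects size > 4).
  */
 struct rte_flow_action_id {
     uint32_t size;        /* number of bytes pointed to by "bytes" */
     const uint8_t *bytes; /* opaque application data to return */
 };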

Thoughts?

[...]
> > > [Sugesh] Another concern is the cost and time of installing these rules
> > > in the hardware. Can we make these APIs time bound (or at least provide an
> > > option to set a time limit to execute these APIs), so that the
> > > application doesn't have to wait so long when installing and deleting flows
> > > with slow hardware/NICs? What do you think? Most of the datapath flow
> > > installations are dynamic and triggered only when there is
> > > ingress traffic. Delays in flow insertion/deletion have unpredictable
> > > consequences.
> > 
> > This API is (currently) aimed at the control path only, and must indeed be
> > assumed to be slow. Creating millions of rules may take quite a long time as
> > it may involve syscalls and other time-consuming synchronization on the PMD
> > side.
> > 
> > So currently there is no plan to have rules added from the data path with
> > time constraints. I think it would be implemented through a different set of
> > functions anyway.
> > 
> > I do not think adding time limits is practical, even specifying in the API
> > that creating a single flow rule must take less than a maximum number of
> > seconds in order to be effective is too much of a constraint (applications
> > that create all flows during init may not care after all).
> > 
> > You should consider in any case that modifying flow rules will always be
> > slower than receiving packets, there is no way around that. Applications
> > have to live with it and provide a software fallback for incoming packets
> > while managing flow rules.
> > 
> > Moreover, think about what happens when you hit the maximum number of
> > flow
> > rules and cannot create any more. Applications need to implement some
> > kind
> > of fallback in their data path.
> > 
> > Offloading flows in HW is also only useful if they live much longer than the
> > time taken to create and delete them. Perhaps applications may choose to
> > do
> > so after detecting long lived flows such as TCP sessions.
> > 
> > You may have one separate control thread dedicated to managing flows and
> > keep your normal control thread unaffected by delays. Several threads can
> > even be dedicated, one per device.
> [Sugesh] I agree that flow insertion cannot be as fast as the packet receiving
> rate. From the application's point of view, the problem arises when hardware flow
> insertion takes longer than software flow insertion. At least the application has to know
> the cost of inserting/deleting a rule in hardware beforehand; otherwise how can the
> application choose the right flow candidates for hardware? My point here is that the
> application expects deterministic behavior from a classifier while inserting and deleting rules.

Understood, however this will be difficult to estimate, particularly if a PMD
must rearrange flow rules to make room for a new one due to a priority level
collision or some other HW-related reason. The time spent cannot be assumed
to be constant; even PMDs cannot know it in advance because it also depends
on the performance of the host CPU.

Such applications may find it easier to measure the elapsed time for the rules
they create, gather statistics and extrapolate from this information for
future rules. I do not think the PMD can help much here.
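
For instance, an application can wrap each insertion with standard
timestamping and build its own statistics. A minimal sketch (the callback
stands for whatever creation function this API ends up providing):

 #include <stdint.h>
 #include <time.h>

 /* Convert a CLOCK_MONOTONIC timestamp to nanoseconds. */
 static uint64_t
 ts_to_ns(const struct timespec *ts)
 {
     return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
 }

 /*
  * Time an arbitrary rule insertion callback and return the elapsed
  * nanoseconds, which the application can feed into its own statistics
  * to estimate the cost of future insertions.
  */
 static uint64_t
 timed_insert(void (*insert)(void *ctx), void *ctx)
 {
     struct timespec t0, t1;

     clock_gettime(CLOCK_MONOTONIC, &t0);
     insert(ctx);
     clock_gettime(CLOCK_MONOTONIC, &t1);
     return ts_to_ns(&t1) - ts_to_ns(&t0);
 }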

> > > [Sugesh] Another query is on the synchronization part. What if the same rules
> > > are handled from different threads? Is the application responsible for handling
> > > the concurrent hardware programming?
> > 
> > Like most (if not all) DPDK APIs, applications are responsible for managing
> > locking issues as described in 4.3 (Behavior). Since this is a control path
> > API and applications usually have a single control thread, locking should
> > not be necessary in most cases.
> > 
> > Regarding my above comment about using several control threads to
> > manage
> > different devices, section 4.3 says:
> > 
> >  "There is no provision for reentrancy/multi-thread safety, although nothing
> >  should prevent different devices from being configured at the same
> >  time. PMDs may protect their control path functions accordingly."
> > 
> > I'd like to emphasize it is not "per port" but "per device", since in a few
> > cases a configurable resource is shared by several ports. It may be
> > difficult for applications to determine which ports are shared by a given
> > device but this falls outside the scope of this API.
> > 
> > Do you think adding the guarantee that it is always safe to configure two
> > different ports simultaneously without locking from the application side is
> > necessary? In which case the PMD would be responsible for locking shared
> > resources.
> [Sugesh] This would be a little bit complicated when some of the ports are not under
> DPDK itself (what if one port is managed by the kernel?), or ports are tied to
> different applications. Locking in the PMD helps when the ports are accessed by
> multiple DPDK applications. However, what if the port itself is not under DPDK?

Well, either we do not care about what happens outside of the DPDK context,
or PMDs must find a way to satisfy everyone. I'm not a fan of locking either
but it would be nice if flow rule configuration could be attempted on
different ports simultaneously without the risk of wrecking anything, so
that applications do not need to care.

Possible cases for a dual port device with global flow rule settings
affecting both ports:

1) ports 1 & 2 are managed by DPDK: this is the easy case, a rule that needs
   to alter a global setting necessary for an existing rule on any port is
   not allowed (EEXIST). PMD must maintain a device context common to both
   ports in order for this to work. This context is either under lock, or
   the first port on which a flow rule is created owns all future flow
   rules.

2) port 1 is managed by DPDK, port 2 by something else, the PMD is aware of
   it and knows that port 2 may modify the global context: no flow rules can
   be created from the DPDK application due to safety issues (EBUSY?).

3) port 1 is managed by DPDK, port 2 by something else, the PMD is aware of
   it and knows that port 2 will not modify flow rules: PMD should not care,
   no lock necessary.

4) port 1 is managed by DPDK, port 2 by something else and the PMD is not
   aware of it: either flow rules cannot be created at all, or we say
   it is the user's responsibility to make sure this does not happen.

Considering that most control operations performed by DPDK affect the device
regardless of other applications, I think 1) is the only case that should be
defined; otherwise 4), defined as the user's responsibility.
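
For case 1), serialization can be as simple as a per-device lock inside the
PMD; a minimal sketch with hypothetical PMD internals:

 #include <pthread.h>

 /* Hypothetical context shared by all ports belonging to one device. */
 struct pmd_dev_ctx {
     pthread_mutex_t lock; /* serializes device-wide flow configuration */
     /* global masks, flow rule lists, etc. */
 };

 /* Flow rule creation entry point for any port of the device. */
 static int
 pmd_flow_create(struct pmd_dev_ctx *ctx)
 {
     int ret = 0;

     pthread_mutex_lock(&ctx->lock);
     /*
      * Under the lock: refuse the rule (e.g. -EEXIST) if it would alter a
      * global setting an existing rule on a sibling port depends on,
      * otherwise program the hardware.
      */
     pthread_mutex_unlock(&ctx->lock);
     return ret;
 }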

> > > > Destruction
> > > > ~~~~~~~~~~~
> > > >
> > > > Flow rules destruction is not automatic, and a queue should not be
> > released
> > > > if any are still attached to it. Applications must take care of performing
> > > > this step before releasing resources.
> > > >
> > > > ::
> > > >
> > > >  int
> > > >  rte_flow_destroy(uint8_t port_id,
> > > >                   struct rte_flow *flow);
> > > >
> > > >
> > > [Sugesh] I would suggest that having a clean-up API would be really useful, as
> > > releasing a queue (is it applicable to releasing a port too?) does not guarantee
> > > automatic flow destruction.
> > 
> > Would something like rte_flow_flush(port_id) do the trick? I wanted to
> > emphasize in this first draft that applications should really keep the flow
> > pointers around in order to manage/destroy them. It is their responsibility,
> > not PMD's.
> [Sugesh] Thanks, I think the flush call will do.

Noted, will add it.
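
For reference, I have something like the following in mind, mirroring the
destruction function quoted above (name and signature to be confirmed in the
next revision):

 /* Destroy all flow rules associated with a port. */
 int
 rte_flow_flush(uint8_t port_id);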

> > > This way the application can initialize the port,
> > > clean up all the existing rules and create new rules on a clean slate.
> > 
> > No resource can be released as long as a flow rule is using it (bad things
> > may happen otherwise), all flow rules must be destroyed first, thus none can
> > possibly remain after initializing a port. It is assumed that PMDs do
> > automatic clean up during init if necessary to ensure this.
> [Sugesh] That will do.

I will make it more explicit as well.

[...]

-- 
Adrien Mazarguil
6WIND

