[dpdk-dev] [RFC v2] Flow classification library

Ferruh Yigit ferruh.yigit at intel.com
Tue May 23 18:42:09 CEST 2017


On 5/23/2017 2:30 PM, Adrien Mazarguil wrote:
> On Tue, May 23, 2017 at 01:58:44PM +0100, Ferruh Yigit wrote:
>> On 5/23/2017 1:26 PM, Adrien Mazarguil wrote:
>>> On Mon, May 22, 2017 at 02:53:28PM +0100, Ferruh Yigit wrote:
>>>> On 5/19/2017 5:30 PM, Iremonger, Bernard wrote:
>>>>> Hi Ferruh,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
>>>>>> Sent: Thursday, May 18, 2017 7:12 PM
>>>>>> To: dev at dpdk.org
>>>>>> Cc: Yigit, Ferruh <ferruh.yigit at intel.com>; Mcnamara, John
>>>>>> <john.mcnamara at intel.com>; Tahhan, Maryam
>>>>>> <maryam.tahhan at intel.com>
>>>>>> Subject: [dpdk-dev] [RFC v2] Flow classification library
>>>>>>
>>>>>> DPDK works with packets, but some network administration tools works
>>>>>> based on flow information.
>>>>>>
>>>>>> This library is suggested to provide helper API to convert packet based
>>>>>> information to the flow records.
>>>>>>
>>>>>> Basically the library consist of a single API that gets packets, flow definition
>>>>>> and action as parameter and provides flow stats based on action. Application
>>>>>> should call the API for all received packets.
>>>>>>
>>>>>> Library header file has more comments on how library works and provided
>>>>>> APIs.
>>>>>>
>>>>>> Packets to flow conversion will cause performance drop, that is why
>>>>>> conversion done on demand by an API call provided by this library.
>>>>>>
>>>>>> Initial implementation in mind is to provide support for IPFIX metering
>>>>>> process but library planned to be as generic as possible. And flow information
>>>>>> provided by this library is missing to implement full IPFIX features, but this is
>>>>>> planned to be initial step.
>>>>>>
>>>>>> Flows are defined using rte_flow, also measurements (actions) are provided
>>>>>> by rte_flow. To support more IPFIX measurements, the implementation may
>>>>>> require extending rte_flow addition to implementing this library.
>>>>>
>>>>> Do you know what extensions are needed to the rte_flow code?
>>>>
>>>> The extension may be required on two fields:
>>>> 1- Defining the flow
>>>> 2- Available actions
>>>>
>>>> For defining the flow, an update may not be required, specially at first
>>>> version of the library.
>>>>
>>>> But for action, there may be some updates.
>>>>
>>>> IPFIX RFC defines Metering process as [1], (in [2]). This library should
>>>> provide helper APIs to metering process.
>>>>
>>>> Currently only action can be used in rte_flow is COUNT, more actions can
>>>> be added to help "packet header capturing, timestamping, sampling,
>>>> classifying" tasks of the metering process.
>>>>
>>>> The exact list depends on the what will be implemented in this release.
>>>>
>>>>
>>>> [1]
>>>>    Metering Process
>>>>
>>>>       The Metering Process generates Flow Records.  Inputs to the
>>>>       process are packet headers, characteristics, and Packet Treatment
>>>>       observed at one or more Observation Points.
>>>>
>>>>       The Metering Process consists of a set of functions that includes
>>>>       packet header capturing, timestamping, sampling, classifying, and
>>>>       maintaining Flow Records.
>>>>
>>>>       The maintenance of Flow Records may include creating new records,
>>>>       updating existing ones, computing Flow statistics, deriving
>>>>       further Flow properties, detecting Flow expiration, passing Flow
>>>>       Records to the Exporting Process, and deleting Flow Records.
>>>>
>>>> [2]
>>>> https://tools.ietf.org/html/rfc7011
>>>
>>> Since I did not take this into account in my previous answer [3], I now
>>> understand several of these requirements cannot be met by hardware (at least
>>> in the near future). Therefore I think it makes sense to leave IPFIX and
>>> more generally the maintenance of software data associated with flows to
>>> separate libraries, instead of adding many new rte_flow actions that can
>>> only be implemented in software.
>>>
>>> A hybrid solution as described in [3] is still needed regardless to offload
>>> flow recognition, so that only flows of interest are reprocessed in software
>>> to compute IPFIX and other data.
>>>
>>> You suggested at one point to take flow rules in addition to mbufs as input
>>> to handle that. Well, that's actually a nice approach.
>>>
>>> For this to work, rte_flow_classify would have to use opaque handles like
>>> rte_flow, provided back by the application when attempting to classify
>>> traffic. If the handle is not known (e.g. MARK is unsupported), a separate
>>> API function could take a mbuf as input and spit the related
>>> rte_flow_classify object if any.
>>>
>>> To be clear:
>>>
>>> 1. Create classifier object:
>>>
>>>    classify = rte_flow_classify_create([some rte_flow pattern],
>>>       [classify-specific actions list, associated resources]);
>>>
>>> 2. Create some flow rule with a MARK action to identify it uniquely. This
>>>    step might fail and flow can be NULL, that's not an issue:
>>>
>>>    flow = rte_flow_create([the same pattern], MARK with id 42)
>>>
>>> 3. For each received packet:
>>>
>>>    /*
>>>     * Attempt HW and fall back on SW for flow identification in order to
>>>     * update classifier flow-related data.
>>>     */
>>>    if (flow) {
>>>       if (mbuf->ol_flags & PKT_RX_FDIR_ID && mbuf->hash.fdir.hi == 42)
>>>          tmp_classify = classify;
>>>    } else {
>>>       tmp_classify = rte_flow_classify_lookup([classifier candidates], mbuf);
>>>    }
>>>    if (tmp_classify)
>>>          rte_flow_classify_update(tmp_classify, mbuf);
>>>
>>> 4. At some point, retrieve computed data from the classifier object itself:
>>>
>>>    rte_flow_classify_stats_get(classify, [output buffer(s)])
>>>
>>> On the RX path, the MARK action can be enough to implement the above. When
>>> not supported, it could also be emulated through the "sw_fallback" bit
>>> described in [3] however if the above approach is fine, no need for that.
>>>
>>> It's a bit more complicated to benefit from rte_flow on the TX path since no
>>> MARK data can be returned. There is currently no other solution than doing
>>> it all in software anyway.
>>>
>>> [3] http://dpdk.org/ml/archives/dev/2017-May/066177.html
>>>
>>
>> Using MARK action is good idea indeed, but I believe the software
>> version needs to be implemented anyway, so the MARK action can be next
>> step optimization.
> 
> Does it mean you are you fine with the separation of RFCv2's
> rte_flow_classify_stats_get() into a sort of rte_flow_classify_lookup() and
> rte_flow_classify_update() as described?

No indeed, I was planning to postpone that step and keep continue with
single rte_flow_classify_stats_get()

> 
> There is otherwise no change needed to benefit from MARK optimization
> beside providing the ability to not perform the lookup step when not
> necessary.
> 
>> And another thing is, having 3. and 4. as separate steps means
>> rte_flow_classify to maintain the flow_records in the library for an
>> unknown time.
> 
> I think it should be an application's problem. The library does not
> necessarily allocate anything since the related temporary storage can be
> provided as arguments to rte_flow_classify_create().
> 
> Expiration could be handled by making rte_flow_classify_lookup() return some
> error code when the flow has expired. Application would then be free to call
> rte_flow_classify_destroy() directly or retrieve stats one last time through
> rte_flow_classify_stats_get() before that. 
> 
>> Application needs to explicitly do the call for 3., why not return the
>> output buffers immediately so the application do the required
>> association of data to the ports.
> 
> 3. is typically done in the data path and must be as fast as possible,
> right? The way I see it, in many cases if the flow is already identified
> (classifier object provided) the mbuf does not even need to be parsed for
> associated counters to be incremented.
> 
> Also because rte_flow_classify_stats_get() could perform extra computation
> to convert internal data to an application-friendly format.
> 
>> Mainly, I would like to keep same API logic in RFC v2, but agree to add
>> two new APIs:
>> rte_flow_classify_create()
>> rte_flow_classify_destroy()
>>
>> Create will return and opaque rte_flow object (as done in rte_flow),
>> meanwhile it can create some masks in rte_flow object to help parsing mbuf.
> 
> OK, I suggest naming that object "struct rte_flow_classify" to avoid
> confusion though.

OK, makes sense.

> 
>> And rte_flow_classify_stats_get() will get rte_flow object and stats as
>> parameter. Also instead of getting an array of filters, to simplify the
>> logic, it will get a rte_flow object at a time.
> 
> Perfect.
> 
>> I am planning to send an updated RFC (v3) as above, if there is no major
>> objection, we can continue to discuss on that RFC.
> 
> Except for my above comments, looks like we're converging.
> 



More information about the dev mailing list