[dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library

Ferruh Yigit ferruh.yigit at intel.com
Thu May 18 14:12:36 CEST 2017


On 5/17/2017 5:10 PM, Ananyev, Konstantin wrote:
>>> Hi Ferruh,
>>> Please see my comments/questions below.
>>> Thanks
>>> Konstantin
>>>
>>>> +
>>>> +/**
>>>> + * @file
>>>> + *
>>>> + * RTE Flow Classify Library
>>>> + *
>>>> + * This library provides flow record information with some measured properties.
>>>> + *
>>>> + * The application can select a variety of flow types based on various flow keys.
>>>> + *
>>>> + * The library only maintains flow records between rte_flow_classify_stats_get()
>>>> + * calls, up to a maximum limit.
>>>> + *
>>>> + * The provided flow records form a linked list of rte_flow_classify_stat_xxx
>>>> + * structures.
>>>> + *
>>>> + * The library is responsible for allocating and freeing memory for the flow
>>>> + * record table. The previous table is freed by the next
>>>> + * rte_flow_classify_stats_get() call, and all tables are freed by
>>>> + * rte_flow_classify_type_reset() or rte_flow_classify_type_set(x, 0).
>>>> + * Memory for the table is allocated on the fly while creating records.
>>>> + *
>>>> + * A rte_flow_classify_type_set() call with a valid type will register Rx/Tx
>>>> + * callbacks and start filling the flow record table.
>>>> + * With rte_flow_classify_stats_get(), a pointer is handed to the caller while
>>>> + * the library continues collecting records.
>>>> + *
>>>> + *  Usage:
>>>> + *  - application calls rte_flow_classify_type_set() for a device
>>>> + *  - library creates Rx/Tx callbacks for packets and starts filling the flow table
>>>
>>> Is it necessary to use an RX callback here?
>>> Can the library provide an API like collect(port_id, input_mbuf[], pkt_num) instead?
>>> So the user would have a choice either to set up a callback or to call collect() directly.
>>
>> This was also a comment from Morten; I will update the RFC to use a direct API call.
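
For reference, the direct call could look roughly like the below (just a
sketch, name and exact prototype are not settled yet):

  /* hypothetical replacement for the Rx/Tx callbacks */
  int rte_flow_classify_collect(uint8_t port_id,
                                struct rte_mbuf *pkts[],
                                uint16_t nb_pkts);

  /* application datapath, e.g. right after the usual rx burst */
  struct rte_mbuf *pkts[BURST_SIZE];
  uint16_t nb_rx;

  nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);
  rte_flow_classify_collect(port_id, pkts, nb_rx);
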
>>
>>>
>>>> + *    for that type of flow (currently only one flow type supported)
>>>> + *  - application calls rte_flow_classify_stats_get() to get a pointer to the
>>>> + *    linked-list flow table. The library saves this pointer as the previous
>>>> + *    table and continues collecting flow data. On the next
>>>> + *    rte_flow_classify_stats_get(), the library first frees the previous table,
>>>> + *    then passes the current table to the application and keeps collecting data.
>>>
>>> Ok, but that means that you can't use stats_get() for the same type
>>> from 2 different threads without explicit synchronization?
>>
>> Correct.
>> And multiple threads shouldn't be calling this API. It doesn't store
>> previous flow data, so multiple threads calling this would each only get a
>> piece of the information. Do you see any use case where multiple threads
>> would call this API?
> 
> One example would be when you have multiple queues per port,
> managed/monitored by different cores.
> BTW, how are you going to collect the stats in that way?
> 
>>
>>>
>>>> + *  - application calls rte_flow_classify_type_reset(), library unregisters the
>>>> + *    callbacks and frees all flow table data.
>>>> + *
>>>> + */
>>>> +
>>>> +enum rte_flow_classify_type {
>>>> +	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>>>> +	RTE_FLOW_CLASSIFY_TYPE_MAX,
>>>> +};
>>>> +
>>>> +#define RTE_FLOW_CLASSIFY_TYPE_MASK (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
>>>> +
>>>> +/**
>>>> + * Global configuration struct
>>>> + */
>>>> +struct rte_flow_classify_config {
>>>> +	uint32_t type; /* bitwise enum rte_flow_classify_type values */
>>>> +	void *flow_table_prev;
>>>> +	uint32_t flow_table_prev_item_count;
>>>> +	void *flow_table_current;
>>>> +	uint32_t flow_table_current_item_count;
>>>> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
>>>> +
>>>> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
>>>> +
>>>> +/**
>>>> + * Classification stats data struct
>>>> + */
>>>> +struct rte_flow_classify_stat_generic {
>>>> +	struct rte_flow_classify_stat_generic *next;
>>>> +	uint32_t id;
>>>> +	uint64_t timestamp;
>>>> +
>>>> +	struct ether_addr src_mac;
>>>> +	struct ether_addr dst_mac;
>>>> +	uint32_t src_ipv4;
>>>> +	uint32_t dst_ipv4;
>>>> +	uint8_t l3_protocol_id;
>>>> +	uint16_t src_port;
>>>> +	uint16_t dst_port;
>>>> +
>>>> +	uint64_t packet_count;
>>>> +	uint64_t packet_size; /* bytes */
>>>> +};
>>>
>>> Ok, so if I understood things right, for generic type it will always classify all incoming packets by:
>>> <src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port>
>>> all by absolute values, and represent results as a linked list.
>>> Is that correct, or did I misunderstand your intentions here?
>>
>> Correct.
>>
>>> If so, then I see several disadvantages here:
>>> 1) It is really hard to predict what kind of stats is required for particular use cases.
>>> Let's say some people would like to collect stats by <dst_mac, vlan>,
>>> others by <dst_ipv4, subnet_mask>, still others by <l4 dst_port>, and so on.
>>> Having just one hardcoded filter doesn't seem very flexible/usable.
>>> I think you need to find a way to allow the user to define what type of filter they want to apply.
>>
>> The flow type should be provided by applications, according to their needs,
>> and needs to be implemented in this library. The generic one will be the
>> only one implemented in the first version:
>> enum rte_flow_classify_type {
>> 	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>> 	RTE_FLOW_CLASSIFY_TYPE_MAX,
>> };
>>
>>
>> App should set the type first via the API:
>> rte_flow_classify_type_set(uint8_t port_id, uint32_t type);
>>
>>
>> And the stats for this type will be returned; because the returned data can
>> be a different type of struct for each flow type, it is returned as void *:
>> rte_flow_classify_stats_get(uint8_t port_id, void *stats);
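
To make the intended usage concrete, the application side would be roughly
as below (sketch only, variable names illustrative, error handling omitted):

  struct rte_flow_classify_stat_generic *gen_stats = NULL;

  rte_flow_classify_type_set(port_id, RTE_FLOW_CLASSIFY_TYPE_GENERIC);
  ...
  /* the void * is interpreted according to the type set above */
  rte_flow_classify_stats_get(port_id, &gen_stats);
  for (; gen_stats != NULL; gen_stats = gen_stats->next)
          printf("flow %u: %" PRIu64 " packets\n",
                 gen_stats->id, gen_stats->packet_count);
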
> 
> I understand that, but it means that for every different filter the user wants to use,
> someone has to update the library: define a new type and write a new piece of code to handle it.
> That seems inflexible and not extendable from the user's perspective.
> Even HW allows some flexibility with RX filters.
> Why not allow the user to specify the classification filter he/she wants for that particular case?
> In the way both rte_flow and rte_acl work?
> 
>>
>>> I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
>>
>>
>>> 2) Even one 10G port can produce ~14M rte_flow_classify_stat_generic entries in one second
>>> (if all packets have different ipv4 addresses/ports or so).
>>> Accessing/retrieving items over a linked list with 14M entries doesn't sound like a good idea.
>>> I'd say we need some better way to retrieve/present the collected data.
>>
>> This is to keep flows, so I expect the numbers will be lower compared to
>> the packet numbers.
> 
> That was an extreme example to show how badly the selected approach can behave.
> What I am trying to say is: we need a way to collect and retrieve stats in a quick and easy way.
> Let's say right now the user invoked stats_get(port=0, type=generic).
> Now, he is interested in getting stats for a particular dst_ip only.
> The only way to get it is to walk over the whole list stats_get() returned and examine each entry one by one.
> 
> I think it would be much better to have something like:
> 
> struct rte_flow_stats {timestamp; packet_count; packet_bytes; ..};
> 
> <fill rte_flow_item (or something else) to define desired filter>
> 
> filter_id = rte_flow_stats_register(.., &rte_flow_item);
> ....
> struct rte_flow_stats stats;
> rte_flow_stats_get(..., filter_id, &stats);
> 
> That allows the user to define the flows to collect stats for.
> Again in that case you don't need to worry about when/where to destroy the previous
> version of your stats.

Apart from using rte_flow, the above suggests, instead of:
- set key/filter
- poll collect()
- whenever the app wants, call stats_get()

using:
- poll stats_get(key/filter);

Especially after switching from callbacks to polling, this makes sense
because the application will already have to make continuous calls into this
library. Merging set filter/collect/stats_get into the same function saves the
library from storing/deleting stats until the app asks for them, as you
mentioned above.

So, I will update the RFC accordingly.
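
Something along these lines (very rough, naming and arguments to be decided
in the next RFC):

  /* filter + collect + stats merged into a single polled call */
  int rte_flow_classify_stats_get(uint8_t port_id,
                                  struct rte_mbuf *pkts[],
                                  uint16_t nb_pkts,
                                  const struct rte_flow_classify_filter *filter,
                                  struct rte_flow_classify_stats *stats);

  /* the app feeds the packets it already received and gets back the stats
   * for the flows matching 'filter'; the library does not need to keep
   * tables around between calls */
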

> Of course the open question is how to treat packets that would match more than one flow
> (priority/insertion order/something else?), but I suppose we'll need to deal with that question anyway.
>  
> Konstantin
> 
>> It is possible to use fixed-size arrays for this. But I think it is easy
>> to make this switch later; I would like to see the performance effect
>> before doing this switch. Do you think it is OK to start like this and
>> make that decision during implementation?
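
(On the fixed-size array point above: if we do switch later, the table could
be something as simple as the below - hypothetical, sizing to be decided -
but as said, I would prefer to measure the linked list first.)

  struct rte_flow_classify_stat_generic table[RTE_FLOW_CLASSIFY_STAT_MAX];
  uint32_t nb_items; /* number of valid entries in 'table' */
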


