Understanding Flow API action RSS
Ivan Malov ivan.malov at oktetlabs.ru
Sun Jan  9 14:03:14 CET 2022

Hi Ori,
On Sun, 9 Jan 2022, Ori Kam wrote:
> Hi Stephen and Ivan
>
>> -----Original Message-----
>> From: Stephen Hemminger <stephen at networkplumber.org>
>> Sent: Tuesday, January 4, 2022 11:56 PM
>> Subject: Re: Understanding Flow API action RSS
>>
>> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
>> Ivan Malov <ivan.malov at oktetlabs.ru> wrote:
>>
>>> Hi Stephen,
>>>
>>> On Tue, 4 Jan 2022, Stephen Hemminger wrote:
>>>
>>>> On Tue, 04 Jan 2022 13:41:55 +0100
>>>> Thomas Monjalon <thomas at monjalon.net> wrote:
>>>>
>>>>> +Cc Ori Kam, rte_flow maintainer
>>>>>
>>>>> 29/12/2021 15:34, Ivan Malov:
>>>>>> Hi all,
>>>>>>
>>>>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
>>>>>> to provide "Queue indices to use". But it is unclear whether the order of
>>>>>> elements is meaningful or not. Does that matter? Can queue indices repeat?
>>>>
>>>> The order probably doesn't matter, it is like the RSS indirection table.
>>>
>>> Sorry, but the RSS indirection table (RETA) assumes some structure. In it,
>>> queue indices can repeat, and the order is meaningful. In DPDK, RETA
>>> may comprise multiple "groups", each one comprising 64 entries.
>>>
>>> This 'queue' array in flow action RSS does not stick with the same
>>> terminology, it does not reuse the definition of RETA "group", etc.
>>> Just "queue indices to use". No definition of order, no structure.
>>>
>>> The API contract is not clear. Neither to users, nor to PMDs.
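For reference, the grouped RETA layout mentioned above can be sketched in C. The struct below is a local stand-in that mirrors the fields of DPDK's `struct rte_eth_rss_reta_entry64` (a 64-bit `mask` plus 64 `reta` entries) so that the example compiles without DPDK; the helpers `reta_set()` and `reta_get()` are hypothetical. Flat entry `i` lives in group `i / 64`, slot `i % 64`, so a 512-entry table spans 8 groups and the position of each entry is meaningful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RETA_GROUP_SIZE 64 /* entries per group, as in DPDK's RETA API */

/* Local stand-in mirroring struct rte_eth_rss_reta_entry64 (assumption). */
struct reta_entry64 {
    uint64_t mask;                  /* one bit per slot in this group */
    uint16_t reta[RETA_GROUP_SIZE]; /* Rx queue index for each slot */
};

/* Hypothetical helper: write flat RETA entry i into grouped storage. */
static void reta_set(struct reta_entry64 *groups, size_t i, uint16_t queue)
{
    struct reta_entry64 *g = &groups[i / RETA_GROUP_SIZE];
    size_t slot = i % RETA_GROUP_SIZE;

    g->reta[slot] = queue;
    g->mask |= UINT64_C(1) << slot; /* mark the slot as populated */
}

/* Hypothetical helper: read flat RETA entry i; the slot must have been set. */
static uint16_t reta_get(const struct reta_entry64 *groups, size_t i)
{
    const struct reta_entry64 *g = &groups[i / RETA_GROUP_SIZE];
    size_t slot = i % RETA_GROUP_SIZE;

    assert(g->mask & (UINT64_C(1) << slot));
    return g->reta[slot];
}

/* A 512-entry table spans 512 / 64 = 8 groups. */
static struct reta_entry64 example_groups[512 / RETA_GROUP_SIZE];
```

For example, flat entry 130 is stored in group 2, slot 2.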
>>>
>> From the API point of view, the RSS queues are simply queue IDs; order doesn't matter.
> Duplicating a queue may affect the spread, depending on the HW/PMD.
> In the common case, each queue should appear only once, and the PMD may duplicate
> entries to get the best performance.
Look. In a DPDK PMD, one has a "global" RSS table. Consider the following
example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you can see, queue
indices may repeat. They may also come in a different order: 1, 1, 0, 0, ... .
The order is of great importance. If you send a packet to a
DPDK-powered server, you can compute its hash value in advance.
Hence, you can predict exactly which RSS table entry this
hash will point at. That, in turn, predicts the target Rx queue.
So the questions to clarify are as follows:
1) Is the 'queue' array ordered? (Does the order of elements matter?)
2) Can its elements repeat? (*allowed* or *not allowed*?)
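A minimal C sketch of the lookup described above, assuming the common indexing scheme `hash % table_size` (many NICs take the low-order hash bits instead, which is equivalent for power-of-two table sizes). The helper `reta_lookup()` and the sample tables are hypothetical, not DPDK API. Both tables hold the same queue set {0, 1, 2, 3}, each queue repeated twice, yet the same hash lands on different queues, so order and repetition both carry meaning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map an RSS hash to an Rx queue through an
 * indirection table, assuming index = hash % table size. */
static uint16_t reta_lookup(const uint16_t *reta, size_t reta_size,
                            uint32_t hash)
{
    assert(reta != NULL && reta_size > 0);
    return reta[hash % reta_size];
}

/* Same queues, each repeated twice, in two different orders. */
static const uint16_t reta_a[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
static const uint16_t reta_b[8] = { 1, 1, 0, 0, 3, 3, 2, 2 };
```

A hash of 0x12345672 selects entry 2 in both tables, which is queue 1 in `reta_a` but queue 0 in `reta_b`.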
>
>>>>
>>>>    rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]
>>>>
>>>> So you could play with multiple queues matching the same hash value, but that
>>>> would be uncommon.
>>>>
>>>>>> An ethdev may have a "global" RSS setting with an indirection table of some
>>>>>> fixed size (say, 512). When it comes to flow rules, does that size matter?
>>>>
>>>> Global RSS is only used if the incoming packet does not match any rte_flow
>>>> rule. If there is an RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
>>>> action, it takes precedence.
>>>
>>> Yes, I know all of that. The question is how the PMD selects the RETA size
>>> for this action. Can it select an arbitrary value? Or should it stick with
>>> the "global" one (e.g. 512)? How does the user know the table size?
>>>
>>> If the user simply wants to spread traffic across the given queues,
>>> the effective table size is a don't care to them, and the existing
>>> API contract is fine. But if the user expects that certain packets
>>> hit some precise queues, they need to know the table size for that.
>>>
> Just like you said, RSS simply spreads the traffic to the given queues.
Yes, to the given queues. The question is whether the 'queue' array
has RETA properties (order matters; elements can repeat) or not.
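To illustrate why the effective table size matters, here is a C sketch that assumes a PMD expands the action's `queue` array into a table of a size it picks itself by repeating the array round-robin. This is only one plausible strategy, not something the rte_flow documentation mandates, and `expand_queues()` and `pick_queue()` are hypothetical helpers. Indexing is again assumed to be `hash % table_size`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PMD behaviour (assumption): fill a table of table_size
 * entries by repeating the RSS action's queue indices in order. */
static void expand_queues(const uint16_t *queue, size_t queue_num,
                          uint16_t *table, size_t table_size)
{
    assert(queue_num > 0);
    for (size_t i = 0; i < table_size; i++)
        table[i] = queue[i % queue_num];
}

/* Queue selected for a hash once the PMD has picked table_size. */
static uint16_t pick_queue(const uint16_t *queue, size_t queue_num,
                           size_t table_size, uint32_t hash)
{
    uint16_t table[512];

    assert(table_size > 0 && table_size <= 512);
    expand_queues(queue, queue_num, table, table_size);
    return table[hash % table_size];
}

/* The 'queue' array a user might pass in the RSS action. */
static const uint16_t example_queues[3] = { 0, 1, 2 };
```

With the same three queues and a hash of 514, a 3-entry table selects queue 1, while a 512-entry table selects queue 2. A sender that does not know which size the PMD chose cannot tell the two apart.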
> If the application wants to send traffic to a specific queue, it should use the queue action.
Yes, but that's not what I mean. Consider the following example. The user
generates packets with random IP addresses on machine A. These packets
hit DPDK on machine B. For a given *packet*, the sender (A) can
compute its RSS hash in software. This hash points to a RETA
entry index. But, in order to predict the exact *queue* index,
the sender has to know the table (its contents and its size).
For a "global" DPDK RSS setting, the table can be easily obtained with
an ethdev callback / API. Very simple. Fixed-size table, and it can
be queried. But how does one obtain similar knowledge for RSS action?
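The sender-side prediction starts with computing the hash in software. Below is a minimal C sketch of the Toeplitz hash (the function DPDK names `RTE_ETH_HASH_FUNCTION_TOEPLITZ`), assuming the IPv4 input is the source address followed by the destination address in network byte order, as in Microsoft's RSS verification suite. The key is the well-known 40-byte sample key from that suite; a real port may use a different key, which can be read back with `rte_eth_dev_rss_hash_conf_get()`. `toeplitz_hash()` is a hypothetical name; DPDK itself ships `rte_softrss()` in `rte_thash.h` for this job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software Toeplitz hash: for every set bit of the input (MSB first),
 * XOR in the 32-bit window of the key that starts at that bit. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    /* Window = key bits [p, p + 31] for input bit p; start at p = 0. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | key[3];

    assert(key_len >= data_len + 4);
    for (size_t i = 0; i < data_len; i++) {
        uint8_t next = key[i + 4]; /* key byte that slides into the window */

        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one key bit to the left. */
            window = (window << 1) | ((next >> bit) & 1u);
        }
    }
    return hash;
}

/* Sample key from Microsoft's RSS verification suite (assumption: the
 * receiving port uses this key). */
static const uint8_t ms_rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* src 66.9.149.187, dst 161.142.100.80 (hypothetical example packet). */
static const uint8_t example_ipv4_tuple[8] = {
    66, 9, 149, 187, 161, 142, 100, 80,
};
```

For the sample tuple this yields 0x323e8fc2. Assuming index = hash % reta_size, the sender then knows the RETA entry; for the ethdev-wide table, the queue behind that entry can be read with `rte_eth_dev_rss_reta_query()`, but the thread's point is that no documented equivalent exists for a flow rule with action RSS.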
>
>>> So, the question is whether the users should or should not build
>>> any expectations of the effective table size and, if they should,
>>> are they supposed to use the "global" table size for that?
>>
>> You are right, this area is completely undocumented. Personally, I would really like
>> it if rte_flow had a reference software implementation and all the HW vendors
>> had to make sure their HW matched the SW reference version. But this is a case
>> where the funding is all on the HW side, and no one has the time or resources
>> to do a complete SW version.
>>
>> A sane implementation would configure RSS indirection across all
>> Rx queues that were available when the device was started, i.e. all queues
>> that did not have deferred start set. Then the application would start/stop
>> queues and use rte_flow to reach them.
>>
>> But it doesn't appear the HW follows that model.
>>
>>
>>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
>>>>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
>>>>
>>>> No, the default is always Toeplitz. This goes back to the original definition
>>>> of RSS, which is in Microsoft NDIS and uses Toeplitz.
>>>
>>> Then why have a dedicated enum named TOEPLITZ? Also, once again, the
>>> documentation should state exactly which algorithm this DEFAULT choice
>>> provides. Otherwise, it is very vague.
>>>
>>>>
>>>> DPDK should have more examples of using rte_flow. I have some samples,
>>>> but they aren't that useful.
>>>>
>>>
>>> I could not agree more.
>
> Feel free to add or suggest whichever examples are missing.
>
>>>
>>> Thanks,
>>> Ivan M.
>
> Best,
> Ori
>