[dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API

Jerin Jacob jerin.jacob at caviumnetworks.com
Mon Apr 10 16:00:12 CEST 2017


-----Original Message-----
> Date: Fri, 7 Apr 2017 17:47:40 +0000
> From: "Dumitrescu, Cristian" <cristian.dumitrescu at intel.com>
> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> CC: "dev at dpdk.org" <dev at dpdk.org>, "thomas.monjalon at 6wind.com"
>  <thomas.monjalon at 6wind.com>, "balasubramanian.manoharan at cavium.com"
>  <balasubramanian.manoharan at cavium.com>, "hemant.agrawal at nxp.com"
>  <hemant.agrawal at nxp.com>, "shreyansh.jain at nxp.com"
>  <shreyansh.jain at nxp.com>
> Subject: RE: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> 
> Hi Jerin,
> 
> Thanks for your review!
> 
> > 
> > Thanks Cristian for v3.
> > 
> > From the Cavium HW perspective, v3 is in relatively good shape for us to
> > consume, except for the two points mentioned below.
> > 
> > 1) We strongly believe the application explicitly needs to give a level id
> > while creating the topology (i.e. rte_tm_node_add()). Reasons are:
> > 
> > - On the capability side we are exposing nr_levels etc.
> > - I think _all_ the HW implementations expect to be connected from
> > level-n to level n-1. IMO, it's better to express that in the API.
> > - SW implementations that don't care about the specific level id for the
> >   connection can ignore the level id passed by the application.
> >   Let the definition be "level" aware.
> > 
> 
> The current API proposal creates a new node and connects it to its parent in a single step, so when a new node is added its level is completely known based on its parent level.
> 
> Therefore, specifying the level of the node when the node is added is redundant. My concern is that this requirement can introduce inconsistency into the API, as the user can specify a level ID for the new node that differs from (parent node level ID + 1).

Yes. I think it's better to return an error if the implementation does not
support it in such a case.
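A minimal sketch of that error path (the helper name is hypothetical, not part of the proposed rte_tm API), assuming a HW driver that only accepts nodes attached exactly one level below their parent:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: reject an explicit level id that is inconsistent
 * with the position implied by the parent (parent level + 1). A SW
 * implementation that ignores levels could simply accept any value. */
static int
tm_check_node_level(uint32_t parent_level, uint32_t requested_level)
{
	if (requested_level != parent_level + 1)
		return -EINVAL; /* level does not match the hierarchy */
	return 0;
}
```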

> 
> But based on private conversation it looks to me that you guys have a strong opinion on this, so I am taking the action to identify a (nice) way to implement your requirement and do it.

OK

> 
> > 2) There is a lot of capability detail in the TM definition. I don't have a
> > strong opinion here, as the TM stuff comes in the control path.
> > 
> > So except for point (1), generally we are fine with v3.
> > 
> 
> This is good news!
> 
> > Detailed comments below,
> > 
> > > +
> > 
> > This definition is not used anywhere
> 
> This is the typical value to be passed to the shaper profile adjust parameter. Will take the action to document this.

OK

> 
> > 
> > > +/**
> > > + * Color
> > > + */
> > > +enum rte_tm_color {
> > > +	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
> > 
> > I think, For bitmask, it is better to add ULL.
> > example:
> >         RTE_TM_STATS_N_BYTES_QUEUED = 1ULL << 9,
> 
> The enum fields are uint32_t, not uint64_t; therefore, we cannot use uint64_t constants. This is why all enums in DPDK are using uint32_t values. I am not sure, there might be a way to use uint64_t constants by relying on compiler extensions, but I wanted to keep ourselves out of trouble :).

OK

> 
> 
> > > +};
> > 
> > I think the above definitions are used as "uint64_t stats_mask" in
> > the remaining sections. How about changing to "enum rte_tm_stats_type
> > stats_mask" instead of "uint64_t stats_mask"?
> > 
> 
> The mask is a collection of flags (so multiple enum flags are typically enabled in this mask, e.g. stats_mask = RTE_TM_STATS_N_PKT | RTE_TM_STATS_N_BYTES | ...), therefore it cannot be of the same type as the enum.
> 
> I am taking the action to document that stats_mask is built out by using this specific enum flags.

OK
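As a sketch of the point above (flag names and values are made up for illustration, not the actual enum rte_tm_stats_type): each enumerator is a single bit, and the uint64_t mask is the OR of several of them, which is why the mask cannot have the enum type itself.

```c
#include <stdint.h>

/* Hypothetical stand-ins for enum rte_tm_stats_type flags; each one is
 * a distinct bit so that several can be combined in one mask. */
enum tm_stats_type {
	TM_STATS_N_PKTS         = 1 << 0,
	TM_STATS_N_BYTES        = 1 << 1,
	TM_STATS_N_PKTS_DROPPED = 1 << 2,
};

/* The mask is a set of flags, hence uint64_t rather than the enum type. */
static uint64_t
tm_build_stats_mask(void)
{
	return (uint64_t)TM_STATS_N_PKTS | TM_STATS_N_BYTES;
}
```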

> 
> > > +
> > > +/**
> > > + * Traffic manager dynamic updates
> > > +struct rte_tm_level_capabilities {
> > 
> > IMO, a level can be either leaf or non-leaf. If so, the following struct
> > makes more sense to me:
> > 
> >         int is_leaf;
> >         uint32_t n_nodes_max;
> >         union {
> >                 struct rte_tm_node_capabilities nonleaf;
> >                 struct rte_tm_node_capabilities leaf;
> >         };
> > 
> 
> This was the way it was done in previous versions, but Hemant rightly observed that leaf nodes typically have different capabilities than non-leaf nodes, hence the current solution.

OK. But it still has to be a union, right? Because a level can be either leaf
or non-leaf (not both).
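The shape being suggested, sketched with made-up field names (the real per-node capability structs in the patch differ):

```c
#include <stdint.h>

/* Hypothetical capability records: leaf and non-leaf nodes carry
 * different fields, as discussed above. */
struct tm_nonleaf_caps { uint32_t n_children_max; };
struct tm_leaf_caps    { uint32_t n_queues_max; };

/* A level is either leaf or non-leaf, never both, so the two records
 * can share storage in a union selected by is_leaf. */
struct tm_level_caps {
	uint32_t n_nodes_max;
	int is_leaf;
	union {
		struct tm_nonleaf_caps nonleaf;
		struct tm_leaf_caps leaf;
	};
};

/* Both members overlay the same storage; writing one and reading the
 * other through the union yields the same 32-bit value. */
static int
tm_level_caps_union_shares_storage(void)
{
	struct tm_level_caps c = {0};
	c.nonleaf.n_children_max = 7;
	return c.leaf.n_queues_max == 7;
}
```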


> 
> > > +	 * (see enum rte_tm_dynamic_update_type).
> > > +	 */
> > > +	uint64_t dynamic_update_mask;
> > 
> > IMO, it is better to change it to
> > enum rte_tm_dynamic_update_type dynamic_update_mask
> > instead of
> > uint64_t dynamic_update_mask;
> > 
> 
> Same answer as for the other enum above (mask is not equal to a single enum value, but a set of enum flags; basically each bit of the mask corresponds to a different enum value).

OK

I think we can add the "@see enum rte_tm_dynamic_update_type" scheme in
the header file wherever there is a reference to another element in the
header file.

reference: http://dpdk.org/browse/dpdk/tree/lib/librte_eventdev/rte_eventdev.h#n257


> > > +
> > > +	union {
> > > +		/**< Parameters only valid for non-leaf nodes. */
> > > +		struct {
> > > +			/**< For each priority, indicates whether the children
> > > +			 * nodes sharing the same priority are to be scheduled
> > > +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> > > +			 * is to be used for all priorities. When non-NULL, it
> > > +			 * points to a pre-allocated array of *n_priority*
> > > +			 * elements, with a non-zero value element indicating
> > > +			 * WFQ and a zero value element for WRR.
> > > +			 */
> > > +			int *scheduling_mode_per_priority;
> > > +
> > > +			/**< Number of priorities. */
> > > +			uint32_t n_priorities;
> > > +		} nonleaf;
> > 
> > 
> > Since we are adding all node "connecting" parameters in rte_tm_node_add(),
> > how about adding WFQ vs WRR as a boolean value in rte_tm_node_add() to
> > maintain consistency?
> 
> This is not about the parent node managing this new node as one of its children; it is about how this new node will manage its own future children, hence the reason to put it here.

So, is it the same as rte_tm_node_scheduling_mode_update()? If so, then we
may not need this here. If not, what is the use case we are trying to achieve
here?
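For reference, the per-priority semantics quoted above can be sketched like this (the helper name is hypothetical; only the NULL / non-zero / zero convention comes from the quoted doc comment):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the quoted doc comment: a NULL array
 * means WFQ for every priority; otherwise a non-zero entry selects WFQ
 * and a zero entry selects WRR for that priority. */
static int
tm_priority_uses_wfq(const int *modes, uint32_t n_priorities, uint32_t prio)
{
	if (prio >= n_priorities)
		return -1; /* out of range */
	if (modes == NULL)
		return 1; /* WFQ everywhere by default */
	return modes[prio] != 0;
}
```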

> 
> > 
> > How about a new error type in "enum rte_tm_error_type" to specify the
> > connection error when the requested mode (WFQ or WRR) is not supported?
> > 
> 
> I think we already have it, it is called: RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE.

I think explicit error codes may help, something like
RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_WFQ
and
RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_WRR

> 
> > > +
> > > +/**
> > > + * Traffic manager get number of leaf nodes
> > > + *
> > > + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> > > + * Therefore, the set of leaf nodes is predefined, their number is always
> > > + * equal to N (where N is the number of TX queues configured for the
> > > + * current port) and their IDs are 0 .. (N-1).
> > > + *
> > > + * @param port_id
> > > + *   The port identifier of the Ethernet device.
> > > + * @param n_leaf_nodes
> > > + *   Number of leaf nodes for the current port.
> > > + * @param error
> > > + *   Error details. Filled in only on error, when not NULL.
> > > + * @return
> > > + *   0 on success, non-zero error code otherwise.
> > > + */
> > > +int
> > > +rte_tm_get_leaf_nodes(uint8_t port_id,
> > > +	uint32_t *n_leaf_nodes,
> > > +	struct rte_tm_error *error);
> > 
> > In order to keep consistency with the rest of the API, IMO, the API
> > name can be changed to rte_tm_leaf_nodes_get().
> > 
> 
> IMO this is not a node API, it is a port API, hence the attempt to avoid rte_tm_node_XYZ().
> 
> Maybe a longer but less confusing name is: rte_tm_get_number_of_leaf_nodes()?

No strong opinion here. A shorter version could be rte_tm_leaf_nodes_count().
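Whatever the final name, the doc quoted above implies that leaf-ness is just a range check on the node id (the helper below is illustrative only, not part of the proposal):

```c
#include <stdint.h>

/* Per the quoted doc, leaf node ids are exactly 0..N-1 where N is the
 * number of TX queues configured on the port, so testing whether a node
 * id refers to a leaf is a simple bound check. */
static int
tm_node_id_is_leaf(uint32_t node_id, uint32_t n_leaf_nodes)
{
	return node_id < n_leaf_nodes;
}
```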


