[dpdk-dev] [PATCH v4 1/4] ethdev: add the API for getting burst mode information

Wang, Haiyue haiyue.wang at intel.com
Mon Nov 4 15:17:34 CET 2019


> -----Original Message-----
> From: Ray Kinsella <mdr at ashroe.eu>
> Sent: Monday, November 4, 2019 21:49
> To: Thomas Monjalon <thomas at monjalon.net>
> Cc: dev at dpdk.org; Yigit, Ferruh <ferruh.yigit at intel.com>; 'Damjan Marion' <dmarion at me.com>; Wang,
> Haiyue <haiyue.wang at intel.com>; Jerin Jacob Kollanukkaran <jerinj at marvell.com>;
> viacheslavo at mellanox.com; stephen at networkplumber.org; arybchenko at solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add the API for getting burst mode information
> 
> 
> 
> On 04/11/2019 13:09, Thomas Monjalon wrote:
> > 04/11/2019 13:07, Ray Kinsella:
> >>
> >> On 04/11/2019 11:30, Thomas Monjalon wrote:
> >>> 04/11/2019 11:03, Ray Kinsella:
> >>>> On 04/11/2019 09:54, Thomas Monjalon wrote:
> >>>>> 04/11/2019 10:49, Ray Kinsella:
> >>>>>> On 03/11/2019 22:41, Thomas Monjalon wrote:
> >>>>>>> 03/11/2019 21:35, Ray Kinsella:
> >>>>>>>> On 29/10/2019 14:27, Ferruh Yigit wrote:
> >>>>>>>>> On 10/26/2019 5:23 PM, Thomas Monjalon wrote:
> >>>>>>>>>> 26/10/2019 11:23, Wang, Haiyue:
> >>>>>>>>>>> From: Thomas Monjalon [mailto:thomas at monjalon.net]
> >>>>>>>>>>>> 26/10/2019 06:40, Wang, Haiyue:
> >>>>>>>>>>>>> From: Thomas Monjalon [mailto:thomas at monjalon.net]
> >>>>>>>>>>>>>> 25/10/2019 18:02, Jerin Jacob:
> >>>>>>>>>>>>>>> On Fri, Oct 25, 2019 at 9:15 PM Thomas Monjalon <thomas at monjalon.net> wrote:
> >>>>>>>>>>>>>>>> 25/10/2019 16:08, Ferruh Yigit:
> >>>>>>>>>>>>>>>>> On 10/25/2019 10:36 AM, Thomas Monjalon wrote:
> >>>>>>>>>>>>>>>>>> 15/10/2019 09:51, Haiyue Wang:
> >>>>>>>>>>>>>>>>>>> Some PMDs have more than one RX/TX burst paths, add the ethdev API
> >>>>>>>>>>>>>>>>>>> that allows an application to retrieve the mode information about
> >>>>>>>>>>>>>>>>>>> Rx/Tx packet burst such as Scalar or Vector, and Vector technology
> >>>>>>>>>>>>>>>>>>> like AVX2.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I missed this patch. I and Andrew, maintainers of ethdev, were not CC'ed.
> >>>>>>>>>>>>>>>>>> Ferruh, I would expect to be Cc'ed and/or get a notification before merging.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> It has been discussed in the mail list and went through multiple discussions,
> >>>>>>>>>>>>>>>>> patch is out since the August, +1 to cc all maintainers I missed that part,
> >>>>>>>>>>>>>>>>> but when the patch is reviewed and there is no objection, why block the merge?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm not saying blocking the merge.
> >>>>>>>>>>>>>>>> My bad is that I missed the patch and I am asking for help with a notification
> >>>>>>>>>>>>>>>> in this case. Same for Andrew I guess.
> >>>>>>>>>>>>>>>> Note: it is merged in master and I am looking to improve this feature.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> +/**
> >>>>>>>>>>>>>>>>>>> + * Ethernet device RX/TX queue packet burst mode information structure.
> >>>>>>>>>>>>>>>>>>> + * Used to retrieve information about packet burst mode setting.
> >>>>>>>>>>>>>>>>>>> + */
> >>>>>>>>>>>>>>>>>>> +struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>>>>>> +  uint64_t options;
> >>>>>>>>>>>>>>>>>>> +};
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Why a struct for an integer?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Again by a request from me, to not need to break the API if we need to add more
> >>>>>>>>>>>>>>>>> thing in the future.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I would replace it with a string. This is the most flexible API.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> IMO, Probably, best of both worlds make a good option here,
> >>>>>>>>>>>>>>> as Haiyue suggested if we have an additional dev_specific[1] in structure.
> >>>>>>>>>>>>>>> and when a pass to the application, let common code make final string as
> >>>>>>>>>>>>>>> (options flags to string + dev_specific)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> options flag can be zero if PMD does not have any generic flags nor
> >>>>>>>>>>>>>>> interested in such a scheme.
> >>>>>>>>>>>>>>> Generic flags will help at least to have some common code.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>> struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>>         uint64_t options;
> >>>>>>>>>>>>>>>         char dev_specific[128]; /* PMD has specific burst mode information */
> >>>>>>>>>>>>>>> };
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I really don't see how we can have generic flags.
> >>>>>>>>>>>>>> The flags which are proposed are just matching
> >>>>>>>>>>>>>> the functions implemented in Intel PMDs.
> >>>>>>>>>>>>>> And this is a complicate solution.
> >>>>>>>>>>>>>> Why not just returning a name for the selected Rx/Tx mode?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Intel PMDs use the *generic* methods like x86 SSE, AVX2, ARM NEON, PPC ALTIVEC,
> >>>>>>>>>>>>> 'dev->data->scattered_rx' etc for the target : "DPDK is the Data Plane Development Kit
> >>>>>>>>>>>>> that consists of libraries to accelerate packet processing workloads running on a wide
> >>>>>>>>>>>>> variety of CPU architectures."
> >>>>>>>>>>>>
> >>>>>>>>>>>> How RTE_ETH_BURST_SCATTERED and RTE_ETH_BURST_BULK_ALLOC are generic?
> >>>>>>>>>>>> They just match some features of the Intel PMDs.
> >>>>>>>>>>>> Why not exposing other optimizations of the Rx/Tx implementations?
> >>>>>>>>>>>> You totally missed the point of generic burst mode description.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> If understand these new experimental APIs from above, then bit options is the best,
> >>>>>>>>>>>>> and we didn't invent new words to describe them, just from the CPU & other *generic*
> >>>>>>>>>>>>> technology. And the application can loop to check which kind of burst is running by
> >>>>>>>>>>>>> just simple bit test.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If PMDs missed these, they can update them in future roadmaps to enhance their PMDs,
> >>>>>>>>>>>>> like MLX5 supports ARM NEON, x86 SSE.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have no word!
> >>>>>>>>>>>> You really think other PMDs should learn from Intel how to "enhance" their PMD?
> >>>>>>>>>>>> You talk about mlx5, did you look at its code? Did you see the burst modes
> >>>>>>>>>>>> depending on which specific hardware path is used (MPRQ, EMPW, inline)?
> >>>>>>>>>>>> Or depending on which offloads are handled?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Again, the instruction set used by the function is a small part
> >>>>>>>>>>>> of the burst mode optimization.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So you did not reply to my question:
> >>>>>>>>>>>> Why not just returning a name for the selected Rx/Tx mode?
> >>>>>>>>>>>
> >>>>>>>>>>> In fact, RFC v1/v2 returns the *name*, but the *name* is hard for
> >>>>>>>>>>> application to do further processing, strcmp, strstr ? Not so nice
> >>>>>>>>>>> for C code, and it is not so standard, So switch it to bit definition.
> >>>>>>>>>>
> >>>>>>>>>> Again, please answer my question: why do you need it?
> >>>>>>>>>> I think it is just informative, that's why a string should be enough.
> >>>>>>>>>> I am clearly against the bitmap because it is way too much restrictive.
> >>>>>>>>>> I disagree that knowing it is using AVX2 or AVX512 is so interesting.
> >>>>>>>>>> What you would like to know is whether it is processing packets 4 by 4,
> >>>>>>>>>> for instance, or to know which offload is supported, or what hardware trick
> >>>>>>>>>> is used in the datapath design.
> >>>>>>>>>> There are so many options in a datapath design that it cannot be
> >>>>>>>>>> represented with a bitmap. And it makes no sense to have some design
> >>>>>>>>>> criterias more important than others.
> >>>>>>>>>> I Cc an Intel architect (Edwin) who could explain you how much
> >>>>>>>>>> a datapath design is more complicate than just using AVX instructions.
> >>>>>>>>>
> >>>>>>>>> As I understand this is to let applications to give informed decision based on
> >>>>>>>>> what vectorization is used in the driver, currently this is not know by the
> >>>>>>>>> application.
> >>>>>>>>>
> >>>>>>>>> And as previously replied, the main target of the API is to define the vector
> >>>>>>>>> path, not all optimizations, so the number is limited.
> >>>>>>>
> >>>>>>> No!
> >>>>>>> The name of this API is "burst mode information",
> >>>>>>> not "vector instructions used".
> >>>>>>> I think the main error is that in Intel PMDs,
> >>>>>>> each Rx/Tx function use different vector instructions.
> >>>>>>> So you generalize that knowing the vectors instructions
> >>>>>>> will give you a good information about the performance.
> >>>>>>> But this is generally wrong!
> >>>>>>> The right level of infos is much more complex.
> >>>>>>
> >>>>>> I don't think anyone was suggesting limiting it to purely describing PMD optimization
> >>>>>> with vector instructions. If there are other commonalities let's describe those also.
> >>>>>>
> >>>>>> Vectorization was thought to be a good starting point - IMHO it is.
> >>>>>>
> >>>>>>>
> >>>>>>>>> There are many optimization in the data path, I agree we may not represent all
> >>>>>>>>> of them, and agreed existing enum having "RTE_ETH_BURST_BULK_ALLOC" and similar
> >>>>>>>>> causing this confusion, perhaps we can remove them.
> >>>>>>>>>
> >>>>>>>>> And if the requirement from the application is just informative, I would agree
> >>>>>>>>> that free text string will be better, right now 'rte_eth_rx/tx_burst_mode_get()'
> >>>>>>>>> is the main API to provide the information and
> >>>>>>>>> 'rte_eth_burst_mode_option_name()' is a helper for application/driver to log
> >>>>>>>>> this information.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Well look we have a general deficit of information about what is happening under
> >>>>>>>> the covers in DPDK. The end user may get wildly different performance characteristics
> >>>>>>>> based on the DPDK configuration. Simple example is using flow director causes the i40e
> >>>>>>>> PMD to switch to using a scalar code path, and performance may as much as half.
> >>>>>>>>
> >>>>>>>> This can cause no end of head-scratching in consuming products, I have done some
> >>>>>>>> of that head scratching myself, it is a usability nightmare.
> >>>>>>>>
> >>>>>>>> FD.io VPP tries to work around this by mining the call stack, to give the user _some_
> >>>>>>>> kind of information about what is happening. These kind of heroics should not be necessary.
> >>>>>>>>
> >>>>>>>> For exactly the same reasons as telemetry, we should be trying to give the users as much
> >>>>>>>> information as possible, in as standard as format as possible. Otherwise DPDK
> >>>>>>>> becomes arcane leaving the user running gdb to understand what is going on, as I
> >>>>>>>> frequently do.
> >>>>>>>
> >>>>>>> I agree we must provide a clue to understand the performance result.
> >>>>>>> As Stephen commented at the very beginning, a log is enough for such debug.
> >>>>>>> But his comment was ignored.
> >>>>>>
> >>>>>> Do we expect applications built on DPDK to have to grep it's log to make such discoveries?
> >>>>>> It's very brittle and arcane way to provide information, if nothing else.
> >>>>>>
> >>>>>>> You wanted an API, fine.
> >>>>>>> I am OK to have an API to request infos which are also in logs.
> >>>>>>
> >>>>>> I would point out that an API to query meta-data is common practice else where.
> >>>>>> GStreamer GstCaps and Linux Sysfs are the closest example I can think of.
> >>>>>>
> >>>>>>>
> >>>>>>>> Finally, again for the same reasons as telemetry, I would say that machine readable is the
> >>>>>>>> ideal here.
> >>>>>>>
> >>>>>>> I disagree here. There is no need to make this info machine readable.
> >>>>>>> We want a clue about the optimizations which are all about creativity.
> >>>>>>> And we cannot make creativity of developers "machine readable".
> >>>>>>
> >>>>>> I am more concerned about the creativity in how developers describe optimizations.
> >>>>>> If there is no standardization of strings (or bits), the API will be challenging to use.
> >>>>>
> >>>>> No it won't be challenging because it will be just a string to print.
> >>>>
> >>>> Well the challenge is getting everyone to use the same set of strings,
> >>>> such that what is returned by the API has common meaning.
> >>>>
> >>>> I am fine with strings.
> >>>> So long as we have a method of encouraging folks to use a standard set were possible.
> >>>
> >>> I don't understand why you insist on standardizing.
> >>> Every drivers are different.
> >>> The routine names will have a sense only in the context of the driver.
> >>
> >> The more diversity in description, the more the user is reaching for documentation.
> >> If the user is lucky enough that description has documentation.
> >
> > I would go even further:
> > The documentation will not explain each Rx/Tx routine.
> > The user can have some guess, but the main information is to see
> > that the routine changed from one run to the other, so he can expect
> > a change in the performance result.
> > And as a bonus, he can ask more explanation to the maintainers
> > by giving the routine name he got from the API.
> >
> >> We can argue about this indefinitely, instead of proposing a standard. :-)
> >> The best way to this is to leave as the API as experimental for some period - as Haiyue suggests.
> >> And then review as the API drops experimental status, with a view to standardizing if possible?
> >
> > Yes we can argue indefinitely.
> > My position is against standardizing this information.
> > (not even talking about the ABI breaks it could cause)
> 
> So here is my proposed work-around.
> 
> We are probably to early to describe any commonalities beyond vectorization.
> Vectorization is seen as too Intel specific.
> 
> Instead let's go with the PMD-specific strings for v20 ABI.
> Let's review the PMD-specific strings that emerge during v20 + Experimental,
> and see if it possible to start to standardize for v21 ABI?
> 
> My position is that some degree of standardization is necessary here, for usability if no other reason.
> I am happy to give it time for that to emerge instead of trying to dictate it.
> 
> So go with PMD-specific strings, and review after 1 year to see if we can improve?
> 

Hi Experts,

Please give feedback on new patch. :-)

http://patchwork.dpdk.org/patch/62368/

struct rte_eth_burst_mode {
	uint64_t flags; /**< The ORed values of RTE_ETH_BURST_FLAG_xxx */

#define RTE_ETH_BURST_MODE_INFO_SIZE 1024 /**< Maximum size for information */
	char info[RTE_ETH_BURST_MODE_INFO_SIZE]; /**< burst mode information */
};

If most of PMDs want to follow some rule to format string, maybe new flag can be
defined like: RTE_ETH_BURST_FLAG_FMT_BY_XXX, then the 'info' has some kind of the
same format. And the old application can still just *display* it. New application
can parse it by the known format as indicated by RTE_ETH_BURST_FLAG_FMT_BY_XXX.

Buy in this design ?


> >
> >>>>> The challenge is trying to fix the design characteristics in an API.
> >>>>
> >>>> I thought Haiyue's patch with a fair degree of input from Ferruh and others is a pretty solid
> start.
> >>>> Let's describe those commonalities that _do_ exist today - it may not be enough, but it's better
> than we had.
> >>>
> >>> The initial requirement is to describe what makes the performance
> >>> change from one routine to the other.
> >>> We don't want to describe the commonalities but the specific differences.
> >>> Please let's focus on the real requirement and build an API which really helps.
> >>> As said several times, a PMD-specific string would be a good API.
> >
> >
> >


More information about the dev mailing list