[dpdk-dev] [PATCH v2] ethdev: fix device info getting

Thomas Monjalon thomas at monjalon.net
Tue Oct 23 09:28:45 CEST 2018


23/10/2018 03:25, Lu, Wenzhuo:
> Hi Thomas, Ferruh, Andrew,
> 
> From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > 22/10/2018 14:01, Ferruh Yigit:
> > > On 8/23/2018 9:58 AM, Andrew Rybchenko wrote:
> > > > On 22.08.2018 19:55, Ferruh Yigit wrote:
> > > >> On 8/14/2018 1:57 AM, Lu, Wenzhuo wrote:
> > > >>> Hi Andrew,
> > > >>> From: Andrew Rybchenko [mailto:arybchenko at solarflare.com]
> > > >>>> On 13.08.2018 05:50, Lu, Wenzhuo wrote:
> > > >>>>> From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > > >>>>>> 16/07/2018 03:58, Lu, Wenzhuo:
> > > >>>>>>> Hi Andrew,
> > > >>>>>>> From: Lu, Wenzhuo
> > > >>>>>>>> From: Andrew Rybchenko [mailto:arybchenko at solarflare.com]
> > > >>>>>>>>>
> > > >>>>>>>>> Hi, Wenzhuo,
> > > >>>>>>>>>
> > > >>>>>>>>> I'm sorry, but I have more even harder questions than the
> > > >>>>>>>>> previous
> > > >>>> one.
> > > >>>>>>>>> This questions are rather generic and mainly to ethdev
> > maintainers.
> > > >>>>>>>>>
> > > >>>>>>>>> On 13.07.2018 05:42, Wenzhuo Lu wrote:
> > > >>>>>>>>>> The device information cannot be gotten correctly before
> > > >>>>>>>>>> the configuration is set. Because on some NICs the
> > > >>>>>>>>>> information has dependence on the configuration.
> > > >>>>>>>>> Thinking about it I have the following question. Is it valid
> > > >>>>>>>>> behaviour of the dev_info if it changes after configuration?
> > > >>>>>>>>> I always thought that the primary goal of the dev_info is to
> > > >>>>>>>>> provide information to app about device capabilities to
> > > >>>>>>>>> allow app configure device and queues correctly. Now we see
> > > >>>>>>>>> the case when dev_info changes on configure. May be it is
> > > >>>>>>>>> acceptable, but it is really suspicious. If we accept it, it should
> > be documented.
> > > >>>>>>>>> May be dev_info should be split into parts: part which is
> > > >>>>>>>>> persistent and part which may depend on device configuration.
> > > >>>>>>>> As I remember, the similar discussion has happened :) I've
> > > >>>>>>>> raised the similar suggestion like this. But we don’t make it
> > happen.
> > > >>>>>>>> The reason is, you see, this is the rte layer's behavior. So
> > > >>>>>>>> the user doesn't have to know it. From APP's PoV, it inputs
> > > >>>>>>>> the configuration, it calls this API "rte_eth_dev_configure".
> > > >>>>>>>> It doesn't know  the configuration is copied before getting the
> > info or not.
> > > >>>>>>>> So, to my opinion, we can still keep the behavior. We only
> > > >>>>>>>> need to split it into parts when we do see the case that cannot
> > make it.
> > > >>>>>>> Maybe I talked too much about the patch. Think about it again.
> > > >>>>>>> Your comments is about how to use the APIs,
> > > >>>>>>> rte_eth_dev_info_get,
> > > >>>>>> rte_eth_dev_configure. To my opinion, rte_eth_dev_info_get is
> > > >>>>>> just to get the info. It can be called anywhere, before
> > > >>>>>> configuration or after. It's reasonable the info changes with the
> > configuration changing.
> > > >>>>>>> But we do have something missing, like,
> > > >>>>>>> rte_eth_dev_capability_get which
> > > >>>>>> should be stable. APP can use this API to get the necessary
> > > >>>>>> info before configuration.
> > > >>>>>>> A question, maybe a little divergent thinking, that APP should
> > > >>>>>>> have some
> > > >>>>>> intelligence to handle the capability automatically. So getting
> > > >>>>>> the capability is not so good and effective, looks like we
> > > >>>>>> still need the human
> > > >>>> involvement.
> > > >>>>>> Maybe that the reason currently we suppose APP know the
> > > >>>>>> capability from the paper copies, examples...
> > > >>>>>>
> > > >>>>>> I am not sure to understand all the sentences.
> > > >>>>>> But I agree that we should take a decision about the stability
> > > >>>>>> of these
> > > >>>> infos.
> > > >>>>>> Either infos cannot change after probing, or we must document
> > > >>>>>> that the app must request infos regularly (when?).
> > > >>>>> Sorry, I missed this mail.
> > > >>>>>
> > > >>>>> I have the concern that different NICs have different behavior.
> > > >>>>> One info
> > > >>>> can be stable on a NIC but dynamic on another. Considering this,
> > > >>>> we may better not splitting the rte_eth_dev_info_get to 2 APIs.
> > > >>>> And comparing with handling this in rte layer, maybe we can let every
> > NIC has its own decision.
> > > >>>>> I have an idea. Maybe we can add a parameter for potential
> > > >>>>> dynamic fields. Like, Changing uint16_t nb_rx_queues; to struct
> > > >>>>> nb_rx_queues { uint16_t value; bool stable; }
> > > >>>> May be it is just very bad example, but as I understand
> > > >>>> nb_rx_queues is mainly required to configure the device properly.
> > > >>>> Or should app configure, get new value, reconfigure again, get
> > > >>>> new value and so on and stop when previous is equal to the new one.
> > Yes, I dramatise and it sounds really bad.
> > > >>>> In any case it would over-complicate interface and no single app
> > > >>>> will do it correctly.
> > > >>> I  think you're talking about max_rx_queues. APP can get that info
> > before configuration. Then configure rx queue number which is not larger
> > than it. That's enough.
> > > >>> nb_rx_queues should be the number which is configured by APP and
> > how many queues are actually used. To my opinion, it's mainly used by the
> > GUI to show the value to human being.
> > > >>>
> > > >>> BTW, max_rx_queues could be an good example that shows that
> > some parameters are stable on some NICs but not on other NICs.
> > > >>> Take Intel NICs for example (I don’t familiar with others.), normally
> > max_rx_queues is stable on PF. But on VF, as the max number is decided by
> > PF, it could be dynamic. When VF starts, it can get an default value from PF.
> > If it not enough, it can request a larger one from PF. If the number works, VF
> > can get a new number.
> > > >> "struct rte_eth_dev_info" is a little overloaded, it has:
> > > >> - static info, like *device
> > > >> - device limitations, max_*, *_lim
> > > >> - device capabilities, *_capa
> > > >> - suggested configurations, default_*conf
> > > >> - device configuration, nb_[r/t]x_queues
> > > >> - other, switch_info
> > > >>
> > > >> There is a concern that some values are dynamic, but this is not
> > > >> new, for example nb_rx/tx_queues can be changed by
> > > >> rte_eth_dev_rx/tx_queue_config() API and rte_eth_dev_info() output
> > will be changed.
> > > >
> > > > The example looks different to me. It is explicit changes directly
> > > > requested by the application. So, it is not a surprise that it changes.
> > > >
> > > >> For this patch suggested configuration changes based on some other
> > > >> config values looks ok as concept.
> > > >> So I think we can say after every configuration related API dev
> > > >> info can be changed.
> > > >
> > > > I think that saying that any configuration changes may result in any
> > > > changes in dev_info is hardly helpful. I'd suggest to be more specific.
> > > > Yes, it is harder and will have bugs, but at least it is helpful.
> > >
> > > Hi Andrew, Wenzhuo,
> > >
> > > Back to this patch, which fixes an actual defect,
> > >
> > > What do you think about:
> > > 1- Keep existing patch but extend it as, save the original "dev->data"
> > > and revert it back to this original data on all error path.
> > > 2- Update rte_eth_dev_info() API document and say default
> > > configuration can be changed based on other config fields. So this
> > > reduces the scope of things can change in dev_info.
> > 
> > I think we are doing too much juggling with data in ethdev layer.
> > All these things should be the responsibility of the PMD.
> > My radical proposal would be to remove rte_eth_dev_info and integrate all
> > the data into rte_eth_dev_data.
> > 
> Sorry for missing this discussion. It's a good discussion about how to optimize the rte_eth.
> But I have to say that above discussion can reach a huge reconsitution of the rte_eth and impact every PMD. Is that fair? 
> This patch is only try to revert a bad commit as we already find bug. As I remember, at the beginning, Andrew said the discussion may not about the patch but generic. So could we just tell if this patch itself OK at first?  Thanks.

I have no strong opinion about your patch.
Sorry, I am more interested by a possible future big clean-up.
About a temporary fix, I am OK if Ferruh and Andrew are OK.




More information about the dev mailing list