[dpdk-dev] rte_eal_hotplug_remove() generates error message

Hideyuki Yamashita yamashita.hideyuki at po.ntt-tx.co.jp
Thu Dec 27 06:52:17 CET 2018


> On Tue, Dec 18, 2018 at 08:30:16PM +0900, Hideyuki Yamashita wrote:
> > > On Tue, Dec 18, 2018 at 02:52:16PM +0900, Hideyuki Yamashita wrote:
> > > > > On Tue, Dec 18, 2018 at 12:11:33PM +0900, Hideyuki Yamashita wrote:
> > > > > > 
> > > > > > On Mon, Dec 17, 2018 at 11:21:01AM +0000, Burakov, Anatoly wrote:
> > > > > > > > On 17-Dec-18 10:45 AM, Hideyuki Yamashita wrote:
> > > > > > > > > > On 17-Dec-18 10:02 AM, Hideyuki Yamashita wrote:
> > > > > > > > > > > Dear Thomas and all,
> > > > > > > > > > > 
> > > > > > > > > > > I took a look on dpdk code.
> > > > > > > > > > > I firstly write qustions and my analisys
> > > > > > > > > > > on the current dpdk code follows after that.
> > > > > > > > > > > 
> > > > > > > > > > > [1.Questions]
> > > > > > > > > > > I have several questions to ask again.
> > > > > > > > > > > Is my understanding correct about followings
> > > > > > > > > > > 
> > > > > > > > > > > Q1: "EAL:ERROR, Invalid memory" is ignorable
> > > > > > > > > > > 
> > > > > > > > > > > Q2: there is no big difference between calling
> > > > > > > > > > > rte_eal_hotplug_remove(bus->name, dev->name)
> > > > > > > > > > > and
> > > > > > > > > > > rte_dev_remove(dev) because anyway it
> > > > > > > > > > > reaches to rte_pmd_vhost_remove and encounter
> > > > > > > > > > > the same error.
> > > > > > > > > > > 
> > > > > > > > > > > [2.snip from my code]
> > > > > > > > > > > .....
> > > > > > > > > > >           rte_eth_dev_close(port_id);
> > > > > > > > > > >           ret = rte_dev_remove(dev);
> > > > > > > > > > >           if (ret < 0)
> > > > > > > > > > >                   return ret;
> > > > > > > > > > >           rte_eth_dev_release_port(&rte_eth_devices[port_id]);
> > > > > > > > > > > 
> > > > > > > > > > > [3. My analysis on dpdk code]
> > > > > > > > > > > static int
> > > > > > > > > > >     rte_pmd_vhost_remove(struct rte_vdev_device *dev)
> > > > > > > > > > >     {
> > > > > > > > > > >      ...........
> > > > > > > > > > >            eth_dev_close(eth_dev);
> > > > > > > > > > > 
> > > > > > > > > > >             rte_free(vring_states[eth_dev->data->port_id]);
> > > > > > > > > > >             vring_states[eth_dev->data->port_id] = NULL;
> > > > > > > > > > > 
> > > > > > > > > > >             rte_eth_dev_release_port(eth_dev);
> > > > > > > > > > > 
> > > > > > > > > > > As you can see in rte_eth_vhost.c
> > > > > > > > > > > It calls both eth_dev_close and rte_eth_dev_release_port.
> > > > > > > > > > > And inside both functions, it tries to free mac_addrs.
> > > > > > > > > > >           rte_free(dev->data->mac_addrs);       //in rth_dev_close
> > > > > > > > > > >           rte_free(eth_dev->data->mac_addrs);  //in rte_eth_dev_release_port
> > > > > > > > > > > 
> > > > > > > > > > > I understand that is the reason why
> > > > > > > > > > > /* Free the memory space back to heap */
> > > > > > > > > > > void rte_free(void *addr)
> > > > > > > > > > > {
> > > > > > > > > > >           if (addr == NULL) return;
> > > > > > > > > > >           if (malloc_heap_free(malloc_elem_from_data(addr)) < 0)
> > > > > > > > > > >                   RTE_LOG(ERR, EAL, "Error: Invalid memory\n");
> > > > > > > > > > > }
> > > > > > > > > > > encounter the error.
> > > > > > > > > > > As an experiment, I commented out one of them, "ERR, Invalid memory"
> > > > > > > > > > > disappered.
> > > > > > > > > > > 
> > > > > > > > > > > Thanks and BR,
> > > > > > > > > > > Hideyuki Yamashita
> > > > > > > > > > > NTT TechnoCross
> > > > > > > > > > > 
> > > > > > > > > > > > Adding my colleague Yasufumi and Hiroyuki as CC.
> > > > > > > > > > > > 
> > > > > > > > > > > > We are waiting valuable advice from you.
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks in advance,
> > > > > > > > > > > > Hideyuki Yamashita
> > > > > > > > > > > > NTT TechnoCross
> > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Dear Thomas and all,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I hope you all get safely back home after DPDK summit.
> > > > > > > > > > > > > (When I get back Japan, it is chilling. (start of winter))
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On DPDK 18.11.0, we tried to remove vhost device by using rte_eal_hotplug_remove().
> > > > > > > > > > > > > However, following syslog message is printed.
> > > > > > > > > > > > > “EAL: Error: Invalid memory”
> > > > > > > > > > > > > 
> > > > > > > > > > > > > At DPDK summit San Jose, we had chance to ask Thomas how to handle the error message, and he gave us following advice:
> > > > > > > > > > > > > Replace “rte_eal_hotplug_add()” to “rte_dev_probe(devargs)” and
> > > > > > > > > > > > > “rte_eal_hotplug_remove()” to “rte_eth_dev_close() and rte_dev_remove(rte_dev)”
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We tested above changes, but the result is the same (i.e., same error message is printed).
> > > > > > > > > > > > > The debug log message says:
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > [primary]
> > > > > > > > > > > > > VHOST_CONFIG: vhost-user server: socket created, fd: 17
> > > > > > > > > > > > > VHOST_CONFIG: bind to /tmp/sock0
> > > > > > > > > > > > > EAL: Error: Invalid memory
> > > > > > > > > > > > > VHOST_CONFIG: vhost-user server: socket created, fd: 17
> > > > > > > > > > > > > VHOST_CONFIG: bind to /tmp/sock0
> > > > > > > > > > > > > 
> > > > > > > > > > > > > [secondary]
> > > > > > > > > > > > > APP: devargs=eth_vhost0,iface=/tmp/sock0,queues=1
> > > > > > > > > > > > > EAL: request: eal_dev_mp_request
> > > > > > > > > > > > > EAL: msg: eal_dev_mp_request
> > > > > > > > > > > > > EAL: request: bus_vdev_mp
> > > > > > > > > > > > > EAL: msg: bus_vdev_mp
> > > > > > > > > > > > > EAL: msg: bus_vdev_mp
> > > > > > > > > > > > > EAL: reply: eal_dev_mp_request
> > > > > > > > > > > > > EAL: msg: eal_dev_mp_request
> > > > > > > > > > > > > rte_eth_promiscuous_disable: Function not supported
> > > > > > > > > > > > > rte_eth_allmulticast_disable: Function not supported
> > > > > > > > > > > > > APP: To Server: add
> > > > > > > > > > > > > EAL: request: eal_dev_mp_request
> > > > > > > > > > > > > EAL: msg: eal_dev_mp_request
> > > > > > > > > > > > > EAL: reply: eal_dev_mp_request
> > > > > > > > > > > > > EAL: msg: eal_dev_mp_request
> > > > > > > > > > > > > APP: To Server: del
> > > > > > > > > > > > > APP: devargs=eth_vhost0,iface=/tmp/sock0,queues=1
> > > > > > > > > > > > > EAL: request: eal_dev_mp_request
> > > > > > > > > > > > > EAL: msg: eal_dev_mp_request
> > > > > > > > > > > > > EAL: request: bus_vdev_mp
> > > > > > > > > > > > > EAL: msg: bus_vdev_mp
> > > > > > > > > > > > > EAL: msg: bus_vdev_mp
> > > > > > > > > > > > > EAL: reply: eal_dev_mp_request
> > > > > > > > > > > > > EAL: msg: eal_dev_mp_request
> > > > > > > > > > > > > rte_eth_promiscuous_disable: Function not supported
> > > > > > > > > > > > > rte_eth_allmulticast_disable: Function not supported
> > > > > > > > > > > > > APP: To Server: add
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We would like to ask:
> > > > > > > > > > > > > 1)	Is the message “EAL: Error: Invalid memory” ignorable or not? There is no obstacle except this message to re-add the vhost device.
> > > > > > > > > > > > > 2)	Which is the better(best?) way to add/del vhost device “rte_eal_hotplug_add/remove()” or the way Thomas suggested?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thanks in advance and have a nice day.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > BR,
> > > > > > > > > > > > > Hideyuki Yamashita
> > > > > > > > > > > > > NTT TechnoCross
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > Hi Hideyuki,
> > > > > > > > > > 
> > > > > > > > > > The error you're referring to (about invalid memory) means that you're trying to free a pointer that points to invalid memory. Meaning, either the pointer itself is not pointing to an allocated area, or it points to memory that has already been freed.
> > > > > > > > > > 
> > > > > > > > > > If dev->data->mac_addrs and eth_dev->data->mac_addrs point to the same area, this is a bug, because this would lead to double free, and rte_malloc will rightly complain about invalid memory. Now, malloc won't try to do anything with the invalid memory, so the error itself is harmless *as far as malloc is concerned*, but these attempts to free the memory twice should be fixed whereever they happen.
> > > > > > > > > > 
> > > > > > > > > > I'm not well-versed in dev infrastructure, so i wouldn't be able to say which one of the rte_free calls is an extra, unneeded one. This is something e.g. Thomas could help with, or the driver maintainer.
> > > > > > > > > > 
> > > > > > > > > > --Thanks,
> > > > > > > > > > Anatoly
> > > > > > > > > Hello Anatoly,
> > > > > > > > > 
> > > > > > > > > Thanks for your reply for my newbie question.
> > > > > > > > > Now I understand that this error is harmless from DPDK application(SPP)
> > > > > > > > > point of view in practice. Thanks.
> > > > > > > > > But anyway if there is a double free logic, it is a bug and should be
> > > > > > > > > fixed.
> > > > > > > > > The remaining issues are
> > > > > > > > > 1. If it is really a bug (or my mis-understanding)
> > > > > > > > > 2. If is is a bug which function should remove rte_free(mac_addrs)
> > > > > > > > 
> > > > > > > > From description, it looks like a bug. Correct usage of API (rte_dev_close()
> > > > > > > > followed by rte_dev_remove()) should not trigger any errors. You might want
> > > > > > > > to create a BugZilla entry describing the issue.
> > > > > > > 
> > > > > > > I also think it's a bug. The rte_free(dev->data->mac_addrs) in
> > > > > > > vhost PMD's eth_dev_close() should be removed, as the common data
> > > > > > > freeing has been moved to rte_eth_dev_release_port().
> > > > > > > 
> > > > > > > @Hideyuki, thanks for digging into this!
> > > > > > 
> > > > > > Hello Twei,
> > > > > > 
> > > > > > I share the same view with you.
> > > > > > rte_eth_dev_release_port() is used in commonly in many pmds.
> > > > > > rte_dev_close() is vhost pmd local.
> > > > > > 
> > > > > > BTW, what is the difference between using
> > > > > > rte_eal_hotplug_remove and rte_eth_dev_close()+rte_dev_remove?
> > > > > > (I see no big difference right now from application developer point of
> > > > > > view)
> > > > > 
> > > > > IMO, the difference is that the former one doesn't require
> > > > > operating on rte_device directly.
> > > > > 
> > > > > > 
> > > > > > Next question is what should I do next (you know I am very newbie).
> > > > > > File this bug into BUgzzila of DPDK?
> > > > > 
> > > > > The issue is quite clear, I can fix it directly.
> > > > > 
> > > > > And as the fix is also quite clear, if you want to contribute
> > > > > patch, you can also try to send a fix patch [1].
> > > > > 
> > > > > Either way is fine for me. Just let me know. :-)
> > > > > 
> > > > > [1] https://core.dpdk.org/contribute/#send
> > > > Hello,
> > > > 
> > > > Thanks for your guidance.
> > > > 
> > > > Well, I think it is a good chance to be "a contributer of DPDK".
> > > > (even for small one. I am getting excited already)
> > > > So let me give a try.
> > > 
> > > Great! Looking forward to your patch. :-)
> > Hello,
> > 
> > I posted my very first patch to DPDK.
> > After that very basic questions come up
> > to my mind.
> > When will those will be released(if that is merged safely)?
> 
> For the release time, you can check the schedule of the
> next release from below link:
> 
> https://core.dpdk.org/roadmap/#dates
> 
> > That is my natural question as application developer.
> > (Next release 19.02 of course?)
> 
> Yes.
Dear Tiwei and all,

I have question.

Q1.
I CCed my patch to stable at dpdk.org
Is that mean it will be reflected into 18.11 stable release?
The reason why I ask this is I think it is better that bug should 
be reflected into LTS because I and SPP users mainly use DPDK LTS(18.11).

Q2.
If yes, when will those stable release will be released?
(e.g. once a month )

Q3.
I understand my patch is acked and merged into 
dpdk-next-virtio.
What is dpdk-next-virtio?
It is the branch which will be reflected into next release?

Thanks in advance!

BR,
Hideyuki Yamashita
NTT TechnoCross
 
> > 
> > I need to craeate patch also to SPP(my app)
> > and I have to write down something about 
> > fix release timing of this bug in my commit message
> > to notify the SPP users.
> > (e.g. Until the next release of DPDK, you may encounter
> > "EAL:Error Invalid memory" when deleteing vhost pmd.)
> > 
> > Thanks in advance for your kind advice to newbie.
> > 
> > BR,
> > Hideyuki Yamashita
> > NTT TechnoCross
> >  
> > > > 
> > > > Thanks for your kindness.
> > > > 
> > > > BR,
> > > > Hideyuki Yamashita
> > > > NTT TechnoCross
> > > > 
> > > > > 
> > > > > > 
> > > > > > Thanks for your confirming.
> > > > > > 
> > > > > > BR,
> > > > > > Hideyuki Yamashita
> > > > > > NTT TechnoCross
> > > > > >  
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Hideyuki Yamashita
> > > > > > > > > NTT TechnoCross
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > Thanks,
> > > > > > > > Anatoly
> > > > > > 
> > > > > > 
> > > > 
> > 
> > 
> > 




More information about the dev mailing list