[dpdk-dev] rte_eal_hotplug_remove() generates error message

Burakov, Anatoly anatoly.burakov at intel.com
Mon Dec 17 11:38:51 CET 2018


On 17-Dec-18 10:23 AM, Burakov, Anatoly wrote:
> On 17-Dec-18 10:02 AM, Hideyuki Yamashita wrote:
>> Dear Thomas and all,
>>
>> I took a look on dpdk code.
>> I firstly write qustions and my analisys
>> on the current dpdk code follows after that.
>>
>> [1.Questions]
>> I have several questions to ask again.
>> Is my understanding correct about followings
>>
>> Q1: "EAL:ERROR, Invalid memory" is ignorable
>>
>> Q2: there is no big difference between calling
>> rte_eal_hotplug_remove(bus->name, dev->name)
>> and
>> rte_dev_remove(dev) because anyway it
>> reaches to rte_pmd_vhost_remove and encounter
>> the same error.
>>
>> [2.snip from my code]
>> .....
>>          rte_eth_dev_close(port_id);
>>          ret = rte_dev_remove(dev);
>>          if (ret < 0)
>>                  return ret;
>>          rte_eth_dev_release_port(&rte_eth_devices[port_id]);
>>
>> [3. My analysis on dpdk code]
>> static int
>>    rte_pmd_vhost_remove(struct rte_vdev_device *dev)
>>    {
>>     ...........
>>           eth_dev_close(eth_dev);
>>
>>            rte_free(vring_states[eth_dev->data->port_id]);
>>            vring_states[eth_dev->data->port_id] = NULL;
>>
>>            rte_eth_dev_release_port(eth_dev);
>>
>> As you can see in rte_eth_vhost.c
>> It calls both eth_dev_close and rte_eth_dev_release_port.
>> And inside both functions, it tries to free mac_addrs.
>>          rte_free(dev->data->mac_addrs);       //in rth_dev_close
>>          rte_free(eth_dev->data->mac_addrs);  //in 
>> rte_eth_dev_release_port
>>
>> I understand that is the reason why
>> /* Free the memory space back to heap */
>> void rte_free(void *addr)
>> {
>>          if (addr == NULL) return;
>>          if (malloc_heap_free(malloc_elem_from_data(addr)) < 0)
>>                  RTE_LOG(ERR, EAL, "Error: Invalid memory\n");
>> }
>> encounter the error.
>> As an experiment, I commented out one of them, "ERR, Invalid memory"
>> disappered.
>>
>> Thanks and BR,
>> Hideyuki Yamashita
>> NTT TechnoCross
>>
>>> Adding my colleague Yasufumi and Hiroyuki as CC.
>>>
>>> We are waiting valuable advice from you.
>>>
>>> Thanks in advance,
>>> Hideyuki Yamashita
>>> NTT TechnoCross
>>>
>>>>
>>>> Dear Thomas and all,
>>>>
>>>> I hope you all get safely back home after DPDK summit.
>>>> (When I get back Japan, it is chilling. (start of winter))
>>>>
>>>> On DPDK 18.11.0, we tried to remove vhost device by using 
>>>> rte_eal_hotplug_remove().
>>>> However, following syslog message is printed.
>>>> “EAL: Error: Invalid memory”
>>>>
>>>> At DPDK summit San Jose, we had chance to ask Thomas how to handle 
>>>> the error message, and he gave us following advice:
>>>> Replace “rte_eal_hotplug_add()” to “rte_dev_probe(devargs)” and
>>>> “rte_eal_hotplug_remove()” to “rte_eth_dev_close() and 
>>>> rte_dev_remove(rte_dev)”
>>>>
>>>> We tested above changes, but the result is the same (i.e., same 
>>>> error message is printed).
>>>> The debug log message says:
>>>> ---
>>>> [primary]
>>>> VHOST_CONFIG: vhost-user server: socket created, fd: 17
>>>> VHOST_CONFIG: bind to /tmp/sock0
>>>> EAL: Error: Invalid memory
>>>> VHOST_CONFIG: vhost-user server: socket created, fd: 17
>>>> VHOST_CONFIG: bind to /tmp/sock0
>>>>
>>>> [secondary]
>>>> APP: devargs=eth_vhost0,iface=/tmp/sock0,queues=1
>>>> EAL: request: eal_dev_mp_request
>>>> EAL: msg: eal_dev_mp_request
>>>> EAL: request: bus_vdev_mp
>>>> EAL: msg: bus_vdev_mp
>>>> EAL: msg: bus_vdev_mp
>>>> EAL: reply: eal_dev_mp_request
>>>> EAL: msg: eal_dev_mp_request
>>>> rte_eth_promiscuous_disable: Function not supported
>>>> rte_eth_allmulticast_disable: Function not supported
>>>> APP: To Server: add
>>>> EAL: request: eal_dev_mp_request
>>>> EAL: msg: eal_dev_mp_request
>>>> EAL: reply: eal_dev_mp_request
>>>> EAL: msg: eal_dev_mp_request
>>>> APP: To Server: del
>>>> APP: devargs=eth_vhost0,iface=/tmp/sock0,queues=1
>>>> EAL: request: eal_dev_mp_request
>>>> EAL: msg: eal_dev_mp_request
>>>> EAL: request: bus_vdev_mp
>>>> EAL: msg: bus_vdev_mp
>>>> EAL: msg: bus_vdev_mp
>>>> EAL: reply: eal_dev_mp_request
>>>> EAL: msg: eal_dev_mp_request
>>>> rte_eth_promiscuous_disable: Function not supported
>>>> rte_eth_allmulticast_disable: Function not supported
>>>> APP: To Server: add
>>>> ---
>>>>
>>>> We would like to ask:
>>>> 1)    Is the message “EAL: Error: Invalid memory” ignorable or not? 
>>>> There is no obstacle except this message to re-add the vhost device.
>>>> 2)    Which is the better(best?) way to add/del vhost device 
>>>> “rte_eal_hotplug_add/remove()” or the way Thomas suggested?
>>>>
>>>> Thanks in advance and have a nice day.
>>>>
>>>> BR,
>>>> Hideyuki Yamashita
>>>> NTT TechnoCross
>>>
>>
>>
>>
> 
> Hi Hideyuki,
> 
> The error you're referring to (about invalid memory) means that you're 
> trying to free a pointer that points to invalid memory. Meaning, either 
> the pointer itself is not pointing to an allocated area, or it points to 
> memory that has already been freed.
> 
> If dev->data->mac_addrs and eth_dev->data->mac_addrs point to the same 
> area, this is a bug, because this would lead to double free, and 
> rte_malloc will rightly complain about invalid memory. Now, malloc won't 
> try to do anything with the invalid memory, so the error itself is 
> harmless *as far as malloc is concerned*, but these attempts to free the 
> memory twice should be fixed whereever they happen.
> 
> I'm not well-versed in dev infrastructure, so i wouldn't be able to say 
> which one of the rte_free calls is an extra, unneeded one. This is 
> something e.g. Thomas could help with, or the driver maintainer.
> 

...with that said, to me, this looks suspicious:


 >          rte_eth_dev_close(port_id);
 >          ret = rte_dev_remove(dev);
 >          if (ret < 0)
 >                  return ret;
 >          rte_eth_dev_release_port(&rte_eth_devices[port_id]);

So, you're closing the port, then you remove the device, then you  
release port. From your own analysis, what dev_remove does is the following:

 >    rte_pmd_vhost_remove(struct rte_vdev_device *dev)
 >    {
 >     ...........
 >           eth_dev_close(eth_dev);
 >
 >            rte_free(vring_states[eth_dev->data->port_id]);
 >            vring_states[eth_dev->data->port_id] = NULL;
 >
 >            rte_eth_dev_release_port(eth_dev);

So, you have duplicate call to eth_dev_close() (which is likely not a  
problem), and a duplicate call to rte_eth_dev_release_port(). So, if  
rte_dev_remove() already calls rte_eth_dev_release_port(), why are you  
calling it in your code?

Now, as i said above, i'm not well-versed enough in the dev  
infrastructure to know which of the calls should go where, but i'm  
pretty sure you shouldn't call rte_eth_dev_release_port twice - this is  
likely what's causing your double-free.

-- 
Thanks,
Anatoly


More information about the dev mailing list