[dpdk-dev] [PATCH v2] vhost: fix mbuf alloc failure

Maxime Coquelin maxime.coquelin at redhat.com
Tue May 5 10:20:56 CEST 2020


(Please try to avoid HTML for the replies, it makes it hard to follow)

See my replies below:

On 5/5/20 7:48 AM, Tummala, Sivaprasad wrote:
> Hi Flavio,
> 
>  
> 
> Thanks for your comments.
> 
>  
> 
> SNIPPED
> 
>  
> 
>>                      pkts[i] = virtio_dev_pktmbuf_alloc(dev,
> mbuf_pool, buf_len);
> 
>> -                   if (unlikely(pkts[i] == NULL))
> 
>> +                  if (unlikely(pkts[i] == NULL)) {
> 
>> +                              /*
> 
>> +                              * mbuf allocation fails for jumbo
> packets when external
> 
>> +                              * buffer allocation is not allowed and
> linear buffer
> 
>> +                              * is required. Drop this packet.
> 
>> +                              */
> 
>> +#ifdef RTE_LIBRTE_VHOST_DEBUG
> 
>> +                              VHOST_LOG_DATA(ERR,
> 
>> +                                          "Failed to allocate memory
> for mbuf. Packet dropped!\n"); #endif
> 
>  
> 
> That message is useful to spot that missing packets that happens once in
> a while, so we should be able to see it even in production without debug
> enabled. However, we can't let it flood the log.
> 
> Agreed, but VHOST_LOG wrapper does not have rate limit functionality.
> 
>  
> 
>  
> 
> I am not sure if librte eal has this functionality, but if not you could
> limit by using a static bool:
> 
>  
> 
> static bool allocerr_warned = false;
> 
>  
> 
> if (allocerr_warned) {
> 
>     VHOST_LOG_DATA(ERR,
> 
>     "Failed to allocate memory for mbuf. Packet dropped!\n");
> 
>     allocerr_warned = true;
> 
> }
> 
>  
> 
> This is good idea, but having a static variable makes it file scope
> making it to  entire VHOST devices. Hence if the intention is to
> implement device specific
> 
> log rate limit, should not we resort to `dev->allocerr_warn` counter
> mechanism, which resets after n failures `#define LOG_ALLOCFAIL 32`.

I prefer Flavio's proposal, it would have less performance impact than
increasing struct virtio_net size. As soon as we can see the error
popping once in the log message, it gives some clues on what to
investigate. Maybe providing more details on the failure could help,
like printing the pool name and the requested length.

Maxime



More information about the dev mailing list