[dpdk-dev] Rx can't receive any more packets after receiving 1.5 billion packets.

vuonglv at viettel.com.vn
Tue Jul 18 03:36:54 CEST 2017



On 07/17/2017 05:31 PM, cristian.dumitrescu at intel.com wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
>> vuonglv at viettel.com.vn
>> Sent: Monday, July 17, 2017 3:04 AM
>> Cc: users at dpdk.org; dev at dpdk.org
>> Subject: [dpdk-dev] Rx can't receive any more packets after receiving
>> 1.5 billion packets.
>>
>> Hi DPDK team,
>> Sorry for sending this email to both the users and dev lists, but I have
>> a big problem: the Rx core in my application cannot receive any more
>> packets after a stress test (~1 day, during which the Rx core received
>> ~1.5 billion packets). The Rx core is still alive but receives no packets
>> and generates no log. Below is my system configuration:
>> - OS: CentOS 7
>> - Kernel: 3.10.0-514.16.1.el7.x86_64
>> - Huge pages: 32 GB (16384 x 2 MB pages)
>> - NIC card: Intel 82599
>> - DPDK version: 16.11
>> - Architecture: Rx (lcore 1) receives packets and enqueues them to a ring
>> ----- Worker (lcore 2) dequeues the packets from the ring and frees them
>> (using the rte_pktmbuf_free() function); see the sketch below.
>> - Mempool create:
>>       rte_pktmbuf_pool_create(
>>           "rx_pool",                 /* name */
>>           8192,                      /* number of elements in the mbuf pool */
>>           256,                       /* size of per-core object cache */
>>           0,                         /* size of application private area
>>                                         between rte_mbuf struct and data
>>                                         buffer */
>>           RTE_MBUF_DEFAULT_BUF_SIZE, /* size of data buffer in each mbuf
>>                                         (2048 + 128) */
>>           0                          /* socket id */
>>       );
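>> For reference, simplified sketches of the two loops (illustrative only;
>> BURST, "port", and "ring" are placeholder names, and on DPDK 16.11
>> rte_ring_enqueue_burst() returns the number of mbufs actually enqueued):
>>
>>     #include <rte_ethdev.h>
>>     #include <rte_mbuf.h>
>>     #include <rte_ring.h>
>>
>>     #define BURST 32
>>
>>     /* Rx core (lcore 1): receive a burst and hand it to the worker. */
>>     static void
>>     rx_once(uint8_t port, struct rte_ring *ring)
>>     {
>>         struct rte_mbuf *bufs[BURST];
>>         uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST);
>>         unsigned int n = rte_ring_enqueue_burst(ring, (void **)bufs, nb_rx);
>>         while (n < nb_rx)                /* any mbufs the ring refuses   */
>>             rte_pktmbuf_free(bufs[n++]); /* must be freed, or they leak  */
>>     }
>>
>>     /* Worker core (lcore 2): drain the ring and free each packet. */
>>     static void
>>     worker_once(struct rte_ring *ring)
>>     {
>>         struct rte_mbuf *bufs[BURST];
>>         unsigned int i, n = rte_ring_dequeue_burst(ring, (void **)bufs, BURST);
>>         for (i = 0; i < n; i++)
>>             rte_pktmbuf_free(bufs[i]);
>>     }
>>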
>> If I change the number of elements in the mbuf pool from 8192 to 512, Rx
>> hits the same problem after a shorter time (~30 s).
>>
>> Please tell me if you need more information. I look forward to hearing
>> from you.
>>
>>
>> Many thanks,
>> Vuong Le
> Hi Vuong,
>
> This is likely a buffer leak. You probably have a code path where a buffer is not freed, so that buffer gets "lost": the application can no longer use it because it is never returned to the pool. The pool of free buffers therefore shrinks over time until it eventually becomes empty, at which point no more packets can be received.
>
> You might want to periodically monitor the number of free buffers in your pool. If this is the root cause, you should see that number steadily decreasing until it hits flat zero; otherwise, you should see it oscillating around an equilibrium point.
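>
> A minimal sketch of such a check (assuming DPDK 16.07+, where
> rte_mempool_avail_count() and rte_mempool_in_use_count() are available;
> the helper name is illustrative):
>
>     #include <stdio.h>
>     #include <rte_mempool.h>
>
>     /* Call periodically (e.g. once per second) from a management core.
>      * A free count that only ever decreases indicates a leak; one that
>      * oscillates around a stable value indicates a healthy pool. */
>     static void
>     log_pool_usage(const struct rte_mempool *mp)
>     {
>         printf("pool %s: %u free, %u in use\n", mp->name,
>                rte_mempool_avail_count(mp),
>                rte_mempool_in_use_count(mp));
>     }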
>
> Since it takes a relatively large number of packets to reach this issue, the leaking code path is probably executed infrequently: it might be a control plane packet that is not freed, or an ARP request/reply packet, etc.
>
> Regards,
> Cristian
Hi Cristian,
Thanks for your response; I am trying your idea. But let me show you 
another case I tested before. I changed the architecture of my 
application as below:
- Architecture: Rx (lcore 1) receives packets and enqueues them to the 
ring ----- after that, Rx (lcore 1) dequeues the packets from the ring 
and frees them immediately.
(The old architecture is as above.)
With this new architecture, Rx still receives packets after 2 days and 
everything looks good. Unfortunately, my application must run with the 
old architecture; a sketch of the single-core test loop follows.
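
Roughly what the single-core test does per iteration (again an 
illustrative sketch, with the same assumed names as above):

    /* lcore 1 both produces and consumes: mbufs never sit in the ring
     * for long, and everything received is freed on the same core. */
    static void
    rx_and_free_once(uint8_t port, struct rte_ring *ring)
    {
        struct rte_mbuf *bufs[BURST];
        uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST);
        unsigned int i, n = rte_ring_enqueue_burst(ring, (void **)bufs, nb_rx);
        while (n < nb_rx)                   /* free what the ring refused */
            rte_pktmbuf_free(bufs[n++]);
        n = rte_ring_dequeue_burst(ring, (void **)bufs, BURST);
        for (i = 0; i < n; i++)             /* drain and free immediately */
            rte_pktmbuf_free(bufs[i]);
    }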

Any ideas for me?


Many thanks,
Vuong Le

