[dpdk-users] [dpdk-dev] Rx Can't receive anymore packet after received 1.5 billion packet.
cristian.dumitrescu at intel.com
Mon Jul 17 12:31:14 CEST 2017
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> vuonglv at viettel.com.vn
> Sent: Monday, July 17, 2017 3:04 AM
> Cc: users at dpdk.org; dev at dpdk.org
> Subject: [dpdk-dev] Rx Can't receive anymore packet after received 1.5
> billion packet.
> Hi DPDK team,
> Sorry for sending this email to both the users and dev lists, but I have
> a big problem: the Rx core in my application cannot receive any more
> packets after I ran a stress test on it (in ~1 day the Rx core received
> ~1.5 billion packets). The Rx core is still alive but does not receive
> any packets and does not generate any log. Below is my system configuration:
> - OS: CentOS 7
> - Kernel: 3.10.0-514.16.1.el7.x86_64
> - Huge pages: 32G (16384 pages of 2M)
> - NIC: Intel 82599
> - DPDK version: 16.11
> - Architecture: Rx (lcore 1) receives packets and enqueues them to a ring;
> Worker (lcore 2) dequeues packets from the ring and frees them (using the
> rte_pktmbuf_free() function).
> - Mempool create: rte_pktmbuf_pool_create(
>       "rx_pool",                 /* name */
>       8192,                      /* number of elements in the mbuf pool */
>       256,                       /* size of per-core object cache */
>       0,                         /* size of application private area between
>                                     rte_mbuf struct and data buffer */
>       RTE_MBUF_DEFAULT_BUF_SIZE, /* size of data buffer in each mbuf
>                                     (2048 + 128) */
>       0);                        /* socket id */
> If I change the number of elements in the mbuf pool from 8192 to 512, Rx
> has the same problem after a shorter time (~30 s).
> Please tell me if you need more information. I am looking forward to
> hearing from you.
> Many thanks,
> Vuong Le
This is likely a buffer leak. You might have a path in your code where a buffer is not freed and therefore gets "lost": since it is never returned to the pool, the application can no longer use it. The pool of free buffers then shrinks over time until it is empty, at which point no more packets can be received.
You might want to periodically monitor the number of free buffers in your pool. If a leak is the root cause, you should see this number steadily decreasing until it reaches zero and stays there; otherwise, you should see the number of free buffers oscillating around an equilibrium point.
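The check described above could be sketched roughly as follows. This is only an illustration, not DPDK code: struct pool_monitor and both functions are hypothetical names, and the rule "a new low in several consecutive samples" is just one possible heuristic. In a real application each sample would presumably come from rte_mempool_avail_count() on the Rx pool, called from a timer or a stats thread.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of DPDK): tracks periodic samples of the
 * pool's free-buffer count and flags a probable leak. */
struct pool_monitor {
    unsigned int min_seen;  /* lowest free count observed so far */
    unsigned int new_lows;  /* consecutive samples that set a new low */
    bool primed;            /* false until the first sample is recorded */
};

static void pool_monitor_init(struct pool_monitor *m)
{
    m->min_seen = 0;
    m->new_lows = 0;
    m->primed = false;
}

/* Feed one sample of the free-buffer count, e.g. the value returned by
 * rte_mempool_avail_count(pool) in a DPDK application (assumed source).
 * Returns true once `threshold` consecutive samples have each set a new
 * low, i.e. the count keeps falling instead of oscillating. */
static bool pool_monitor_sample(struct pool_monitor *m,
                                unsigned int avail,
                                unsigned int threshold)
{
    if (!m->primed) {
        /* The first sample only establishes the baseline. */
        m->min_seen = avail;
        m->primed = true;
    } else if (avail < m->min_seen) {
        m->min_seen = avail;
        m->new_lows++;
    } else {
        /* The count recovered: consistent with a healthy pool. */
        m->new_lows = 0;
    }
    return m->new_lows >= threshold;
}
```

A slow leak can hide behind normal oscillation for a while, so the sampling period and the threshold would need tuning against the traffic pattern; logging the raw count alongside is probably the more reliable indicator.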
Since it takes a relatively large number of packets to hit this issue, the code path with the problem is probably not executed very often: it might be a control plane packet that is not freed, an ARP request/reply packet, etc.
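To illustrate why a rarely executed path can take so long to show up, here is a toy model in plain C. It is not DPDK code: the pool is reduced to a counter, and toy_pool, toy_alloc(), toy_free() and packets_until_stall() are hypothetical names. It assumes, purely for illustration, that exactly one packet in every leak_every takes the path that forgets to free its buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not DPDK code: the pool is just a counter of free buffers. */
struct toy_pool {
    uint32_t free_count;
};

static bool toy_alloc(struct toy_pool *p)
{
    if (p->free_count == 0)
        return false;           /* pool empty: Rx can no longer get a buffer */
    p->free_count--;
    return true;
}

static void toy_free(struct toy_pool *p)
{
    p->free_count++;
}

/* Receive packets until an allocation fails. Every `leak_every`-th packet
 * takes the (hypothetical) rare path that never frees its buffer.
 * `leak_every` must be >= 1. Returns the number of packets received
 * before Rx stalls. */
static uint64_t packets_until_stall(uint32_t pool_size, uint64_t leak_every)
{
    struct toy_pool pool = { .free_count = pool_size };
    uint64_t received = 0;

    while (toy_alloc(&pool)) {
        received++;
        if (received % leak_every != 0)
            toy_free(&pool);    /* normal path: buffer returns to the pool */
        /* rare path: the buffer is dropped without being freed */
    }
    return received;
}
```

In this model Rx stalls after pool_size * leak_every packets. Under the same uniform-leak assumption, 8192 buffers and ~1.5 billion packets would correspond to roughly one leaked buffer per 180,000 packets, which is why the culprit is likely an infrequent packet type.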