[dpdk-users] What to do after rte_eth_tx_burst: free or send again remaining packets?

Wiles, Keith keith.wiles at intel.com
Sat Jan 28 23:43:49 CET 2017


> On Jan 28, 2017, at 1:57 PM, Peter Keereweer <peterkeereweer at hotmail.com> wrote:
> 
> Hi!
> 
> Currently I'm running some tests with the Load Balancer Sample Application. I'm testing it by sending packets from pktgen.
> My setup is 2 servers, each containing an Intel 10GbE 82599 NIC, connected to each other. I have configured the Load Balancer application to use 1 RX core, 1 worker core and 1 TX core. The TX core sends all packets back to the pktgen application.
> 
> With pktgen I send 1024 UDP packets to the Load Balancer. Every packet processed by the worker core is printed to the screen (I added this code myself). If I send 1024 UDP packets, 1008 (= 7 x 144) packets are printed to the screen. This is correct, because the RX core reads packets with a burst size of 144. So if I send 1024 packets, I expect 1008 packets back in the pktgen application. But surprisingly I only receive 224 packets instead of 1008. After some research I found that 224 is not just a random number: it's 7 x 32 (= 224). So if the RX core reads 7 x 144 packets, I get back 7 x 32 packets. After digging into the code of the Load Balancer application I found this code in the 'app_lcore_io_tx' function in 'runtime.c':
> 
> n_pkts = rte_eth_tx_burst(
>         port,
>         0,
>         lp->tx.mbuf_out[port].array,
>         (uint16_t) n_mbufs);
> 
> ...
> 
> if (unlikely(n_pkts < n_mbufs)) {
>         uint32_t k;
>         for (k = n_pkts; k < n_mbufs; k++) {
>                 struct rte_mbuf *pkt_to_free = lp->tx.mbuf_out[port].array[k];
>                 rte_pktmbuf_free(pkt_to_free);
>         }
> }
> 
> What I understand from this code is that n_mbufs packets are handed to the 'rte_eth_tx_burst' function, which returns n_pkts, the number of packets actually sent. If the number of packets actually sent is smaller than n_mbufs (the number of packets given to rte_eth_tx_burst), all remaining packets, which were not sent, are freed. In the Load Balancer application, n_mbufs is equal to 144. But in my case 'rte_eth_tx_burst' returns the value 32, not 144. So 32 packets are actually sent and the remaining packets (144 - 32 = 112) are freed. This is the reason why I get 224 (7 x 32) packets back instead of 1008 (= 7 x 144).
> 
> But the question is: why are the remaining packets freed instead of being retried? If I look into 'pktgen.c', there is a function '_send_burst_fast' where the remaining packets are retried (in a while loop until they have all been sent) instead of being freed (see code below):
> 
> static __inline__ void
> _send_burst_fast(port_info_t *info, uint16_t qid)
> {
>         struct mbuf_table   *mtab = &info->q[qid].tx_mbufs;
>         struct rte_mbuf **pkts;
>         uint32_t ret, cnt;
> 
>         cnt = mtab->len;
>         mtab->len = 0;
> 
>         pkts    = mtab->m_table;
> 
>         if (rte_atomic32_read(&info->port_flags) & PROCESS_TX_TAP_PKTS) {
>                 while (cnt > 0) {
>                         ret = rte_eth_tx_burst(info->pid, qid, pkts, cnt);
> 
>                         pktgen_do_tx_tap(info, pkts, ret);
> 
>                         pkts += ret;
>                         cnt -= ret;
>                 }
>         } else {
>                 while (cnt > 0) {
>                         ret = rte_eth_tx_burst(info->pid, qid, pkts, cnt);
> 
>                         pkts += ret;
>                         cnt -= ret;
>                 }
>         }
> } 
> 
> Why is this while loop (sending packets until they have all been sent) not implemented in the 'app_lcore_io_tx' function in the Load Balancer application? That would make sense, right? It looks like the Load Balancer application assumes that if not all packets have been sent, the remaining packets failed during the sending process and should be freed.

The TX ring on the hardware is limited in size, but that size is adjustable. In pktgen I attempt to send all packets requested to be sent, but in the load balancer the developer decided to just drop the packets that were not sent because the TX hardware ring (or even a SW ring) is full. This normally means the core is handing packets off faster than the HW ring on the NIC can send them.
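
If you want the load balancer to behave the same way, the drop loop in 'app_lcore_io_tx' can be replaced with a retry loop. A minimal untested sketch ('send_all' is just a name I made up here, not something in the sample app):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Keep calling rte_eth_tx_burst() until all n_mbufs packets have been
 * queued to the NIC. Each call returns how many descriptors it consumed,
 * so we advance the array by that amount and retry with the remainder.
 * Beware this spins forever if the port stops draining, so a real
 * implementation may want a retry limit. */
static inline void
send_all(uint16_t port, uint16_t queue, struct rte_mbuf **pkts, uint16_t n_mbufs)
{
        uint16_t sent = 0;

        while (sent < n_mbufs)
                sent += rte_eth_tx_burst(port, queue, pkts + sent,
                                         (uint16_t)(n_mbufs - sent));
}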

It was just a choice of the developer to drop the packets instead of retrying until the packet array is empty. One possible way to fix this is to increase the size of the TX ring to 2-4 times the size of the RX ring. This still does not truly solve the problem, it just moves it to the RX ring: if the NIC does not have a valid RX descriptor and a place to DMA the packet into memory, the packet gets dropped at the wire. BTW, increasing the TX ring size also means those packets are not returned to the free pool as quickly, and you can exhaust the packet pool. The packets sit on the TX ring as done because the threshold to reclaim the done packets is too high.
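
Both the ring size and the reclaim threshold are set when the TX queue is configured. A rough sketch, with example numbers only (start from the PMD's default_txconf and check your driver's constraints on tx_free_thresh):

uint16_t port = 0;                      /* example port id */
struct rte_eth_dev_info dev_info;
struct rte_eth_txconf txconf;
int ret;

rte_eth_dev_info_get(port, &dev_info);
txconf = dev_info.default_txconf;

/* Start reclaiming completed mbufs back to the mempool once fewer
 * than this many descriptors are free; a larger value reclaims
 * earlier instead of waiting until the ring is nearly full. */
txconf.tx_free_thresh = 1024;

/* 4096-entry TX ring, e.g. 4x a 1024-entry RX ring. */
ret = rte_eth_tx_queue_setup(port, 0, 4096,
                             rte_eth_dev_socket_id(port), &txconf);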

Say you have a 1024-entry ring and the high watermark for flushing the done packets off the ring is 900. If the packet pool is only 512 packets, then once you have sent 512 packets they will all be sitting on the TX done queue, and now you are in a deadlock, unable to send a packet because they are all on the TX done ring. This normally does not happen, as the ring sizes are normally much smaller than the number of TX or even RX packets.
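
Put another way, size the mbuf pool so it is always larger than everything that can be parked on descriptors at once. For example (names and numbers made up for illustration):

#include <rte_mbuf.h>
#include <rte_lcore.h>

/* The pool must cover every place an mbuf can sit at the same time:
 * RX descriptors, TX descriptors waiting to be reclaimed, plus some
 * slack for bursts held by the cores. */
#define NB_RXD      1024                /* RX ring size */
#define NB_TXD      4096                /* TX ring size */
#define BURST_SLACK 512                 /* mbufs in flight on the cores */

struct rte_mempool *pool;

pool = rte_pktmbuf_pool_create("pkt_pool",
        NB_RXD + NB_TXD + BURST_SLACK,  /* > all descriptors in use */
        256,                            /* per-core mbuf cache */
        0,                              /* no private data */
        RTE_MBUF_DEFAULT_BUF_SIZE,
        rte_socket_id());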

In pktgen I attempt to send all of the packets requested, as it does not make any sense for the user to ask to send 10000 packets and have pktgen send some smaller number just because the sending core overran the TX queue at some point.

I hope that helps.

> 
> I hope someone can help me with this questions. Thank you in advance!!
> 
> Peter

Regards,
Keith


