[dpdk-users] users Digest, Vol 155, Issue 7

Wiles, Keith keith.wiles at intel.com
Tue Oct 23 17:22:57 CEST 2018
Previous message: [dpdk-users] users Digest, Vol 155, Issue 7
Next message: [dpdk-users] IPSEC-SECGW - type no-offload config sample?
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Oct 22, 2018, at 11:17 PM, Wajeeha Javed <wajeeha.javed123 at gmail.com> wrote:
> 
> Hi Keith, 

Please try to reply inline to the text and do not top-post; it makes it hard to follow so many email threads.

> 
> Thanks for your reply. Please find below my comments
> 
> >> You're right, in my application all the packets are stored inside mbufs. The reason for not using the next pointer of the mbuf is that it might be used by fragmented packets larger than the MTU.
> 
> >> I have tried using small buffers in a STAILQ linked list for each port, each holding a STAILQ entry and a pointer to an mbuf from the packet burst. I allocate the stailq entry, set the mbuf pointer in the stailq entry, then link the stailq entry to the stailq list using the stailq macros. I observe millions of lost packets; the stailq linked list could only hold fewer than 1 million packets per second at a line rate of 10 Gbit/s.
> 
> >> I would like to prevent data loss. Could you please advise on the best solution for holding a larger number of mbufs, without freeing or overwriting them, for a delay of 2 seconds?

Using the stailq method is my best guess at solving your problem. If you are calling malloc() for each packet at the moment you need to link it, that would explain why you cannot hold the packets without dropping some on the wire.

Allocate all of the stailq blocks at startup and keep them in some type of array or list to avoid doing an allocation call per packet. Other than this type of help, short of writing the code myself, this is all I have for you, sorry. The stailq structures will number more than 28M blocks, so all sorts of cache issues could also be causing the problem.
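A minimal sketch of that advice, assuming the BSD-style STAILQ macros from <sys/queue.h>. The names pkt_entry, entry_pool, hold_packet() and release_oldest() are hypothetical, and struct rte_mbuf is only forward-declared so the sketch compiles without DPDK. All entries come from one calloc() at startup, so the per-packet path only moves pointers between two lists and never calls malloc().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Opaque here so the sketch builds without DPDK; real code includes <rte_mbuf.h>. */
struct rte_mbuf;

/* Hypothetical list node: one per held packet, pointing at the received mbuf. */
struct pkt_entry {
	STAILQ_ENTRY(pkt_entry) link;
	struct rte_mbuf *m;
};

STAILQ_HEAD(pkt_list, pkt_entry);

/* Hypothetical pool of nodes, allocated once at startup. */
struct entry_pool {
	struct pkt_entry *slab;    /* one contiguous allocation for all nodes */
	struct pkt_list free_list; /* nodes not currently holding a packet */
};

static int entry_pool_init(struct entry_pool *p, size_t n)
{
	p->slab = calloc(n, sizeof(*p->slab));
	if (p->slab == NULL)
		return -1;
	STAILQ_INIT(&p->free_list);
	for (size_t i = 0; i < n; i++)
		STAILQ_INSERT_TAIL(&p->free_list, &p->slab[i], link);
	return 0;
}

/* Per-packet path: take a preallocated node and append it to the delay queue. */
static int hold_packet(struct entry_pool *p, struct pkt_list *delay_q,
		       struct rte_mbuf *m)
{
	struct pkt_entry *e = STAILQ_FIRST(&p->free_list);

	if (e == NULL)
		return -1; /* pool exhausted: the caller decides whether to drop */
	STAILQ_REMOVE_HEAD(&p->free_list, link);
	e->m = m;
	STAILQ_INSERT_TAIL(delay_q, e, link);
	return 0;
}

/* Remove the oldest held packet (FIFO order) and recycle its node. */
static struct rte_mbuf *release_oldest(struct entry_pool *p,
				       struct pkt_list *delay_q)
{
	struct pkt_entry *e = STAILQ_FIRST(delay_q);
	struct rte_mbuf *m;

	if (e == NULL)
		return NULL;
	STAILQ_REMOVE_HEAD(delay_q, link);
	m = e->m;
	e->m = NULL;
	STAILQ_INSERT_HEAD(&p->free_list, e, link);
	return m;
}
```

On a 64-bit build each node is 16 bytes, so 28M nodes is roughly 450 MB on top of the mbufs themselves.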

> 
> Thanks & Best Regards,
> 
> Wajeeha Javed
> 
> 
> 
> On Tue, Oct 16, 2018 at 3:02 PM Wiles, Keith <keith.wiles at intel.com> wrote:
> Sorry, you must have replied to my screwup of not sending the reply in plain-text format. I did send an updated reply to hopefully fix that problem. More comments inline below. All emails to the list must be in plain 'text' format, not 'Rich Text' format :-(
> 
> > On Oct 15, 2018, at 11:42 PM, Wajeeha Javed <wajeeha.javed123 at gmail.com> wrote:
> > 
> > Hi,
> > 
> > Thanks, everyone for your reply. Please find below my comments.
> > 
> > *I've failed to find explicit limitations at first glance.
> > The NB_MBUF define is typically internal to examples/apps.
> > The question I'd like to double-check is whether the host has enough
> > RAM and hugepages allocated. 5 million mbufs already require about
> > 10G.*
> > 
> > Total RAM = 128 GB
> > Available memory = 23 GB free
> > 
> > Total huge pages = 80
> > Free huge pages = 38
> > Huge page size = 1 GB
> > 
> > *The mempool uses uint32_t for most sizes and the number of mempool items
> > is uint32_t, so the number of entries in a mempool can be ~4G as stated, but
> > make sure you have enough memory, as the overhead for mbufs is not just the
> > header + the packet size.*
> > 
> > Right. Currently, there are a total of 80 huge pages, 40 on each NUMA node
> > (NUMA node 0 and NUMA node 1). I observed that I was using only 16 huge
> > pages while another 16 huge pages were used by another DPDK application.
> > By running only my DPDK application on NUMA node 0, I was able to increase
> > the mempool size to 14M, which uses all the huge pages of NUMA node 0.
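A rough sizing sketch for the numbers in this thread. It assumes about 14 million packets/s per 10 Gbit/s port and about 2304 bytes per pooled mbuf (128 B mbuf header + 128 B headroom + 2048 B data room); those per-object figures are assumptions, and the sketch ignores mempool object headers, per-lcore caches and hugepage rounding, so real usage is higher. The helper names are hypothetical. Under these assumptions a 2 second delay needs 28M mbufs per port (well under the uint32_t limit) and about 64.5 GB of memory, while a 14M-mbuf pool covers only about 1 second of line-rate traffic. The same per-object figure gives about 11.5 GB for 5 million mbufs, in line with the "about 10G" estimate above.

```c
#include <stdint.h>

/* Assumed bytes per pooled mbuf: 128 B struct rte_mbuf + 128 B headroom +
 * 2048 B data room. Mempool object headers, per-lcore caches and hugepage
 * rounding are ignored, so real usage is somewhat higher. */
#define ASSUMED_MBUF_BYTES (128ULL + 128ULL + 2048ULL)

/* Mbufs that must be held at once to delay every packet by delay_s seconds. */
static uint64_t mbufs_for_delay(uint64_t pkts_per_s, uint64_t delay_s)
{
	return pkts_per_s * delay_s;
}

/* Approximate pool memory for n mbufs of obj_bytes each. */
static uint64_t pool_bytes(uint64_t n, uint64_t obj_bytes)
{
	return n * obj_bytes;
}

/* How long (in ms) a pool of n mbufs can hold traffic arriving at pkts_per_s. */
static uint64_t hold_ms(uint64_t n, uint64_t pkts_per_s)
{
	return n * 1000ULL / pkts_per_s;
}
```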
> > 
> > *My question is why are you copying the mbuf and not just linking the mbufs
> > into a linked list? Maybe I do not understand the reason. I would try to make
> > sure you do not copy the data and just link the mbufs together using the
> > next pointer in the mbuf header, unless you have chained mbufs already.*
> > 
> > The reason for copying the mbuf is a NIC limitation: I cannot have more
> > than 16384 Rx descriptors, whereas I want to hold back all the packets
> > arriving at a line rate of 10 Gbit/s on each port. I created a circular
> > queue running on a FIFO basis. Initially, I thought of using an rte_mbuf*
> > packet burst for a delay of 2 secs. Now at line rate, we receive
> > 14 million
> 
> I assume in your driver an mbuf is used to receive the packet data, which means the packet is inside an mbuf (if not, then why not?). The mbuf data does not need to be copied; you can use the 'next' pointer in the mbuf to create a singly linked list. If you use fragmented packets in your design, which means you are using the 'next' pointer in the mbuf to chain the frame fragments into a single packet, then using 'next' will not work. Plus, when you call rte_pktmbuf_free() you need to make sure the next pointer is NULL, or it will free the complete chain of mbufs (not what you want here).
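A minimal sketch of linking held packets through the 'next' pointer, assuming every packet is a single segment. struct mbuf_stub is a hypothetical stand-in carrying only a next field so the example compiles without DPDK; with real mbufs the same pattern would use struct rte_mbuf and its next field. fifo_push() and fifo_pop() are hypothetical names, and the caller is still responsible for deciding when a packet's delay has expired.

```c
#include <stddef.h>

/* Stand-in for struct rte_mbuf with only the field this sketch uses;
 * in DPDK the real 'next' field also links the segments of one packet. */
struct mbuf_stub {
	struct mbuf_stub *next;
};

/* Hypothetical FIFO built by threading held packets through 'next'. */
struct mbuf_fifo {
	struct mbuf_stub *head; /* oldest held packet */
	struct mbuf_stub *tail; /* newest held packet */
};

/* Append a packet. Only valid for single-segment packets, whose 'next'
 * is already NULL; a multi-segment chain would be corrupted. */
static void fifo_push(struct mbuf_fifo *q, struct mbuf_stub *m)
{
	if (q->tail != NULL)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

/* Detach the oldest packet and clear its 'next' before it is transmitted
 * or freed, so a free would not walk into the rest of the queue. */
static struct mbuf_stub *fifo_pop(struct mbuf_fifo *q)
{
	struct mbuf_stub *m = q->head;

	if (m == NULL)
		return NULL;
	q->head = m->next;
	if (q->head == NULL)
		q->tail = NULL;
	m->next = NULL;
	return m;
}
```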
> 
> If you are using chained mbufs for a single packet, then you can create a set of small buffers, each holding the STAILQ pointers and the pointer to the mbuf, and add those small structures onto a linked list. This method may be the best solution in the long run, instead of trying to use the mbuf->next pointer.
> 
> Have a look at the rte_tailq.h and eal_common_tailqs.c files and rte_mempool.c (plus many other libs in DPDK). rte_mempool.c uses the rte_tailq_entry structure to create a linked list of mempool structures for searching and debugging mempools in the system. 'struct rte_tailq_entry' just adds a simple structure that points to the mempool structure and allows it to be placed on a linked list with the correct pointer types.
> 
> You can create a mempool of rte_tailq_entry structures if you want a fast and clean way to allocate/free the tailq entry structures.
> 
> Then you do not need to copy the packet memory anywhere: just allocate a tailq entry structure, set the mbuf pointer in the tailq entry, then link the tailq entry to the tailq list. These tailq macros are not the easiest to understand :-(, but once you understand the idea it becomes clearer.
> 
> I hope that helps.
> 
> > 
> > packets/s, so the descriptors get full and I have no option left other than
> > copying the mbuf to the circular queue rather than using an rte_mbuf*
> > pointer. I know I have to make a compromise on performance to achieve a
> > delay for packets. So for copying mbufs, I allocate memory from the mempool
> > to copy the received mbuf and then free the original. Please find the code
> > snippet below.
> > 
> > How can we chain different mbufs together? According to my understanding,
> > chained mbufs in the API are used for storing segments of fragmented
> > packets that are larger than the MTU. Even if we chain the mbufs together
> > using the next pointer, we need to free the received mbufs; otherwise we will
> > not be able to get free Rx descriptors at a line rate of 10 Gbit/s, and
> > eventually all the Rx descriptors will be filled and the NIC will not
> > receive any more packets.
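A hedged alternative to copying: because packets leave in arrival order, a fixed-size circular queue of mbuf pointers plus arrival timestamps could replace the copies. This assumes the mbuf pool is sized for the whole delay window, since a PMD refills Rx descriptors from the mempool attached to the queue rather than from the mbufs the application frees; holding an mbuf only costs a pool entry. The ring type and function names are hypothetical, struct rte_mbuf is only forward-declared, and timestamps are plain uint64_t ticks (in DPDK, rte_rdtsc() would be a natural source).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct rte_mbuf; /* opaque: only pointers are stored, no packet data is copied */

/* Hypothetical slot: the received mbuf plus its arrival time. */
struct held_pkt {
	struct rte_mbuf *m;
	uint64_t arrival;
};

/* Hypothetical fixed-capacity FIFO ring, allocated once at startup. */
struct delay_ring {
	struct held_pkt *slots;
	size_t cap;
	size_t head;  /* index of the oldest held packet */
	size_t count; /* number of packets currently held */
};

static int delay_ring_init(struct delay_ring *r, size_t cap)
{
	r->slots = calloc(cap, sizeof(*r->slots));
	r->cap = cap;
	r->head = 0;
	r->count = 0;
	return r->slots == NULL ? -1 : 0;
}

/* Returns -1 when full; the caller then has to drop (free) the mbuf. */
static int delay_ring_push(struct delay_ring *r, struct rte_mbuf *m,
			   uint64_t now)
{
	if (r->count == r->cap)
		return -1;
	size_t tail = (r->head + r->count) % r->cap;
	r->slots[tail].m = m;
	r->slots[tail].arrival = now;
	r->count++;
	return 0;
}

/* Pops the oldest packet only if it has waited at least 'delay' ticks. */
static struct rte_mbuf *delay_ring_pop_due(struct delay_ring *r, uint64_t now,
					   uint64_t delay)
{
	if (r->count == 0 || now - r->slots[r->head].arrival < delay)
		return NULL;
	struct rte_mbuf *m = r->slots[r->head].m;
	r->head = (r->head + 1) % r->cap;
	r->count--;
	return m;
}
```

The slots array is contiguous and accessed sequentially at both ends, which may be friendlier to the cache than following list pointers, and a failed push marks exactly the point where packets would have to be dropped.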
> > 
> > <Code>
> > 
> > for (j = 0; j < nb_rx; j++) {
> >         m = pkts_burst[j];
> >         struct rte_mbuf *copy_mbuf = pktmbuf_copy(m, pktmbuf_pool[sockid]);
> >         ....
> >         rte_pktmbuf_free(m);
> > }
> > 
> > </Code>
> > 
> > *The other question is whether you can drop any packets; if not, then you
> > only have the linking option IMO. If you can drop packets, then you can just
> > start dropping them when the ring is getting full. Holding onto 28M packets
> > for two seconds can cause other protocol-related problems: TCP could be
> > sending retransmitted packets, and now you have caused a bunch of work on
> > the Rx side at the end point.*
> > 
> > I would like my DPDK application to have zero packet loss; it only delays
> > all received packets for 2 secs and then transmits them as-is, without
> > any change or processing.
> > Moreover, the DPDK application is receiving tap traffic (monitoring traffic)
> > rather than real-time traffic, so there will not be any TCP or other
> > protocol-related problems.
> > 
> > Looking forward to your reply.
> > 
> > 
> > Best Regards,
> > 
> > Wajeeha Javed
> 
> Regards,
> Keith
> 

Regards,
Keith