<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 22, 2023 at 4:05 PM Ferruh Yigit <<a href="mailto:ferruh.yigit@amd.com" target="_blank">ferruh.yigit@amd.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 11/22/2023 6:01 AM, kumaraparameshwaran rathinavel wrote:<br>
>> Hi Folks,
>>
>> The current GRO code uses an unoptimised flow lookup in which every
>> flow in the table is iterated over during flow matching. For
>> rte_gro_reassemble_burst in lightweight mode this does not have much
>> impact, but with rte_gro_reassemble, which runs with a timeout
>> interval, it causes higher CPU utilisation during throughput tests.
>> The proposal is to use a hash-based flow table built on the rte_hash
>> implementation in DPDK, with one hash table per GRO type and a
>> lookup function and key specific to each type. If there is consensus
>> that this could improve performance, I will work on an initial patch
>> set. Please let me know your thoughts.
>>
>
> Hi Kumara,
>
> Your proposal looks reasonable to me, I think it is worth trying.
> cc'ed techboard for more comments.

Thanks Ferruh - sure, I will put together an initial patch set with the
TCP/IPv4 GRO type.
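To sketch what I have in mind (the key layout and names below are
placeholders for illustration, not the final patch), each GRO type
would own an rte_hash table keyed on its own flow tuple, e.g. for
TCP/IPv4:

#include <rte_hash.h>
#include <rte_jhash.h>

/* Placeholder flow key for the TCP/IPv4 GRO type; the real patch
 * would reuse the fields gro_tcp4 already matches on. */
struct tcp4_gro_key {
	uint32_t ip_src;
	uint32_t ip_dst;
	uint16_t src_port;
	uint16_t dst_port;
};

static struct rte_hash *
gro_tcp4_hash_create(uint32_t max_flows, int socket_id)
{
	struct rte_hash_parameters params = {
		.name = "gro_tcp4",
		.entries = max_flows,
		.key_len = sizeof(struct tcp4_gro_key),
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = socket_id,
	};

	return rte_hash_create(&params);
}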

> Do you have any performance measurement with the existing code?
> Having it helps to evaluate the impact of the change.

I did some testing a while back, and the observation was that on a
10Gbps link the iperf throughput of the unoptimised and optimised
versions was almost the same, but the CPU savings were up to 30-35%.
So any tests running in parallel, such as imix-style traffic, should
definitely show better results. I will try to profile the two cases
and share numbers showing the performance impact.
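For the profiling, the per-packet step I plan to compare against the
linear scan would be roughly the following (again only a sketch; the
helper below is hypothetical, and the real patch would hang the
rte_hash off the existing GRO table):

/* Look the flow up in O(1); index 'new_flow' when this is the first
 * packet of the flow. 'new_flow' stands in for however the existing
 * code allocates a flow slot. */
static void *
gro_tcp4_flow_find_or_add(struct rte_hash *hash,
			  const struct tcp4_gro_key *key,
			  void *new_flow)
{
	void *flow = NULL;

	if (rte_hash_lookup_data(hash, key, &flow) >= 0)
		return flow;	/* existing flow, no table walk needed */

	if (rte_hash_add_key_data(hash, key, new_flow) == 0)
		return new_flow;

	return NULL;		/* table full: fall back or flush */
}

The merge logic itself would stay as it is today; only the flow match
moves from the O(n) walk to the hash lookup.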