<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 22, 2023 at 4:05 PM Ferruh Yigit <<a href="mailto:ferruh.yigit@amd.com" target="_blank">ferruh.yigit@amd.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 11/22/2023 6:01 AM, kumaraparameshwaran rathinavel wrote:<br>
>> Hi Folks,
>>
>> The current GRO code uses an unoptimised flow lookup in which every
>> flow in the table is iterated over during flow matching. For
>> rte_gro_reassemble_burst in lightweight mode this does not have much
>> impact, but with rte_gro_reassemble, which runs with a timeout
>> interval, it causes higher CPU utilisation during throughput tests.
>> The proposal is to use a hash-based flow table built on the rte_hash
>> implementation in DPDK, with one hash table per GRO type and a
>> lookup function and key specific to each type. If there is consensus
>> that this could improve performance, I will work on an initial patch
>> set. Please let me know your thoughts.
>>
>
> Hi Kumara,
>
> Your proposal looks reasonable to me, I think it is worth trying.
> cc'ed techboard for more comments.

Thanks Ferruh - sure, I will put together an initial patch set with the
TCP/IPv4 GRO type.
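To sketch what I have in mind (the key layout and names below are
placeholders for illustration, not the final patch), each GRO type
would own an rte_hash table keyed on its own flow tuple, e.g. for
TCP/IPv4:

#include <rte_hash.h>
#include <rte_jhash.h>

/* Placeholder flow key for the TCP/IPv4 GRO type; the real patch
 * would reuse the fields gro_tcp4 already matches on. */
struct tcp4_gro_key {
	uint32_t ip_src;
	uint32_t ip_dst;
	uint16_t src_port;
	uint16_t dst_port;
};

static struct rte_hash *
gro_tcp4_hash_create(uint32_t max_flows, int socket_id)
{
	struct rte_hash_parameters params = {
		.name = "gro_tcp4",
		.entries = max_flows,
		.key_len = sizeof(struct tcp4_gro_key),
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = socket_id,
	};

	return rte_hash_create(&params);
}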

> Do you have any performance measurement with the existing code?
> Having it helps to evaluate the impact of the change.

I did some testing a while back, and the observation was that on a
10Gbps link the iperf throughput of the unoptimised and optimised
versions was almost the same, but the CPU savings were up to 30-35%.
So any tests running in parallel, such as imix-style traffic, should
definitely show better results. I will try to profile the two cases
and share numbers showing the performance impact.
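For the profiling, the per-packet step I plan to compare against the
linear scan would be roughly the following (again only a sketch; the
helper below is hypothetical, and the real patch would hang the
rte_hash off the existing GRO table):

/* Look the flow up in O(1); index 'new_flow' when this is the first
 * packet of the flow. 'new_flow' stands in for however the existing
 * code allocates a flow slot. */
static void *
gro_tcp4_flow_find_or_add(struct rte_hash *hash,
			  const struct tcp4_gro_key *key,
			  void *new_flow)
{
	void *flow = NULL;

	if (rte_hash_lookup_data(hash, key, &flow) >= 0)
		return flow;	/* existing flow, no table walk needed */

	if (rte_hash_add_key_data(hash, key, new_flow) == 0)
		return new_flow;

	return NULL;		/* table full: fall back or flush */
}

The merge logic itself would stay as it is today; only the flow match
moves from the O(n) walk to the hash lookup.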