<div dir="ltr">Thanks, Stephen, for addressing my queries; that is helpful.<div> </div><div>One more follow-up question on the same topic: can DPDK HQoS be customized per use case?</div><div> </div><div>For example, one use case needs the following HQoS config: <b>one port, one subport, 16 pipes, and only one TC per pipe</b>.</div><div>The 16-pipe config was allowed, but reducing the 13 TCs per pipe to a single TC was not.</div><div> </div><div>Can I still configure 13 TCs but set the queue size of the unused ones to 0? Would that impact performance?</div><div> </div><div> </div><div>Thanks</div><div>Farooq.J</div><div> </div><div> </div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Wed, May 21, 2025 at 7:48 PM Stephen Hemminger <<a href="mailto:stephen@networkplumber.org">stephen@networkplumber.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 28 Apr 2025 16:55:07 +0530<br>
farooq basha <<a href="mailto:farooq.juturu@gmail.com" target="_blank">farooq.juturu@gmail.com</a>> wrote:<br>
<br>
> Hello DevTeam,<br>
> <br>
> I am planning to use DPDK HQOS for Traffic shaping with a<br>
> run-to-completion Model. While I was reading the dpdk-qos document, I came<br>
> across the following statement.<br>
> <br>
> "*Running enqueue and dequeue operations for the same output port from<br>
> different cores is likely to cause significant impact on scheduler’s<br>
> performance and it is therefore not recommended"*<br>
> <br>
> Let's take an example: Port1 and Port2 each have 4 Rx queues, with each<br>
> queue mapped to a different CPU, and traffic arriving on Port1 is forwarded<br>
> to Port2. With the above limitation, the application needs to take a lock<br>
> around the rte_sched_port_enqueue and dequeue operations, so performance<br>
> is limited to a single CPU even though traffic arrives on 4 different CPUs.<br>
> <br>
> Please correct me if my understanding is wrong.<br>
> <br>
> Thanks<br>
> Basha<br>
<br>
The HQoS code is not thread-safe, so yes, you need a lock.<br>
Traffic scheduling (QoS) needs to be the last stage of the pipeline, just<br>
before the mbufs are passed to the device.<br>
<br>
The issue is that the QoS scheduler is single-threaded, so a lock is required. <br>
<br>
The statement is misleading: the real overhead is the lock, and the<br>
secondary overhead is the cache misses that occur when the same port is<br>
processed on different cores. If you do that, cache misses will cut<br>
performance significantly.<br>
</blockquote></div>
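<div dir="ltr">[Editor's note] The locking pattern discussed above can be sketched as follows. This is a minimal, illustrative sketch only, not code from the thread: it assumes DPDK's rte_sched and rte_spinlock APIs, an already-initialized scheduler port, and invented names (hqos_lock, hqos_tx, BURST); it must be built against a DPDK installation.</div>

```c
/* Sketch: serializing rte_sched enqueue/dequeue at the TX stage of the
 * pipeline. The scheduler is not thread-safe, so any worker core with
 * packets for the shaped port takes one lock around both operations.
 * Lock name, function name, and burst size are illustrative. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_sched.h>
#include <rte_spinlock.h>

#define BURST 32

static rte_spinlock_t hqos_lock = RTE_SPINLOCK_INITIALIZER;

/* Called from any worker core that has packets destined for out_port. */
static void
hqos_tx(struct rte_sched_port *sched, uint16_t out_port, uint16_t txq,
        struct rte_mbuf **pkts, uint32_t n_pkts)
{
    struct rte_mbuf *out[BURST];
    int n;

    /* Enqueue and dequeue for the same scheduler port must never run
     * concurrently on different cores, hence the single lock. */
    rte_spinlock_lock(&hqos_lock);
    rte_sched_port_enqueue(sched, pkts, n_pkts);
    n = rte_sched_port_dequeue(sched, out, BURST);
    rte_spinlock_unlock(&hqos_lock);

    /* Hand the scheduled packets to the device. */
    if (n > 0)
        rte_eth_tx_burst(out_port, txq, out, (uint16_t)n);
}
```

<div dir="ltr">As Stephen notes, the lock serializes the scheduler, so throughput for that port is bounded by one core regardless of how many cores feed it, and cross-core cache misses add further overhead.</div>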