<div dir="ltr">Thanks, Stephen, for addressing my queries; that is helpful.<div> </div><div>One more follow-up question on the same topic: can DPDK HQoS be customized per use case?</div><div> </div><div>For example, one use case needs the following HQoS config: <b>one port, one subport, 16 pipes, and only one TC per pipe</b>.</div><div>The 16-pipe config was allowed, but reducing the 13 TCs per pipe to a single TC was not.</div><div> </div><div>Can I still configure 13 TCs but set the queue size of the unused ones to 0? Would that impact performance?</div><div> </div><div> </div><div>Thanks</div><div>Farooq.J</div><div> </div><div> </div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Wed, May 21, 2025 at 7:48 PM Stephen Hemminger <<a href="mailto:stephen@networkplumber.org">stephen@networkplumber.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 28 Apr 2025 16:55:07 +0530<br>
farooq basha <<a href="mailto:farooq.juturu@gmail.com" target="_blank">farooq.juturu@gmail.com</a>> wrote:<br>
<br>
> Hello DevTeam,<br>
> <br>
> I am planning to use DPDK HQOS for Traffic shaping with a<br>
> run-to-completion Model. While I was reading the dpdk-qos document, I came<br>
> across the following statement.<br>
> <br>
> "*Running enqueue and dequeue operations for the same output port from<br>
> different cores is likely to cause significant impact on scheduler’s<br>
> performance and it is therefore not recommended"*<br>
> <br>
> Let's take an example: Port1 and Port2 each have 4 Rx queues, with each<br>
> queue mapped to a different CPU, and traffic arriving on Port1 is forwarded<br>
> to Port2. With the above limitation, the application needs to take a lock<br>
> around the rte_sched_port_enqueue and dequeue operations, so performance<br>
> is limited to a single CPU even though traffic arrives on 4 different CPUs.<br>
> <br>
> Please correct me if my understanding is wrong.<br>
> <br>
> Thanks<br>
> Basha<br>
<br>
The HQoS code is not thread-safe, so yes, you need a lock.<br>
Traffic scheduling (QoS) needs to be the last stage of the pipeline, just<br>
before the mbufs are passed to the device.<br>
<br>
The issue is that the QoS scheduler is single-threaded, so a lock is required. <br>
<br>
The statement is misleading: the real overhead is the lock, and the<br>
secondary overhead is the cache misses that occur when the same port is<br>
processed on different cores. If you do that, cache misses will cut<br>
performance significantly.<br>
</blockquote></div>
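<div dir="ltr">[Editor's note] The locking pattern discussed above can be sketched as follows. This is a minimal, illustrative sketch only, not code from the thread: it assumes DPDK's rte_sched and rte_spinlock APIs, an already-initialized scheduler port, and invented names (hqos_lock, hqos_tx, BURST); it must be built against a DPDK installation.</div>

```c
/* Sketch: serializing rte_sched enqueue/dequeue at the TX stage of the
 * pipeline. The scheduler is not thread-safe, so any worker core with
 * packets for the shaped port takes one lock around both operations.
 * Lock name, function name, and burst size are illustrative. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_sched.h>
#include <rte_spinlock.h>

#define BURST 32

static rte_spinlock_t hqos_lock = RTE_SPINLOCK_INITIALIZER;

/* Called from any worker core that has packets destined for out_port. */
static void
hqos_tx(struct rte_sched_port *sched, uint16_t out_port, uint16_t txq,
        struct rte_mbuf **pkts, uint32_t n_pkts)
{
    struct rte_mbuf *out[BURST];
    int n;

    /* Enqueue and dequeue for the same scheduler port must never run
     * concurrently on different cores, hence the single lock. */
    rte_spinlock_lock(&hqos_lock);
    rte_sched_port_enqueue(sched, pkts, n_pkts);
    n = rte_sched_port_dequeue(sched, out, BURST);
    rte_spinlock_unlock(&hqos_lock);

    /* Hand the scheduled packets to the device. */
    if (n > 0)
        rte_eth_tx_burst(out_port, txq, out, (uint16_t)n);
}
```

<div dir="ltr">As Stephen notes, the lock serializes the scheduler, so throughput for that port is bounded by one core regardless of how many cores feed it, and cross-core cache misses add further overhead.</div>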