<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<font face="monospace">Hello,<br>
<br>
I ran experiments in which I sent packets to the hairpin queues and
the CPU queue at the same time.<br>
During testing, I found that when the CPU queue is heavily
overloaded, the hairpin queues also begin to drop packets.</font><br>
<font face="monospace"><br>
Example 1:
<br>
Sending 10 Gbps to hairpin queues<br>
Resulting throughput is 10 Gbps<br>
Expected result<br>
<br>
Example 2:
<br>
Sending 20 Gbps to CPU queue<br>
Resulting throughput is 11 Gbps (9 Gbps drop)<br>
Expected result<br>
<br>
Example 3:
<br>
Sending 10 Gbps to hairpin queues and 20 Gbps to CPU queue<br>
Resulting throughput is 21 Gbps: 10 Gbps (zero packet drop) from
hairpin + 11 Gbps from CPU<br>
Expected result<br>
<br>
Example 4:
<br>
Sending 10 Gbps to hairpin queues and 50 Gbps to CPU queue
<br>
Resulting throughput is 16 Gbps: 5 Gbps (50%+ packet drop) from
hairpin + 11 Gbps from CPU<br>
Unexpected result...<br>
<br>
Experiment setup:<br>
sudo mlxconfig -y -d 0000:c4:00.0 set MEMIC_SIZE_LIMIT=0
HAIRPIN_DATA_BUFFER_LOCK=1<br>
sudo mlxfwreset -y -d 0000:c4:00.0 reset<br>
sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:c4:00.0,hp_buf_log_sz=13 --
--rxq=1 --txq=1 --hairpinq=12 --hairpin-mode=0x1110 -i<br>
flow create 0 ingress pattern eth src is 00:10:94:00:00:02 /
end actions queue index 0 / end<br>
flow create 0 ingress pattern eth src is 00:10:94:00:00:03 /
end actions rss queues 1 2 3 4 5 6 7 8 end / end<br>
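<br>
(For context: my understanding is that the locked hairpin data buffer I am
enabling above corresponds roughly to the ethdev-level setup sketched below.
This is only a sketch against DPDK 22.11 or newer, with placeholder port and
queue indices and a helper name of my own, not my exact testpmd configuration.)<br>
</font>
<pre>
#include &lt;rte_ethdev.h&gt;

/* Sketch only: set up hairpin Rx queue 'rxq' on 'port_id', peered with hairpin
 * Tx queue 'peer_txq' on the same port, and ask the PMD to back it with locked
 * device memory (what HAIRPIN_DATA_BUFFER_LOCK=1 enables in firmware). */
static int
setup_hairpin_rxq(uint16_t port_id, uint16_t rxq, uint16_t peer_txq,
                  uint16_t nb_desc)
{
        struct rte_eth_hairpin_conf conf = {
                .peer_count = 1,
                .tx_explicit = 1,              /* "Explicit Tx rule" mode */
                .manual_bind = 0,              /* let the PMD bind the queue pair */
                .use_locked_device_memory = 1, /* device-memory packet buffer */
        };

        conf.peers[0].port = port_id;   /* hairpin back to the same port */
        conf.peers[0].queue = peer_txq; /* index of the peer hairpin Tx queue */

        return rte_eth_rx_hairpin_queue_setup(port_id, rxq, nb_desc, &amp;conf);
}
</pre>
<font face="monospace">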
<br>
So I can't achieve my goal: ensuring that traffic on the hairpin
queues is not dropped when the CPU queue is overloaded.<br>
</font><font face="monospace">Any idea how to achieve this in
example 4?<br>
Is the problem full packet buffers/memory in the device, shared
between the hairpin and CPU queues?<br>
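<br>
(For what it's worth, below is roughly how I would check what the PMD reports
about hairpin buffer placement; a quick sketch assuming the capability fields
added around DPDK 22.11. The Tx side is where locked device memory is not
supported by net/mlx5, as you mentioned.)<br>
</font>
<pre>
#include &lt;stdio.h&gt;
#include &lt;rte_ethdev.h&gt;

/* Sketch only: print what the PMD reports about hairpin queue limits and
 * whether the hairpin data buffer can be placed in locked device memory
 * (i.e. kept out of the shared packet buffer) on the Rx and Tx sides. */
static void
dump_hairpin_caps(uint16_t port_id)
{
        struct rte_eth_hairpin_cap cap;

        if (rte_eth_dev_hairpin_capability_get(port_id, &amp;cap) != 0) {
                printf("port %u: hairpin not supported\n", port_id);
                return;
        }
        printf("port %u: max hairpin queues %u, max descriptors %u\n",
               port_id, cap.max_nb_queues, cap.max_nb_desc);
        printf("  Rx: locked device memory %s, RTE memory %s\n",
               cap.rx_cap.locked_device_memory ? "yes" : "no",
               cap.rx_cap.rte_memory ? "yes" : "no");
        printf("  Tx: locked device memory %s, RTE memory %s\n",
               cap.tx_cap.locked_device_memory ? "yes" : "no",
               cap.tx_cap.rte_memory ? "yes" : "no");
}
</pre>
<font face="monospace">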
<br>
</font><font face="monospace">Any guidance or suggestions on how to
achieve this would be greatly appreciated.</font><br>
<font face="monospace"><br>
Mário<br>
<br>
</font>
<div class="moz-cite-prefix">On 27/06/2024 13:42, Mário Kuka wrote:<br>
</div>
<blockquote type="cite"
cite="mid:82d8f67c-3b0b-46c2-a94b-8457d0c602c2@cesnet.cz">
<font face="monospace">Hi Dmitry,<br>
<br>
Thank you for your helpful reply.<br>
</font>
<pre class="moz-quote-pre" style="white-space: pre-wrap;" wrap=""><blockquote
type="cite"><pre class="moz-quote-pre" wrap="">Try enabling "Explicit Tx rule" mode if possible.
I was able to achieve 137 Mpps @ 64B with the following command:
dpdk-testpmd -a 21:00.0 -a c1:00.0 --in-memory -- \
-i --rxq=1 --txq=1 --hairpinq=8 --hairpin-mode=0x10</pre></blockquote>
Based on this, I was able to achieve 142 Mpps (96.08 Gbps) @ 64B with the following command:
sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:c4:00.0,hp_buf_log_sz=13 \
--in-memory -- --rxq=1 --txq=1 --hairpinq=12 --hairpin-mode=0x10 -i
flow create 0 ingress pattern eth src is 00:10:94:00:00:02 / end actions rss queues 1 2 3 4 5 6 7 8 9 10 11 12 end / end
Almost full speed :).
Any other value of "hp_buf_log_sz", or adding more queues, does not give better results; it only makes them worse.
<blockquote type="cite"><pre class="moz-quote-pre" wrap="">RxQ pinned in device memory requires firmware configuration [1]:
mlxconfig -y -d $pci_addr set MEMIC_SIZE_LIMIT=0 HAIRPIN_DATA_BUFFER_LOCK=1
mlxfwreset -y -d $pci_addr reset
[1]: <a class="moz-txt-link-freetext"
href="https://doc.dpdk.org/guides/platform/mlx5.html?highlight=hairpin_data_buffer_lock"
moz-do-not-send="true">https://doc.dpdk.org/guides/platform/mlx5.html?highlight=hairpin_data_buffer_lock</a>
However, pinned RxQ didn't improve anything for me.</pre></blockquote>
I tried it, but it didn't improve anything for me either.
Mário
</pre>
<div class="moz-cite-prefix">On 25/06/2024 02:22, Dmitry Kozlyuk wrote:<br>
</div>
<blockquote type="cite"
cite="mid:20240625032224.45b65339@sovereign">
<pre class="moz-quote-pre" wrap="">Hi Mário,
2024-06-19 08:45 (UTC+0200), Mário Kuka:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hello,
I want to use hairpin queues to forward high-priority traffic (such as
LACP).
My goal is to ensure that this traffic is not dropped in case the
software pipeline is overwhelmed.
But during testing with dpdk-testpmd, I can't achieve full throughput for
hairpin queues.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">For maintainers: I'd like to express interest in this use case too.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">The best result I have been able to achieve for 64B packets is 83 Gbps
in this configuration:
$ sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:17:00.0,hp_buf_log_sz=19 --
--rxq=1 --txq=1 --rxd=4096 --txd=4096 --hairpinq=2
testpmd> flow create 0 ingress pattern eth src is 00:10:94:00:00:03 /
end actions rss queues 1 2 end / end
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Try enabling "Explicit Tx rule" mode if possible.
I was able to achieve 137 Mpps @ 64B with the following command:
dpdk-testpmd -a 21:00.0 -a c1:00.0 --in-memory -- \
-i --rxq=1 --txq=1 --hairpinq=8 --hairpin-mode=0x10
You might get even better speed, because my flow rules were more complicated
(RTE Flow based "router on-a-stick"):
flow create 0 ingress group 1 pattern eth / vlan vid is 721 / end actions of_set_vlan_vid vlan_vid 722 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 1 ingress group 1 pattern eth / vlan vid is 721 / end actions of_set_vlan_vid vlan_vid 722 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 0 ingress group 1 pattern eth / vlan vid is 722 / end actions of_set_vlan_vid vlan_vid 721 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 1 ingress group 1 pattern eth / vlan vid is 722 / end actions of_set_vlan_vid vlan_vid 721 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 1 ingress group 0 pattern end actions jump group 1 / end
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">For packets in the range 68-80B I measured even lower throughput.
Full throughput I measured only for packets larger than 112B.
For only one queue, I didn't get more than 55 Gbps:
$ sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:17:00.0,hp_buf_log_sz=19 --
--rxq=1 --txq=1 --rxd=4096 --txd=4096 --hairpinq=1 -i
testpmd> flow create 0 ingress pattern eth src is 00:10:94:00:00:03 /
end actions queue index 1 / end
I tried to use locked device memory for TX and RX queues, but it seems
that this is not supported:
"--hairpin-mode=0x011000" (bit 16 - hairpin TX queues will use locked
device memory, bit 12 - hairpin RX queues will use locked device memory)
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">RxQ pinned in device memory requires firmware configuration [1]:
mlxconfig -y -d $pci_addr set MEMIC_SIZE_LIMIT=0 HAIRPIN_DATA_BUFFER_LOCK=1
mlxfwreset -y -d $pci_addr reset
[1]: <a class="moz-txt-link-freetext"
href="https://doc.dpdk.org/guides/platform/mlx5.html?highlight=hairpin_data_buffer_lock"
moz-do-not-send="true">https://doc.dpdk.org/guides/platform/mlx5.html?highlight=hairpin_data_buffer_lock</a>
However, pinned RxQ didn't improve anything for me.
TxQ pinned in device memory is not supported by net/mlx5.
TxQ pinned to DPDK memory made performance awful (predictably).
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I was expecting that achieving full throughput with hairpin queues would
not be a problem.
Is my expectation too optimistic?
What other parameters besides 'hp_buf_log_sz' can I use to achieve full
throughput?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">In my experiments, the default "hp_buf_log_sz" of 16 is optimal.
The most influential parameter appears to be the number of hairpin queues.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I tried combining the following parameters: mprq_en=, rxqs_min_mprq=,
mprq_log_stride_num=, txq_inline_mpw=, rxq_pkt_pad_en=,
but with no positive impact on throughput.
</pre>
</blockquote>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>