<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Lucida Console";
panose-1:2 11 6 9 4 5 4 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="SK" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">Adding the logfile.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">One thing that's in the logs but that I didn't explicitly mention is the DPDK version we tried this with:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">EAL: RTE Version: 'DPDK 22.07.0'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">We also tried earlier versions going back to 21.08 and did a quick check on 22.11, with no luck in either case.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Juraj<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="mso-fareast-language:SK">From:</span></b><span lang="EN-US" style="mso-fareast-language:SK"> Juraj Linkeš
<br>
<b>Sent:</b> Friday, January 20, 2023 12:56 PM<br>
<b>To:</b> 'aman.deep.singh@intel.com' &lt;aman.deep.singh@intel.com&gt;; 'yuying.zhang@intel.com' &lt;yuying.zhang@intel.com&gt;; Xing, Beilei &lt;beilei.xing@intel.com&gt;<br>
<b>Cc:</b> dev@dpdk.org; Ruifeng Wang &lt;Ruifeng.Wang@arm.com&gt;; 'Lijian Zhang' &lt;Lijian.Zhang@arm.com&gt;; 'Honnappa Nagarahalli' &lt;Honnappa.Nagarahalli@arm.com&gt;<br>
<b>Subject:</b> Testpmd/l3fwd port shutdown failure on Arm Altra systems<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-US">Hello i40e and testpmd maintainers,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We're hitting an issue with DPDK testpmd on Ampere Altra servers in the FD.io lab.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">A bit of background: along with VPP performance tests (VPP uses DPDK), we're running a small number of basic DPDK testpmd and l3fwd tests in FD.io as well. This is to catch any performance differences caused by VPP updating
its DPDK version.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We're running both l3fwd and testpmd tests. The Altra servers are two-socket and the topology is TG -> DUT1 -> DUT2 -> TG. Traffic flows in both directions, but nothing gets forwarded (with a slight caveat that I'll come
back to later). There's nothing special in the tests, just forwarding traffic. The NIC we're testing is xl710-QDA2.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The same tests are passing on all other testbeds. We have various two-node (1 DUT, 1 TG) and three-node (2 DUT, 1 TG) Intel and Arm testbeds with various NICs (Intel 700 and 800 series, and the Intel testbeds use
some Mellanox NICs as well). We don't have another three-node testbed with the same NIC though, so it looks like the problem is specific to testpmd/l3fwd with xl710-QDA2 on Altra servers.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">VPP performance tests are passing, but l3fwd and testpmd fail. This leads us to believe it's a software issue, but there could be something wrong with the hardware. I'll talk about testpmd from now on, but as far as we can
tell, the behavior is the same for testpmd and l3fwd.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Getting back to the caveat mentioned earlier, there seems to be something wrong with port shutdown. When running testpmd on a testbed that hasn't been used for a while, all ports seem to be up right away (we don't
see any "Port 0|1: link state change event") and the setup works fine (forwarding works). After restarting testpmd (restarting on one server is sufficient), the ports between DUT1 and DUT2 (but not between DUTs and TG) go down and are not usable in DPDK, VPP
or in Linux (with the i40e kernel driver) for a while (measured in minutes, sometimes tens of minutes; the duration is seemingly random). The ports eventually recover and can be used again, but there's nothing in syslog suggesting what happened.<o:p></o:p></span></p>
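<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">To put numbers on the "seemingly random" recovery time, the outage could be timed from the Linux side. The following is only a minimal sketch, assuming the port is bound to the i40e kernel driver and administratively up, so that the kernel reports its link state in /sys/class/net/IFACE/operstate. The helper names and the polling interval are hypothetical choices, not something we have used in the lab.<o:p></o:p></span></p>

```python
import time
from pathlib import Path


def read_operstate(iface):
    """Return the kernel's link state for iface, e.g. 'up' or 'down'.

    Assumes the port is bound to the kernel driver (not vfio-pci).
    """
    return Path(f"/sys/class/net/{iface}/operstate").read_text().strip()


def wait_for_link_up(read_state, poll_interval=1.0, timeout=3600.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until it returns 'up'.

    Returns the elapsed time in seconds, or None if the timeout
    expires first. clock and sleep are injectable so the loop can
    be exercised without real hardware.
    """
    start = clock()
    while True:
        if read_state() == "up":
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(poll_interval)


# Example call (the interface name is a placeholder):
# elapsed = wait_for_link_up(lambda: read_operstate("IFACE"), poll_interval=5.0)
```

<p class="MsoNormal"><span lang="EN-US">Running this right after stopping testpmd and rebinding the port to i40e, over several restarts, would give a set of recovery times that could be compared across firmware and driver versions.<o:p></o:p></span></p>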
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">What seems to be happening is that testpmd puts the ports into some faulty state. This only happens on the DUT1 -> DUT2 link though (the ports between the two testpmds), not on the TG -> DUT1 link (the TG port is left alone).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Some more info:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We've come across the issue with this configuration:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">OS: Ubuntu 20.04 with kernel 5.4.0-65-generic.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">Old NIC firmware, never upgraded: 6.01 0x800035da 1.1747.0.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">Driver versions: i40e 2.17.15 and iavf 4.3.19.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">As well as with this configuration:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">OS: Ubuntu 22.04 with kernel 5.15.0-46-generic.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">Updated firmware: 8.30 0x8000a4ae 1.2926.0.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">Driver versions: i40e 2.19.3 and iavf 4.5.3.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Unsafe noiommu mode is disabled:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">N<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We used DPDK 22.07 in manual testing and built it on the DUTs, using a generic build:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">meson -Dexamples=l3fwd -Dc_args=-DRTE_LIBRTE_I40E_16BYTE_RX_DESC=y -Dplatform=generic build<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We're running testpmd with this command:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">sudo build/app/dpdk-testpmd -v -l 1,2 -a 0004:04:00.1 -a 0004:04:00.0 --in-memory -- -i --forward-mode=io --burst=64 --txq=1 --rxq=1 --tx-offloads=0x0 --numa --auto-start --total-num-mbufs=32768
--nb-ports=2 --portmask=0x3 --max-pkt-len=1518 --mbuf-size=16384 --nb-cores=1<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">And l3fwd (with different MAC addresses on the other server):<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">sudo /tmp/openvpp-testing/dpdk/build/examples/dpdk-l3fwd -v -l 1,2 -a 0004:04:00.0 -a 0004:04:00.1 --in-memory -- --parse-ptype --eth-dest="0,40:a6:b7:85:e7:79" --eth-dest="1,3c:fd:fe:c3:e7:a1"
--config="(0, 0, 2),(1, 0, 2)" -P -L -p 0x3<o:p></o:p></span></p>
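<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">For readers less familiar with these options, here is a small sketch of how the port mask (testpmd --portmask=0x3, l3fwd -p 0x3) and the l3fwd --config string decode. It assumes the documented l3fwd tuple layout of (port, queue, lcore); the function names are hypothetical helpers, not part of DPDK.<o:p></o:p></span></p>

```python
import re


def ports_from_mask(mask):
    """Return the port indices whose bits are set in a hex port mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def parse_l3fwd_config(config):
    """Split an l3fwd --config string into (port, queue, lcore) tuples.

    Assumes the "(port,queue,lcore)[,(port,queue,lcore)]" layout;
    whitespace inside the parentheses is tolerated.
    """
    pattern = r"\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)"
    return [tuple(int(v) for v in m.groups())
            for m in re.finditer(pattern, config)]


# Values taken from the commands above.
print(ports_from_mask(0x3))
for port, queue, lcore in parse_l3fwd_config("(0, 0, 2),(1, 0, 2)"):
    print(f"port {port} rx queue {queue} is polled by lcore {lcore}")
```

<p class="MsoNormal"><span lang="EN-US">In other words, both commands enable ports 0 and 1, and in the l3fwd case rx queue 0 of both ports is polled by lcore 2.<o:p></o:p></span></p>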
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We tried adding logs with </span><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">--log-level=pmd,debug</span><span lang="EN-US"> and</span><span lang="EN-US" style="font-family:&quot;Lucida Console&quot;,serif">
--no-lsc-interrupt</span><span lang="EN-US">, but that didn't reveal anything helpful as far as we can tell. Please have a look at the attached log. The faulty port is port 0 (it starts out down; we waited around 25 minutes for it to come up and then
shut down testpmd).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We'd like to ask for pointers on what could be the cause or how to debug this issue further.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks,<br>
Juraj<o:p></o:p></span></p>
</div>
</div>
</body>
</html>