<div dir="ltr"><div><div><div><div><div><div>Hi Andrew,<br><br></div>The compilation warning issue is now resolved. Again, thank you guys for fixing this for us. I can run the tests on the Mellanox CX5s again; however, I'm running into a couple of new issues with running the prologues on the Intel cards.<br><br></div>When running tests on the Intel XL710s, I see this error appear in the log:<br><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">ERROR prologue Environment LIB 14:16:13.650<br>Too few networks in available configuration (0) in comparison with required (1)<br></blockquote><br></div>This seems like a trivial configuration error; perhaps it is something I need to set up in ts-rigs. I briefly searched through the examples there and didn't see any mention of how to set up a network.<br></div><div>I will attach this log just in case you need more information.<br></div><div><br></div>There is a different error when running on the Intel E810s. It appears to me that it starts DPDK, does some configuration inside DPDK and on the device, and then fails to bring the device back up. Since this error seems very non-trivial, I will also attach this log.<br><br></div>Thanks,<br></div>Adam<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 1, 2023 at 3:59 AM Andrew Rybchenko <<a href="mailto:andrew.rybchenko@oktetlabs.ru">andrew.rybchenko@oktetlabs.ru</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>Hi Adam,<br>
<br>
On 8/31/23 22:38, Adam Hassick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Andrew,<br>
</div>
<div><br>
I have one additional question as well: Does the test engine
support running tests on two ARMv8 test agents?</div>
<div><br>
</div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">1. We'll sort out
warnings this week. Thanks for heads up.<br>
</blockquote>
<div><br>
</div>
<div>Great. Let me know when that's fixed.</div>
</div>
</div>
</blockquote>
<br>
Done. We also fixed a number of warnings in TE.<br>
In addition, we made the root test package name consistent with the
repository name.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Support for old LTS branches was dropped some time ago,
but in the future it is definitely possible to keep it for
new LTS branches. I think 22.11 is supported, but I'm not
sure about older LTS releases.</div>
</blockquote>
<div><br>
</div>
<div>Good to know.<br>
<div> <br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> 2. You can add
command-line option --sanity to run tests marked with
TEST_HARNESS_SANITY requirement (see
dpdk-ethdev-ts/scripts/run.sh and grep TEST_HARNESS_SANITY
dpdk-ethdev-ts to see which tests are marked). Yes, there
is room for terminology improvement here. We'll do it.<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
Done. Now it is called --checkup.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Also it takes a lot of time because of failures and tests
which wait for some timeout.<br>
</blockquote>
</div>
<div><br>
</div>
<div>That makes sense to me. We'll use the time to complete
tests on virtio or the Intel devices as a reference for how
long the tests really take to complete.<br>
</div>
<div>We will explore the possibility of periodically running
the sanity tests for patches.<br>
</div>
</div>
</div>
</blockquote>
<br>
I'll double-check and let you know how long the entire TS runs on Intel
X710, E810, Mellanox CX5 and virtio net, just to ensure that the time
observed in your case looks the same.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div> <br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> The test harness can
provide coverage reports based on gcov, but I'm not sure
what you mean by a "dial" to control test coverage.
The reports provided are meant for a human to analyze.<br>
</blockquote>
</div>
<div><br>
</div>
<div>The general idea is to have some kind of parameter on the
test suite, which could be an integer ranging from zero to
ten, that controls how many tests are run based on how
important the test is.<br>
<br>
</div>
<div>Similar to how some command line interfaces provide a
verbosity level parameter (some number of "-v" arguments) to
control the importance of the information in the log.<br>
</div>
The verbosity level zero only prints very important log
messages, while ten prints everything.<br>
</div>
<div><br>
In much the same manner as above, this "dial" parameter
controls what tests are run and with what parameters based on
how important those tests and test parameter combinations are.<br>
Coverage Level zero tells the suite to run a very basic set of
important tests, with minimal parameterization. This mode
would take only ~5-10 minutes to run.<br>
In contrast, Coverage Level ten includes all the edge cases,
every combination of test parameters, everything the test
suite can do, which takes the normal several hours to run.<br>
The values 1 - 9 are between those two extremes, allowing the
user to get a gradient of test coverage in the results and to
limit the running time.<br>
<br>
</div>
Then we could, for example, run the "run.sh" with a level of 2
or 3 for incoming patches that need quick results, and with a
level of 10 for the less often run periodic tests performed on
main or LTS branches.<br>
</div>
</blockquote>
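<div>For illustration only, the "dial" described above could be sketched in Python as follows. Everything here is hypothetical: the test names, the per-test minimum levels, and the select_tests helper are assumptions made for the example and are not part of dpdk-ethdev-ts or run.sh.</div>
<pre>
```python
# Hypothetical sketch: none of these names exist in dpdk-ethdev-ts.
# Each test carries the lowest coverage level (0-10) at which it runs.
TESTS = {
    "basic_rx_tx": 0,
    "mtu_change": 2,
    "rss_all_hash_combinations": 7,
    "tx_ring_size_edge_cases": 10,
}


def select_tests(tests, level):
    """Return, sorted by name, the tests whose minimum level does not exceed `level`."""
    if level not in range(0, 11):
        raise ValueError("coverage level must be an integer from 0 to 10")
    return sorted(name for name, min_level in tests.items() if level >= min_level)


print(select_tests(TESTS, 2))   # quick run for incoming patches
print(select_tests(TESTS, 10))  # full periodic run
```
</pre>
<div>The point of the shape is that a higher level is always a superset of a lower one, so results from a quick run stay comparable with the full periodic run.</div>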
<br>
Understood now. Thanks a lot for the idea. We'll discuss it and come
back.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div> 3. Yes, a great many tests on Mellanox CX5 NICs
report unexpected testing results. Unfortunately, it is
time-consuming to fill in the expectations database, since
it is necessary to analyze the testing results and
classify each one as either a bug or just an acceptable
aspect of behaviour.<br>
<br>
Bublik allows comparing the results of two runs. It is
useful for a human, but still not good for automation.<br>
<br>
I have a local patch for the mlx5 driver which reports the
maximum Tx ring size. It makes the pass rate higher. It is a
problem for the test harness that mlx5 does not report
limits right now.<br>
<br>
The pass rate on Intel X710 is about 92% on my test rig.
The pass rate on virtio net is 99% right now and could
easily be brought to 100% (just one thing to fix in
expectations).<br>
<br>
I think a log storage setup is essential for log
analysis. Of course, you can request HTML logs when
you run tests (--log-html=html) or generate them after a run
using dpdk-ethdev-ts/scripts/html-log.sh and open
index.html in a browser, but log storage makes it
more convenient.<br>
</div>
</div>
</blockquote>
<div><br>
We are interested in setting up Bublik, potentially as an
externally-facing component, once we have our process of
running the test suite stabilized.</div>
<div>Once we are able to run the test suite again, I'll see
what the pass rate is on our other hardware.<br>
Good to know that it isn't an issue with our dev testbed
causing the high fail rate.</div>
</div>
<div>
<div><br>
</div>
<div>For Intel hardware, we have an XL710 and an Intel
E810-C in our development testbed. Although they are
slightly different devices, ideally the pass rate will be
identical or similar. I have yet to set up a VM pair for
virtio, but we will soon.<br>
</div>
<div><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Latest version of
test-environment has examples of our CGI scripts which
we use for log storage (see tools/log_server/README.md).<br>
<br>
Also all bits for Jenkins setup are available. See
dpdk-ethdev-ts/jenkins/README.md and examples of jenkins
files in ts-rigs-sample.<br>
</blockquote>
</div>
<div><br>
</div>
<div>Jenkins integration, setting up production rig
configurations, and permanent log storage will be our next
steps once I am able to run the tests again.<br>
</div>
<div>Unless there is an easy way to have meson not pass
"-Werror" into GCC. Then I would be able to run the test
suite.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Hopefully it is resolved now.<br>
<br>
I thought a bit more about your use case for Jenkins. I'm not 100%
sure that the existing pipelines are convenient for your use case.<br>
Feel free to ask questions when you are on it.<br>
<br>
Thanks,<br>
Andrew.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div><br>
</div>
<div>Thanks,<br>
</div>
<div>Adam<br>
</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div> <br>
On 8/29/23 17:02, Adam Hassick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Andrew,<br>
<br>
</div>
That fix seems to have resolved the issue,
thanks for the quick turnaround time on that
patch.<br>
</div>
<div>Now that we have the RCF timeout issue
resolved, there are a few other questions and
issues that we have about the tests themselves.</div>
<br>
</div>
<div>1. The test suite fails to build with a couple
warnings.<br>
</div>
<div><br>
</div>
<div>Below is the stderr log from compilation:<br>
</div>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">FAILED: lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o<br>
cc -Ilib/76b5a35@@ts_dpdk_pmd@sta -Ilib
-I../../lib
-I/opt/tsf/dpdk-ethdev-ts/ts/inst/default/include
-fdiagnostics-color=always -pipe
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror
-g -D_GNU_SOURCE -O0 -ggdb -Wall -W -fPIC -MD -MQ
'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o'
-MF 'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o.d'
-o 'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o'
-c ../../lib/dpdk_pmd_ts.c<br>
../../lib/dpdk_pmd_ts.c: In function
‘test_create_traffic_generator_params’:<br>
../../lib/dpdk_pmd_ts.c:5577:5: error: format not
a string literal and no format arguments
[-Werror=format-security]<br>
5577 | rc = te_kvpair_add(result, buf, mode);<br>
| ^~<br>
cc1: all warnings being treated as errors<br>
ninja: build stopped: subcommand failed.<br>
ninja: Entering directory `.'<br>
FAILED: lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o<br>
cc -Ilib/76b5a35@@ts_dpdk_pmd@sta -Ilib
-I../../lib
-I/opt/tsf/dpdk-ethdev-ts/ts/inst/default/include
-fdiagnostics-color=always -pipe
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror
-g -D_GNU_SOURCE -O0 -ggdb -Wall -W -fPIC -MD -MQ
'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o'
-MF 'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o.d'
-o 'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o'
-c ../../lib/dpdk_pmd_ts.c<br>
../../lib/dpdk_pmd_ts.c: In function
‘test_create_traffic_generator_params’:<br>
../../lib/dpdk_pmd_ts.c:5577:5: error: format not
a string literal and no format arguments
[-Werror=format-security]<br>
5577 | rc = te_kvpair_add(result, buf, mode);<br>
| ^~<br>
cc1: all warnings being treated as errors<br>
</blockquote>
<div>
<div>
<div><br>
</div>
<div>This error wasn't occurring last week,
which was the last time I ran the tests.<br>
</div>
<div>The TE host and the DUT have GCC v9.4.0
installed, and the tester has GCC v11.4.0
installed, if this information is helpful.<br>
</div>
<div><br>
</div>
<div>2. On the Mellanox CX5s, there are over
6,000 tests run, which collectively take
around 9 hours. Is it possible, and would it
make sense, to lower the test coverage and
have the test suite run faster?<br>
<br>
</div>
<div>For some context, we run immediate testing
on incoming patches for DPDK main and
development branches, as well as periodic test
runs on the main, stable, and LTS branches.<br>
</div>
<div>For us to consider including this test
suite as part of our immediate testing on
patches, we would have to reduce the test
coverage to the most important tests.<br>
This is primarily to reduce the testing time
to, for example, less than 30 minutes. Testing
on patches can't take too long because the lab
can receive numerous patches each day, which
each require individual testing runs.<br>
<br>
</div>
<div>At what frequency we run these tests, and
on what, still needs to be discussed with the
DPDK community, but it would be nice to know
if the test suite had a "dial" to control the
testing coverage.<br>
</div>
<div><br>
</div>
<div>3. We see a lot of test failures on our
Mellanox CX5 NICs. Around 2,300 of ~6,600
tests passed. Is there anything we can do to
diagnose these test failures?<br>
</div>
<div><br>
</div>
<div>Thanks,<br>
</div>
<div>Adam<br>
</div>
<div><br>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Aug 29,
2023 at 8:07 AM Andrew Rybchenko <<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>Hi Adam,<br>
<br>
I've pushed the fix in main branch and a new
tag v1.18.1. It should solve the problem with
IPv6 address from DNS.<br>
<br>
Andrew.<br>
<br>
On 8/29/23 00:05, Andrew Rybchenko wrote:<br>
</div>
<blockquote type="cite">
<div>Hi Adam,<br>
<br>
> Does the test engine prefer to use IPv6
over IPv4 for initiating the RCF connection
to the test bed hosts? And if so, is there a
way to force it to use IPv4?<br>
<br>
Brilliant idea. If DNS returns both IPv4 and
IPv6 addresses in your case, I guess it is
the root cause of the problem.<br>
Of course, it is a TE problem, since I see
really weird code in
lib/comm_net_engine/comm_net_engine.c at line
135.<br>
<br>
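<div>To illustrate the IPv4/IPv6 point: a name lookup can be restricted to IPv4 by passing an address family to the resolver. The sketch below uses Python's standard socket.getaddrinfo. It is not the TE code, and resolve_ipv4_only is a hypothetical helper name chosen for the example.</div>
<pre>
```python
import socket


def resolve_ipv4_only(host, port):
    """Resolve host to a list of unique IPv4 (address, port) pairs for TCP."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) tuple.
    unique = []
    for info in infos:
        if info[4] not in unique:
            unique.append(info[4])
    return unique


# A numeric address is used here so the example needs no DNS.
print(resolve_ipv4_only("127.0.0.1", 23571))
```
</pre>
<div>Without the family argument, a host that has both A and AAAA records may return IPv6 entries as well, and the caller then has to decide which one to try first.</div>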
I've pushed a fix to the branch
user/arybchik/fix_ipv4_only in the
ts-factory/test-environment repository.
Please try it.<br>
<br>
It is a late-night fix with minimal testing
and no review. I'll pass it through the review
process tomorrow and<br>
hopefully it will be released in one or two
days.<br>
<br>
Andrew.<br>
<br>
On 8/28/23 18:02, Adam Hassick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Andrew,<br>
<br>
</div>
We have yet to notice a distinct
pattern with the failures. Sometimes,
the RCF will start and connect without
issue a few times in a row before
failing to connect again. Once the
issue begins to occur, neither
rebooting all of the hosts (test
engine VM, tester, IUT) nor deleting
all of the build directories (suites,
agents, inst) and rebooting the hosts
afterward resolves the issue. When it
begins working again seems very
arbitrary to us.<br>
<br>
</div>
<div>I do usually try to terminate the
test engine with Ctrl+C, but when it
hangs while trying to start RCF, that
does not work.<br>
</div>
<div><br>
</div>
<div>Does the test engine prefer to use
IPv6 over IPv4 for initiating the RCF
connection to the test bed hosts? And
if so, is there a way to force it to
use IPv4?<br>
<br>
</div>
<div> - Adam<br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri,
Aug 25, 2023 at 1:35 PM Andrew Rybchenko
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>> I'll double-check test
engine on Ubuntu 20.04 and Ubuntu
22.04.<br>
<br>
Done. It works fine for me without
any issues.<br>
<br>
Have you noticed any pattern when it
works or does not work?<br>
Maybe it is a problem of an unclean
state after termination?<br>
Does it work fine the first time
after the DUTs reboot?<br>
How do you terminate testing? It
should be done using Ctrl+C in the
terminal where you execute the run.sh
command.<br>
In this case it should shut down
gracefully and close all test agents
and engine applications.<br>
<br>
(I'm trying to understand why you've
seen many test agent processes. It
should not happen.)<br>
<br>
Andrew.<br>
<br>
On 8/25/23 17:41, Andrew Rybchenko
wrote:<br>
</div>
<blockquote type="cite">
<div>On 8/25/23 17:06, Adam Hassick
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Hi Andrew,<br>
<br>
</div>
Two of our systems (the Test
Engine runner and the DUT
host) are running Ubuntu 20.04
LTS, however this morning I
noticed that the tester system
(the one having issues) is
running Ubuntu 22.04 LTS.<br>
</div>
<div>This could be the source of
the problem. I encountered a
dependency issue trying to run
the Test Engine on 22.04 LTS,
so I downgraded the system.
Since the tester is also the
host having connection issues,
I will try downgrading that
system to 20.04, and see if
that changes anything.<br>
</div>
</div>
</blockquote>
<br>
Unlikely, but who knows. We run
tests (DUTs) on Ubuntu 20.04, Ubuntu
22.04, Ubuntu 22.10, Ubuntu 23.04,
Debian 11 and Fedora 38 every night.<br>
Right now Debian 11 is used for test
engine in nightly regressions.<br>
<br>
I'll double-check test engine on
Ubuntu 20.04 and Ubuntu 22.04.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>I did try passing in the
"--vg-rcf" argument to the
run.sh script of the test
suite after installing
valgrind, but there was no
additional output that I saw.<br>
</div>
</div>
</blockquote>
<br>
Sorry, I should have said that valgrind output
should be in valgrind.te_rcf
(in the directory where you run the test
engine).<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I will try pulling in the
changes you've pushed up, and
will see if that fixes
anything.<br>
<br>
</div>
<div>Thanks,<br>
</div>
<div>Adam<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Aug
25, 2023 at 9:57 AM Andrew
Rybchenko <<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>Hello Adam, <br>
<br>
On 8/24/23 23:54, Andrew
Rybchenko wrote:<br>
</div>
<blockquote type="cite">I'd
like to try to repeat the
problem locally. Which
Linux distro is running on
test engine and agents? <br>
<br>
In fact, I know of one problem
with Debian 12 and Fedora
38 and we have <br>
a patch in review to fix it;
however, the behaviour is
different in <br>
this case, so it is unlikely to be
the same problem. <br>
</blockquote>
<br>
I've just published a new
tag which fixes known test
engine side problems on
Debian 12 and Fedora 38.<br>
<br>
<blockquote type="cite"> <br>
One more idea is to
install valgrind on the
test engine host and <br>
run with option --vg-rcf
to check if something
weird is happening. <br>
<br>
What I don't understand
right now is why I see
just one failed attempt <br>
to connect in your log.txt
and then Logger shutdown
after 9 minutes. <br>
<br>
Andrew. <br>
<br>
On 8/24/23 23:29, Adam
Hassick wrote: <br>
<blockquote type="cite"> >
Is there any firewall in
the network or on test
hosts which could block
incoming TCP connection
to the port 23571 (<a href="http://iol-dts-tester.dpdklab.iol.unh.edu:23571" target="_blank">iol-dts-tester.dpdklab.iol.unh.edu:23571</a>)
from the host where you
run test engine? <br>
<br>
Our test engine host and
the testbed are on the
same subnet. The
connection does work
sometimes. <br>
<br>
> If behaviour the
same on the next try and
you see that test agent
is kept running, could
you check using <br>
> <br>
> # netstat -tnlp <br>
> <br>
> that Test Agent is
listening on the port
and try to establish TCP
connection from test
agent using <br>
> <br>
> $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
<br>
> <br>
> and check if TCP
connection could be
established. <br>
<br>
I was able to replicate
the same behavior again,
where it hangs while RCF
is trying to start. <br>
Running this command, I
see this in the output:
<br>
<br>
tcp 0 0 0.0.0.0:23571 0.0.0.0:* LISTEN 18599/ta <br>
<br>
So it seems like it is
listening on the correct
port. <br>
Additionally, I was able
to connect to the Tester
machine from our Test
Engine host using
telnet. It printed the
PID of the process once
the connection was
opened. <br>
<br>
I tried running the "ta"
application manually on
the command line, and it
didn't print anything at
all. <br>
Maybe the issue is
something on the Test
Engine side. <br>
<br>
On Thu, Aug 24, 2023 at
2:35 PM Andrew Rybchenko
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Hi Adam, <br>
<br>
> On the tester
host (which appears to
be the Peer agent),
there <br>
are four processes
that I see running,
which look like the test
<br>
agent processes. <br>
<br>
Before the next try,
I'd recommend killing
these processes. <br>
<br>
Is there any
firewall in the network
or on test hosts which
could <br>
block incoming TCP
connection to the port
23571 <br>
(<a href="http://iol-dts-tester.dpdklab.iol.unh.edu:23571" target="_blank">iol-dts-tester.dpdklab.iol.unh.edu:23571</a>)
from the host <br>
where you run test
engine? <br>
<br>
If the behaviour is the
same on the next try and
you see that the test agent
is <br>
kept running, could
you check using <br>
<br>
# netstat -tnlp <br>
<br>
that Test Agent is
listening on the port
and try to establish TCP
<br>
connection from test
agent using <br>
<br>
$ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571 <br>
<br>
and check if TCP
connection could be
established. <br>
<br>
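<div>The same reachability check can also be scripted. This is a minimal sketch in Python using socket.create_connection from the standard library. The can_connect helper is a hypothetical name, and the host and port in the commented call are the ones mentioned in this thread.</div>
<pre>
```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False


# Example call, to be run from the test engine host:
# can_connect("iol-dts-tester.dpdklab.iol.unh.edu", 23571)
```
</pre>
<div>A False result only says that the connection could not be opened from this host. It does not distinguish a firewall from a test agent that is not listening, so the netstat check on the agent side is still needed.</div>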
Another idea is to
log in to the Tester as root,
as the testing does, <br>
take the TA start command
from the log and try it
by hand, <br>
without -n and with the extra
escaping removed. <br>
<br>
# sudo
PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1
<br>
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1
/tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571
host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=<br>
<br>
Hopefully in this
case test agent
directory remains in the
/tmp and <br>
you don't need to
copy it as testing does.
<br>
Maybe the output could
shed some light on
what's going on. <br>
<br>
Andrew. <br>
<br>
On 8/24/23 17:30,
Adam Hassick wrote: <br>
<blockquote type="cite">
Hi Andrew, <br>
<br>
This is the output
that I see in the
terminal when this
failure <br>
occurs, after the
test agent binaries
build and the test
engine <br>
starts: <br>
<br>
Platform default
build - pass <br>
Simple RCF
consistency check
succeeded <br>
--->>>
Starting Logger...done
<br>
--->>>
Starting
RCF...rcf_net_engine_connect():
Connection timed <br>
out <a href="http://iol-dts-tester.dpdklab.iol.unh.edu:23571" target="_blank">iol-dts-tester.dpdklab.iol.unh.edu:23571</a>
<br>
<br>
Then, it hangs
here until I kill the
"te_rcf" and "te_tee"
<br>
processes. I let
it hang for around 9
minutes. <br>
<br>
On the tester host
(which appears to be
the Peer agent), there
are <br>
four processes
that I see running,
which look like the
test agent <br>
processes. <br>
<br>
ta.Peer is an
empty file. I've
attached the log.txt
from this run. <br>
<br>
- Adam <br>
<br>
On Thu, Aug 24,
2023 at 4:22 AM Andrew
Rybchenko <br>
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Hi Adam, <br>
<br>
Yes,
TE_RCFUNIX_TIMEOUT is
in seconds. I've
double-checked <br>
that it goes
to 'copy_timeout' in
ts-conf/rcf.conf. <br>
The description in
doc/sphinx/pages/group_te_engine_rcf.rst
<br>
says that
copy_timeout is in
seconds and
implementation in <br>
lib/rcfunix/rcfunix.c
passes the value to
select() tv_sec. <br>
Theoretically
select() could be
interrupted by signal,
but I <br>
think it is
unlikely here. <br>
<br>
I'm not sure that I understand what you mean by RCF <br>
connection timeout. Does it happen on TE startup, when RCF <br>
starts test agents? If so, TE_RCFUNIX_TIMEOUT could help. Or <br>
does it happen when tests are in progress, e.g. in the middle <br>
of a test? If so, TE_RCFUNIX_TIMEOUT is unrelated and most <br>
likely either the host with the test agent dies or the test agent <br>
itself crashes. It would be easier for me to classify it if you <br>
share the text log (log.txt, full or just the corresponding fragment <br>
with some context). Also, the content of the ta.DPDK or ta.Peer file, <br>
depending on which agent has problems, could shed some light. <br>
These files contain the stdout/stderr of the test agents. <br>
<br>
Andrew. <br>
<br>
On 8/23/23
17:45, Adam Hassick
wrote: <br>
<blockquote type="cite">
Hi Andrew, <br>
<br>
I've set up
a test rig
repository here, and
have created <br>
configurations for
our development
testbed based off of
the <br>
examples. <br>
We've been
able to get the test
suite to run
manually on <br>
Mellanox CX5
devices once. <br>
However, we
are running into an
issue where, when
RCF starts, <br>
the RCF
connection times out
very frequently. We
aren't sure <br>
why this is
the case. <br>
It works
sometimes, but most
of the time when we
try to run <br>
the test
engine, it
encounters this
issue. <br>
I've tried
changing the RCF
port by setting <br>
"TE_RCF_PORT=<some
port number>" and
rebooting the
testbed <br>
machines.
Neither seems to fix
the issue. <br>
<br>
It also
seems like the
timeout takes far
longer than 60 <br>
seconds,
even when running
"export
TE_RCFUNIX_TIMEOUT=60"
<br>
before I try
to run the test
suite. <br>
I assume the
unit for this
variable is seconds?
<br>
<br>
Thanks, <br>
Adam <br>
<br>
On Mon, Aug
21, 2023 at 10:19 AM
Adam Hassick <br>
<<a href="mailto:ahassick@iol.unh.edu" target="_blank">ahassick@iol.unh.edu</a>>
wrote: <br>
<br>
Hi
Andrew, <br>
<br>
Thanks,
I've cloned the
example repository
and will start <br>
setting
up a configuration
for our development
testbed <br>
today.
I'll let you know if
I run into any
difficulties <br>
or have
any questions. <br>
<br>
- Adam
<br>
<br>
On Sun,
Aug 20, 2023 at
4:40 AM Andrew
Rybchenko <br>
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Hi
Adam, <br>
<br>
I've
published <br>
<a href="https://github.com/ts-factory/ts-rigs-sample" target="_blank">https://github.com/ts-factory/ts-rigs-sample</a>.
<br>
Hopefully it will
help to define your
test rigs and <br>
successfully run
some tests manually.
Feel free to <br>
ask
any questions and
I'll answer here and
try to <br>
update
documentation. <br>
<br>
Meanwhile I'll
prepare missing bits
for steps (2) and <br>
(3).
<br>
Hopefully everything
is in place for step
(4), but we <br>
need
to make steps (2)
and (3) first. <br>
<br>
Andrew. <br>
<br>
On
8/18/23 21:40,
Andrew Rybchenko
wrote: <br>
<blockquote type="cite">
Hi Adam, <br>
<br>
> I've
conferred with the
rest of the team,
and we <br>
think it would be
best to move
forward with
mainly <br>
option B. <br>
<br>
OK, I'll provide
the sample on
Monday for you. It
is <br>
almost ready right
now, but I need to
double-check <br>
it
before publishing.
<br>
<br>
Regards, <br>
Andrew. <br>
<br>
On
8/17/23 20:03,
Adam Hassick
wrote: <br>
<blockquote type="cite">
Hi Andrew, <br>
<br>
I'm adding the
CI mailing list
to this <br>
conversation.
Others in the
community might
find <br>
this
conversation
valuable. <br>
<br>
We do want to
run testing on a
regular basis.
The <br>
Jenkins
integration will
be very useful
for us, as <br>
most of our CI
is orchestrated
by Jenkins. <br>
I've conferred
with the rest of
the team, and we
<br>
think it would
be best to move
forward with
mainly <br>
option B. <br>
If you would
like to know
anything about
our <br>
testbeds that
would help you
with creating an
<br>
example ts-rigs
repo, I'd be
happy to answer
any <br>
questions you
have. <br>
<br>
We have multiple
test rigs (we
call these <br>
"DUT-tester
pairs") that we
run our existing
<br>
hardware testing
on, with
differing
network <br>
hardware and CPU
architecture. I
figured this
might <br>
be an important
detail. <br>
<br>
Thanks, <br>
Adam <br>
<br>
On Thu, Aug 17,
2023 at 11:44 AM
Andrew Rybchenko
<br>
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Greetings Adam, <br>
<br>
I'm happy to hear that you're trying to bring <br>
it up. <br>
<br>
As I understand, the final goal is to run it on a <br>
regular basis. So, we need to do it properly <br>
from the very beginning. <br>
Bringing up all features consists of 4 steps: <br>
<br>
1. Create a site-specific repository (we call it <br>
ts-rigs) which contains information about test <br>
rigs and other site-specific information like <br>
where to send mails, where to store logs etc. <br>
It is required for manual execution as well, <br>
since the test rigs description is essential. I'll <br>
return to the topic below. <br>
<br>
2. Set up log storage for automated runs. <br>
Basically it is disk space plus an apache2 web <br>
server with a few CGI scripts which help a lot to <br>
save disk space. <br>
<br>
3. Set up the Bublik web application, which provides <br>
a web interface to view testing results. Same as <br>
<a href="https://ts-factory.io/bublik" target="_blank">https://ts-factory.io/bublik</a> <br>
<br>
4. Set up Jenkins to run tests regularly, <br>
save logs in the log storage (2) and import them into <br>
Bublik (3). <br>
<br>
We spent the last few months doing our homework to make <br>
it simpler to bring up automated execution <br>
using Jenkins: <br>
<a href="https://github.com/ts-factory/te-jenkins" target="_blank">https://github.com/ts-factory/te-jenkins</a>
<br>
The corresponding bits in dpdk-ethdev-ts will be <br>
available tomorrow. <br>
<br>
Let's return to step (1). <br>
<br>
Unfortunately, there is no publicly available <br>
example of the ts-rigs repository, since <br>
sensitive site-specific information is located <br>
there. But I'm ready to help you create one <br>
for UNH. I see two options here: <br>
<br>
(A) I'll ask questions and, based on your <br>
answers, create the first draft with my <br>
comments. <br>
<br>
(B) I'll make a template/example ts-rigs repo, <br>
publish it and you'll create UNH ts-rigs based <br>
on it. <br>
<br>
Of course, I'll help to debug and finally bring <br>
it up in any case. <br>
<br>
(A) is a bit simpler for me and you, but (B) is <br>
a bit more generic and will help other <br>
potential users to bring it up. <br>
We can combine (A)+(B). I.e. start from (A). <br>
What do you think? <br>
<br>
Thanks, <br>
Andrew. <br>
<br>
On 8/17/23 15:18, Konstantin Ushakov wrote: <br>
<blockquote type="cite">
Greetings
Adam, <br>
<br>
<br>
Thanks for contacting us. I'm copying Andrew, who <br>
would be happy to help. <br>
<br>
Thanks, <br>
Konstantin <br>
<br>
<blockquote type="cite">
On 16 Aug 2023, at 21:50, Adam Hassick
&lt;<a href="mailto:ahassick@iol.unh.edu" target="_blank">ahassick@iol.unh.edu</a>&gt; wrote: <br>
<br>
<br>
Greetings Konstantin, <br>
<br>
I am in the process of setting up the DPDK <br>
Poll Mode Driver test suite as an addition to <br>
our testing coverage for DPDK at the UNH lab. <br>
<br>
I have some questions about how to set the <br>
test suite arguments. <br>
<br>
I have been able to configure the Test Engine <br>
to connect to the hosts in the testbed. The <br>
RCF, Configurator, and Tester all begin to <br>
run; however, the prelude of the test suite <br>
fails to run. <br>
<br>
<a href="https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters" target="_blank">https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters</a>
<br>
<br>
The documentation mentions that there are <br>
several test parameters for the test suite, <br>
like for the IUT test link MAC, etc. These <br>
seem like they would need to be set somewhere <br>
to run many of the tests. <br>
<br>
The Test Engine documentation has instructions <br>
on how to create new parameters for test suites <br>
in the Tester configuration, but I can't find <br>
anything in the user guide or the Tester guide <br>
on how to set the arguments for those parameters <br>
when running the test suite. I'm not sure if I <br>
need to write my own Tester config, or if I <br>
should be setting these in some other way. <br>
<br>
How should these values be set? <br>
<br>
I'm also not sure what environment <br>
variables/arguments are strictly necessary or <br>
which are optional. <br>
<br>
Regards, <br>
Adam <br>
<br>
-- *Adam Hassick* <br>
Senior Developer <br>
UNH InterOperability Lab <br>
<a href="mailto:ahassick@iol.unh.edu" target="_blank">ahassick@iol.unh.edu</a> <br>
<a href="https://www.iol.unh.edu/" target="_blank">iol.unh.edu</a> <br>
+1 (603) 475-8248 <br>
</blockquote>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
<br>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br clear="all">
<br>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br clear="all">
<br>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br clear="all">
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote></div><br clear="all"><br><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div><b><span style="background-color:rgb(255,255,255)"><span style="color:rgb(102,102,102)">Adam Hassick</span></span></b><br></div><span style="color:rgb(102,102,102)"></span></div><div><span style="color:rgb(102,102,102)">Senior Developer</span></div><div><span style="color:rgb(102,102,102)"><span style="color:rgb(11,83,148)"><span style="background-color:rgb(255,255,255)">UNH InterOperability Lab</span></span></span><span style="color:rgb(102,102,102)"></span></div><div><span style="color:rgb(102,102,102)"><a href="mailto:ahassick@iol.unh.edu" target="_blank">ahassick@iol.unh.edu</a><br></span></div><div><span style="color:rgb(102,102,102)"><a href="https://www.iol.unh.edu/" target="_blank">iol.unh.edu</a><br></span></div>+1 (603) 475-8248<br></div></div>