<div dir="ltr"><div>Hi Andrew,<br></div><div><br>I have one additional question as well: Does the test engine support running tests on two ARMv8 test agents?</div><div><br></div><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">1. We'll sort out warnings this week.
Thanks for the heads-up.<br></blockquote><div><br></div><div>Great. Let me know when that's fixed.</div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Support for old LTS branches was dropped some time ago, but in the
future it is definitely possible to keep it for new LTS branches.
I think 22.11 is supported, but I'm not sure about older LTS
releases.</div></blockquote><div><br></div><div>Good to know.<br><div> <br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
2. You can add the command-line option --sanity to run tests marked
with the TEST_HARNESS_SANITY requirement (see
dpdk-ethdev-ts/scripts/run.sh, and grep for TEST_HARNESS_SANITY in
dpdk-ethdev-ts to see which tests are marked). Yes, there is
room for improvement in the terminology here. We'll do it.<br><br>
Also, the run takes a long time because of failures and tests which
wait for a timeout.<br></blockquote>
</div><div><br></div><div>That makes sense to me. We'll use the time taken to complete the tests on virtio or the Intel devices as a reference for how long the tests really take.<br></div><div>We will explore the possibility of periodically running the sanity tests for patches.<br></div><div> <br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
The test harness can provide coverage reports based on gcov, but
I'm not sure what you mean by a "dial" to control test coverage.
The reports provided are intended for humans to analyze.<br></blockquote></div><div><br></div><div>The general idea is to have a parameter on the test suite, perhaps an integer ranging from zero to ten, that controls how many tests are run based on how important each test is.<br><br></div><div>This is similar to how some command-line interfaces provide a verbosity level
parameter (some number of "-v" arguments) to control the importance of
the information in the log.<br></div>Verbosity level zero prints only very important log messages, while level ten prints everything.<br></div><div><br>In much the same way, this "dial" parameter would control which tests are run, and with which parameters, based on how important those tests and test parameter combinations are.<br>Coverage level zero tells the suite to run a very basic set of important tests with minimal parameterization. This mode would take only about 5-10 minutes to run.<br>In contrast, coverage level ten includes all the edge cases, every combination of test parameters, and everything else the test suite can do, which takes the usual several hours to run.<br>The values 1 to 9 fall between those two extremes, allowing the user to choose a gradient of test coverage and to limit the running time.<br><br></div>Then we could, for example, run run.sh with a level of 2 or 3 for incoming patches that need quick results, and with a level of 10 for the less frequent periodic tests performed on main or LTS branches.<br><div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>
3. Yes, many tests on Mellanox CX5 NICs really do report unexpected
testing results. Unfortunately, it is time-consuming to fill in the
expectations database, since it is necessary to analyze the testing
results and classify each one as either a bug or acceptable
behaviour.<br>
<br>
Bublik allows comparing the results of two runs. It is useful for
humans, but still not good for automation.<br>
<br>
I have a local patch for the mlx5 driver which reports the maximum Tx
ring size. It makes the pass rate higher. The fact that mlx5 does not
report limits right now is a problem for the test harness.<br>
<br>
The pass rate on Intel X710 is about 92% on my test rig. The pass rate on
virtio net is 99% right now and could easily be brought to 100% (there is
just one thing to fix in the expectations).<br>
<br>
I think setting up log storage is essential for log analysis. Of
course, you can request HTML logs when you run tests
(--log-html=html), or generate them after the run using
dpdk-ethdev-ts/scripts/html-log.sh and open index.html in a
browser, but log storage makes this more convenient.<br></div></div></blockquote><div><br>We are interested in setting up Bublik, potentially as an externally facing component, once our process for running the test suite has stabilized.</div><div>Once we are able to run the test suite again, I'll see what the pass rate is on our other hardware.<br>It's good to know that the high failure rate isn't caused by an issue with our dev testbed.</div></div><div><div><br></div><div>For Intel hardware, we have an XL710 and an Intel E810-C in our development testbed. Although they are slightly different devices, ideally their pass rates will be identical or similar. I have yet to set up a VM pair for virtio, but we will soon.<br></div><div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
The latest version of test-environment has examples of the CGI scripts
we use for log storage (see tools/log_server/README.md).<br><br>
Also, all the bits for the Jenkins setup are available. See
dpdk-ethdev-ts/jenkins/README.md and the example Jenkins files in
ts-rigs-sample.<br></blockquote>
</div><div><br></div><div>Jenkins integration, setting up production rig configurations, and permanent log storage will be our next steps once I am able to run the tests again.<br></div><div>That is, unless there is an easy way to stop meson from passing "-Werror" to GCC, in which case I would be able to run the test suite now.<br></div><div><br></div><div>Thanks,<br></div><div>Adam<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>
<br>
On 8/29/23 17:02, Adam Hassick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Andrew,<br>
<br>
</div>
That fix seems to have resolved the issue. Thanks for the
quick turnaround on that patch.<br>
</div>
<div>Now that we have the RCF timeout issue resolved, there
are a few other questions and issues that we have about the
tests themselves.</div>
<br>
</div>
<div>1. The test suite fails to build with a couple of warnings.<br>
</div>
<div><br>
</div>
<div>Below is the stderr log from compilation:<br>
</div>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">FAILED:
lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o<br>
cc -Ilib/76b5a35@@ts_dpdk_pmd@sta -Ilib -I../../lib
-I/opt/tsf/dpdk-ethdev-ts/ts/inst/default/include
-fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall
-Winvalid-pch -Werror -g -D_GNU_SOURCE -O0 -ggdb -Wall -W
-fPIC -MD -MQ 'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o'
-MF 'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o.d' -o
'lib/76b5a35@@ts_dpdk_pmd@sta/dpdk_pmd_ts.c.o' -c
../../lib/dpdk_pmd_ts.c<br>
../../lib/dpdk_pmd_ts.c: In function
‘test_create_traffic_generator_params’:<br>
../../lib/dpdk_pmd_ts.c:5577:5: error: format not a string
literal and no format arguments [-Werror=format-security]<br>
5577 | rc = te_kvpair_add(result, buf, mode);<br>
| ^~<br>
cc1: all warnings being treated as errors<br>
ninja: build stopped: subcommand failed.<br>
</blockquote>
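<div><br></div><div>For context on the error above: with -Wformat-security, GCC warns when a printf-style function is given a non-literal format string and no further arguments, because any '%' inside that string would be treated as a conversion. Assuming te_kvpair_add() is declared with a printf-style format attribute on its third argument (which the warning implies), the usual fix would be te_kvpair_add(result, buf, "%s", mode). The same hazard can be sketched with Python's %-formatting; this is an illustrative analogue only, and the function names and sample value are hypothetical:</div>
<pre>
```python
# Python analogue of the C format-string hazard behind -Wformat-security.
# Using an arbitrary string *as the format* lets any '%' inside it be
# interpreted as a conversion specifier.


def unsafe_format(value: str) -> str:
    # Analogous to printf(value): 'value' itself is treated as the format.
    return value % ()


def safe_format(value: str) -> str:
    # Analogous to printf("%s", value): the format is a fixed literal.
    return "%s" % (value,)


if __name__ == "__main__":
    mode = "rate 100%d"  # hypothetical value containing a stray '%d'
    print(safe_format(mode))
    try:
        unsafe_format(mode)
    except TypeError as exc:
        # '%d' expects an argument that was never supplied.
        print("unsafe_format failed:", exc)
```
</pre>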
<div>
<div>
<div><br>
</div>
<div>This error wasn't occurring last week, which was the
last time I ran the tests.<br>
</div>
<div>The TE host and the DUT have GCC v9.4.0 installed, and
the tester has GCC v11.4.0 installed, if this information
is helpful.<br>
</div>
<div><br>
</div>
<div>2. On the Mellanox CX5s, over 6,000 tests are
run, which collectively take around 9 hours. Is it
possible, and would it make sense, to lower the test
coverage and have the test suite run faster?<br>
<br>
</div>
<div>For some context, we run immediate testing on incoming
patches for DPDK main and development branches, as well as
periodic test runs on the main, stable, and LTS branches.<br>
</div>
<div>For us to consider including this test suite as part of
our immediate testing on patches, we would have to reduce
the test coverage to the most important tests.<br>
This is primarily to reduce the testing time to, for
example, less than 30 minutes. Testing on patches can't
take too long because the lab can receive numerous patches
each day, each of which requires an individual test run.<br>
<br>
</div>
<div>At what frequency we run these tests, and on what,
still needs to be discussed with the DPDK community, but
it would be nice to know if the test suite had a "dial" to
control the testing coverage.<br>
</div>
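<div><br></div><div>For scale, over 6,000 tests in about 9 hours works out to roughly 5.4 seconds per test on average (32,400 s / 6,000 tests), so a 30-minute budget would fit on the order of 330 tests at that rate. A minimal sketch of how such a "dial" might select tests, assuming each test carries a hypothetical importance rating from 0 to 10 (Case, importance, select_tests and the test names are illustrative, not part of dpdk-ethdev-ts or TE):</div>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    importance: int  # hypothetical rating: 10 = most important, 0 = least


def select_tests(tests, level):
    """Return the tests to run for a coverage level from 0 to 10.

    Level 0 keeps only tests rated 10; level 10 keeps every test.
    """
    if level not in range(11):
        raise ValueError("coverage level must be between 0 and 10")
    threshold = 10 - level
    return [t for t in tests if t.importance >= threshold]


if __name__ == "__main__":
    suite = [
        Case("link_up_down", 10),
        Case("basic_rx_tx", 9),
        Case("mtu_edge_cases", 4),
        Case("all_offload_combinations", 1),
    ]
    for level in (0, 3, 10):
        print(level, [t.name for t in select_tests(suite, level)])
```
</pre>
<div>Test parameter combinations could be rated the same way, so that a low level also trims the parameter matrix rather than only dropping whole tests.</div>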
<div><br>
</div>
<div>3. We see a lot of test failures on our Mellanox CX5
NICs. Around 2,300 of ~6,600 tests passed. Is there
anything we can do to diagnose these test failures?<br>
</div>
<div><br>
</div>
<div>Thanks,<br>
</div>
<div>Adam<br>
</div>
<div><br>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Aug 29, 2023 at
8:07 AM Andrew Rybchenko <<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>Hi Adam,<br>
<br>
I've pushed the fix to the main branch and tagged a new release, v1.18.1.
It should solve the problem with the IPv6 address returned by DNS.<br>
<br>
Andrew.<br>
<br>
On 8/29/23 00:05, Andrew Rybchenko wrote:<br>
</div>
<blockquote type="cite">
<div>Hi Adam,<br>
<br>
> Does the test engine prefer to use IPv6 over IPv4
for initiating the RCF connection to the test bed hosts?
And if so, is there a way to force it to use IPv4?<br>
<br>
Brilliant idea. If DNS returns both IPv4 and IPv6
addresses in your case, I guess that is the root cause of
the problem.<br>
It is certainly a TE problem, since I see some really weird
code in lib/comm_net_engine/comm_net_engine.c, line 135.<br>
<br>
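<div><br></div><div>For illustration only (this is not the TE code in comm_net_engine.c or the actual fix), the general technique is to pass AF_INET to getaddrinfo() so that a hostname with both A and AAAA records yields only IPv4 addresses; the netstat output elsewhere in this thread shows the agent listening on 0.0.0.0, i.e. on IPv4 only. A minimal Python sketch using the standard socket module, with hypothetical helper names:</div>
<pre>
```python
import socket


def address_families(host, port):
    """Return the set of address families getaddrinfo() yields for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return {info[0] for info in infos}


def resolve_ipv4(host, port):
    """Return only the IPv4 (address, port) pairs for host."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET
    # the sockaddr is an (address, port) tuple.
    return [info[4] for info in infos]


if __name__ == "__main__":
    # "localhost" often resolves to both ::1 and 127.0.0.1; restricting
    # the lookup to AF_INET keeps only the IPv4 address.
    print(address_families("localhost", 23571))
    print(resolve_ipv4("localhost", 23571))
```
</pre>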
I've pushed a fix to the branch
user/arybchik/fix_ipv4_only in the
ts-factory/test-environment repository. Please try it.<br>
<br>
It is a late-night fix with minimal testing and no review.
I'll pass it through the review process tomorrow, and
hopefully it will be released in one or two days.<br>
<br>
Andrew.<br>
<br>
On 8/28/23 18:02, Adam Hassick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Andrew,<br>
<br>
</div>
We have yet to notice a distinct pattern in the
failures. Sometimes, the RCF will start and
connect without issue a few times in a row before
failing to connect again. Once the issue begins to
occur, neither rebooting all of the hosts (test
engine VM, tester, IUT) nor deleting all of the
build directories (suites, agents, inst) and
rebooting the hosts afterward resolves the issue.
When it begins working again seems very arbitrary
to us.<br>
<br>
</div>
<div>I do usually try to terminate the test engine
with Ctrl+C, but when it hangs while trying to
start RCF, that does not work.<br>
</div>
<div><br>
</div>
<div>Does the test engine prefer to use IPv6 over
IPv4 for initiating the RCF connection to the test
bed hosts? And if so, is there a way to force it
to use IPv4?<br>
<br>
</div>
<div> - Adam<br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Aug 25, 2023
at 1:35 PM Andrew Rybchenko <<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>> I'll double-check the test engine on Ubuntu
20.04 and Ubuntu 22.04.<br>
<br>
Done. It works fine for me without any issues.<br>
<br>
Have you noticed any pattern in when it works and
when it does not?<br>
Maybe it is a problem of unclean state after
termination?<br>
Does it work fine the first time after the DUTs
reboot?<br>
How do you terminate testing? It should be done
using Ctrl+C in the terminal where you execute the
run.sh command.<br>
In this case it should shut down gracefully and
close all test agents and engine applications.<br>
<br>
(I'm trying to understand why you've seen many
test agent processes. It should not happen.)<br>
<br>
Andrew.<br>
<br>
On 8/25/23 17:41, Andrew Rybchenko wrote:<br>
</div>
<blockquote type="cite">
<div>On 8/25/23 17:06, Adam Hassick wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Hi Andrew,<br>
<br>
</div>
Two of our systems (the Test Engine runner
and the DUT host) are running Ubuntu 20.04
LTS; however, this morning I noticed that
the tester system (the one having issues)
is running Ubuntu 22.04 LTS.<br>
</div>
<div>This could be the source of the
problem. I encountered a dependency issue
trying to run the Test Engine on 22.04
LTS, so I downgraded the system. Since the
tester is also the host having connection
issues, I will try downgrading that system
to 20.04, and see if that changes
anything.<br>
</div>
</div>
</blockquote>
<br>
Unlikely, but who knows. We run tests (DUTs) on
Ubuntu 20.04, Ubuntu 22.04, Ubuntu 22.10, Ubuntu
23.04, Debian 11 and Fedora 38 every night.<br>
Right now Debian 11 is used for the test engine in
nightly regressions.<br>
<br>
I'll double-check the test engine on Ubuntu 20.04
and Ubuntu 22.04.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>I did try passing in the "--vg-rcf"
argument to the run.sh script of the test
suite after installing valgrind, but there
was no additional output that I saw.<br>
</div>
</div>
</blockquote>
<br>
Sorry, I should have said: the valgrind output should be in
valgrind.te_rcf (in the directory where you run the test
engine).<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I will try pulling in the changes
you've pushed up, and will see if that
fixes anything.<br>
<br>
</div>
<div>Thanks,<br>
</div>
<div>Adam<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri,
Aug 25, 2023 at 9:57 AM Andrew Rybchenko
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div>Hello Adam, <br>
<br>
On 8/24/23 23:54, Andrew Rybchenko
wrote:<br>
</div>
<blockquote type="cite">I'd like to try
to reproduce the problem locally. Which
Linux distro is running on the test engine
and agents? <br>
<br>
In fact, I know of one problem with Debian
12 and Fedora 38, and we have a <br>
patch in review to fix it; however,
the behaviour is different in <br>
this case, so it is unlikely to be the same
problem. <br>
</blockquote>
<br>
I've just published a new tag which
fixes known test engine side problems on
Debian 12 and Fedora 38.<br>
<br>
<blockquote type="cite"> <br>
One more idea is to install valgrind
on the test engine host and <br>
run with the --vg-rcf option to check if
something weird is happening. <br>
<br>
What I don't understand right now is
why I see just one failed attempt <br>
to connect in your log.txt, and then the
Logger shutting down after 9 minutes. <br>
<br>
Andrew. <br>
<br>
On 8/24/23 23:29, Adam Hassick wrote:
<br>
<blockquote type="cite"> > Is there
any firewall in the network or on
test hosts which could block
incoming TCP connections to port
23571
from the host where you run the test
engine? <br>
<br>
Our test engine host and the testbed
are on the same subnet. The
connection does work sometimes. <br>
<br>
> If the behaviour is the same on the
next try and you see that the test agent
is still running, could you check
using <br>
> <br>
> # netstat -tnlp <br>
> <br>
> that the Test Agent is listening
on the port, and try to establish a TCP
connection to the test agent using <br>
> <br>
> $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
<br>
> <br>
> and check whether a TCP connection
can be established. <br>
<br>
I was able to replicate the same
behavior again, where it hangs while
RCF is trying to start. <br>
Running this command, I see this in
the output: <br>
<br>
tcp 0 0 0.0.0.0:23571
0.0.0.0:* LISTEN
18599/ta <br>
<br>
So it seems like it is listening on
the correct port. <br>
Additionally, I was able to connect
to the Tester machine from our Test
Engine host using telnet. It printed
the PID of the process once the
connection was opened. <br>
<br>
I tried running the "ta" application
manually on the command line, and it
didn't print anything at all. <br>
Maybe the issue is something on the
Test Engine side. <br>
<br>
On Thu, Aug 24, 2023 at 2:35 PM
Andrew Rybchenko <<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Hi Adam, <br>
<br>
> On the tester host (which
appears to be the Peer agent), there
<br>
are four processes that I see
running, which look like the test <br>
agent processes. <br>
<br>
Before the next try I'd
recommend killing these processes. <br>
<br>
Is there any firewall in the
network or on test hosts which could
<br>
block incoming TCP connections to
port 23571
from the host <br>
where you run the test engine? <br>
<br>
If the behaviour is the same on the
next try and you see that the test agent
is still running, could you check
using <br>
<br>
# netstat -tnlp <br>
<br>
that the Test Agent is listening on
the port, and try to establish a TCP <br>
connection to the test agent using
<br>
<br>
$ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571 <br>
<br>
and check whether a TCP connection
can be established. <br>
<br>
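<div><br></div><div>If telnet is not installed on the test engine host, the same reachability check can be sketched in Python with socket.create_connection(). tcp_port_reachable is a hypothetical helper name, and, like telnet, it only shows that something accepts the TCP connection, not that RCF can talk to the agent:</div>
<pre>
```python
import socket


def tcp_port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection() resolves host and tries each returned address
        # in turn; the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors
        # (socket.gaierror and socket.timeout are OSError subclasses).
        return False


if __name__ == "__main__":
    # Replace "localhost" with the tester host name to reproduce the
    # telnet check from the test engine host.
    print(tcp_port_reachable("localhost", 23571))
```
</pre>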
Another idea is to log in to the Tester
as root, as testing does, take the <br>
TA start command from the log
and run it by hand without -n and <br>
with the extra escaping removed. <br>
<br>
# sudo
PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1
/tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571
host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=<br>
<br>
Hopefully, in this case, the test
agent directory remains in /tmp
and <br>
you don't need to copy it as
testing does. <br>
Maybe the output could shed some
light on what's going on. <br>
<br>
Andrew. <br>
<br>
On 8/24/23 17:30, Adam Hassick
wrote: <br>
<blockquote type="cite"> Hi
Andrew, <br>
<br>
This is the output that I see
in the terminal when this failure
<br>
occurs, after the test agent
binaries build and the test engine
<br>
starts: <br>
<br>
Platform default build - pass
<br>
Simple RCF consistency check
succeeded <br>
--->>> Starting
Logger...done <br>
--->>> Starting
RCF...rcf_net_engine_connect():
Connection timed out
iol-dts-tester.dpdklab.iol.unh.edu:23571
<br>
<br>
Then, it hangs here until I
kill the "te_rcf" and "te_tee" <br>
processes. I let it hang for
around 9 minutes. <br>
<br>
On the tester host (which
appears to be the Peer agent),
there are <br>
four processes that I see
running, which look like the test
agent <br>
processes. <br>
<br>
ta.Peer is an empty file. I've
attached the log.txt from this
run. <br>
<br>
- Adam <br>
<br>
On Thu, Aug 24, 2023 at
4:22 AM Andrew Rybchenko <br>
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Hi Adam, <br>
<br>
Yes, TE_RCFUNIX_TIMEOUT is
in seconds. I've double-checked <br>
that it goes to
'copy_timeout' in
ts-conf/rcf.conf. <br>
The description in
doc/sphinx/pages/group_te_engine_rcf.rst
<br>
says that copy_timeout is
in seconds, and the implementation in <br>
lib/rcfunix/rcfunix.c
passes the value to select() as
tv_sec. <br>
Theoretically select()
could be interrupted by signal,
but I <br>
think it is unlikely here.
<br>
<br>
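<div><br></div><div>As a rough illustration of a timeout given in seconds bounding a select() call, here is a Python sketch (not the rcfunix.c code). Only the variable name TE_RCFUNIX_TIMEOUT comes from this thread; the helper names and the 60-second default are assumptions:</div>
<pre>
```python
import os
import select
import socket
import time


def timeout_from_env(name="TE_RCFUNIX_TIMEOUT", default=60):
    """Read a timeout in whole seconds from the environment.

    The default of 60 is an assumption for this sketch.
    """
    return int(os.environ.get(name, default))


def wait_readable(sock, timeout_s):
    """Wait up to timeout_s seconds; return True if sock became readable."""
    # select.select() takes its timeout in seconds, much like tv_sec in a
    # C struct timeval. Since Python 3.5 (PEP 475) an EINTR interruption is
    # retried automatically with the remaining timeout (unless the signal
    # handler raises), whereas C select() returns -1 with errno == EINTR.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    start = time.monotonic()
    # Nothing has been written to 'b', so this waits about one second.
    print(wait_readable(a, 1), round(time.monotonic() - start, 1))
    b.sendall(b"x")
    # Data is now pending on 'a', so this returns immediately.
    print(wait_readable(a, 1))
    a.close()
    b.close()
```
</pre>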
I'm not sure that I
understand what you mean by RCF
<br>
connection timeout. Does
it happen on TE startup, when RCF <br>
starts the test agents? If so,
TE_RCFUNIX_TIMEOUT could help. Or
<br>
does it happen when tests
are in progress, e.g. in the
middle <br>
of a test? If so,
TE_RCFUNIX_TIMEOUT is unrelated,
and most <br>
likely either the host with the
test agent dies or the test agent
itself <br>
crashes. It would be
easier for me to classify it if
you shared the <br>
text log (log.txt, in full or
just the corresponding fragment with <br>
some context). Also, the
content of the ta.DPDK or ta.Peer file,
<br>
depending on which agent
has problems, could shed some
light. <br>
These files
contain the stdout/stderr of the test
agents. <br>
<br>
Andrew. <br>
<br>
On 8/23/23 17:45, Adam
Hassick wrote: <br>
<blockquote type="cite"> Hi
Andrew, <br>
<br>
I've set up a test rig
repository here, and have
created <br>
configurations for our
development testbed based on
the <br>
examples. <br>
We've been able to get
the test suite to run manually
on <br>
Mellanox CX5 devices
once. <br>
However, we are running
into an issue where, when RCF
starts, <br>
the RCF connection times
out very frequently. We aren't
sure <br>
why this is the case. <br>
It works sometimes, but
most of the time when we try to
run <br>
the test engine, it
encounters this issue. <br>
I've tried changing the
RCF port by setting <br>
"TE_RCF_PORT=<some
port number>" and rebooting
the testbed <br>
machines. Neither seems
to fix the issue. <br>
<br>
It also seems like the
timeout takes far longer than 60
<br>
seconds, even when
running "export
TE_RCFUNIX_TIMEOUT=60" <br>
before I try to run the
test suite. <br>
I assume the unit for
this variable is seconds? <br>
<br>
Thanks, <br>
Adam <br>
<br>
On Mon, Aug 21, 2023 at
10:19 AM Adam Hassick <br>
<<a href="mailto:ahassick@iol.unh.edu" target="_blank">ahassick@iol.unh.edu</a>>
wrote: <br>
<br>
Hi Andrew, <br>
<br>
Thanks, I've cloned
the example repository and will
start <br>
setting up a
configuration for our
development testbed <br>
today. I'll let you
know if I run into any
difficulties <br>
or have any
questions. <br>
<br>
- Adam <br>
<br>
On Sun, Aug 20, 2023
at 4:40 AM Andrew Rybchenko <br>
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Hi Adam, <br>
<br>
I've published <br>
<a href="https://github.com/ts-factory/ts-rigs-sample" target="_blank">https://github.com/ts-factory/ts-rigs-sample</a>.
<br>
Hopefully it
will help to define your test
rigs and <br>
successfully run
some tests manually. Feel free
to <br>
ask any
questions, and I'll answer here
and try to <br>
update the
documentation. <br>
<br>
Meanwhile, I'll
prepare the missing bits for steps
(2) and <br>
(3). <br>
Hopefully
everything is in place for step
(4), but we <br>
need to complete
steps (2) and (3) first. <br>
<br>
Andrew. <br>
<br>
On 8/18/23
21:40, Andrew Rybchenko wrote: <br>
<blockquote type="cite">
Hi Adam, <br>
<br>
> I've
conferred with the rest of the
team, and we <br>
think it would
be best to move forward with
mainly <br>
option B. <br>
<br>
OK, I'll
provide the sample on Monday
for you. It is <br>
almost ready
right now, but I need to
double-check <br>
it before
publishing. <br>
<br>
Regards, <br>
Andrew. <br>
<br>
On 8/17/23
20:03, Adam Hassick wrote: <br>
<blockquote type="cite">
Hi Andrew, <br>
<br>
I'm adding
the CI mailing list to this
<br>
conversation. Others in the
community might find <br>
this
conversation valuable. <br>
<br>
We do want
to run testing on a regular
basis. The <br>
Jenkins
integration will be very
useful for us, as <br>
most of our
CI is orchestrated by
Jenkins. <br>
I've
conferred with the rest of
the team, and we <br>
think it
would be best to move
forward with mainly <br>
option B. <br>
If you would
like to know anything about
our <br>
testbeds
that would help you with
creating an <br>
example
ts-rigs repo, I'd be happy
to answer any <br>
questions
you have. <br>
<br>
We have
multiple test rigs (we call
these <br>
"DUT-tester
pairs") that we run our
existing <br>
hardware
testing on, with differing
network <br>
hardware and
CPU architecture. I figured
this might <br>
be an
important detail. <br>
<br>
Thanks, <br>
Adam <br>
<br>
On Thu, Aug
17, 2023 at 11:44 AM Andrew
Rybchenko <br>
<<a href="mailto:andrew.rybchenko@oktetlabs.ru" target="_blank">andrew.rybchenko@oktetlabs.ru</a>>
wrote: <br>
<br>
Greetings Adam, <br>
<br>
I'm
happy to hear that you're
trying to bring <br>
it up. <br>
<br>
As I
understand it, the final goal is
to run it on a <br>
regular
basis. So we need to do
it properly <br>
from the
very beginning. <br>
Bringing
up all the features consists of
4 steps: <br>
<br>
1.
Create a site-specific
repository (we call it <br>
ts-rigs)
which contains information
about test <br>
rigs and
other site-specific
information, like <br>
where to
send mails, where to store
logs, etc. <br>
It is
required for manual
execution as well, <br>
since
the test rig description is
essential. I'll <br>
return
to the topic below. <br>
<br>
2. Set up
log storage for automated
runs. <br>
Basically, it is disk space
plus an apache2 web <br>
server
with a few CGI scripts which
help a lot to <br>
save
disk space. <br>
<br>
3. Set up
the Bublik web application, which
provides a <br>
web
interface to view testing
results, the same as <br>
<a href="https://ts-factory.io/bublik" target="_blank">https://ts-factory.io/bublik</a>
<br>
<br>
4. Set up
Jenkins to run tests
regularly, <br>
save
logs in log storage (2), and
import them into <br>
Bublik
(3). <br>
<br>
We spent the last few
months on our
homework to make <br>
it
simpler to bring up
automated execution <br>
using
Jenkins: <br>
<a href="https://github.com/ts-factory/te-jenkins" target="_blank">https://github.com/ts-factory/te-jenkins</a>
<br>
The corresponding bits in
dpdk-ethdev-ts will be <br>
available tomorrow. <br>
<br>
Let's
return to step (1). <br>
<br>
Unfortunately, there is no
publicly available <br>
example
of a ts-rigs repository,
since <br>
it contains sensitive site-specific
information. <br>
But I'm ready to help you
create it <br>
for UNH.
I see two options here: <br>
<br>
(A) I'll
ask questions and, based on
your <br>
answers,
create the first draft
with my <br>
comments. <br>
<br>
(B) I'll
make a template/example
ts-rigs repo, <br>
publish
it, and you'll create the UNH
ts-rigs based <br>
on it. <br>
<br>
Of
course, I'll help to debug
and finally bring <br>
it up in
any case. <br>
<br>
(A) is a
bit simpler for me and you,
but (B) is <br>
a bit
more generic and will help
other <br>
potential users to bring it
up. <br>
We can
combine (A)+(B), i.e. start
with (A). <br>
What do
you think? <br>
<br>
Thanks,
<br>
Andrew.
<br>
<br>
On
8/17/23 15:18, Konstantin
Ushakov wrote: <br>
<blockquote type="cite">
Greetings Adam, <br>
<br>
<br>
Thanks
for contacting us. I'm copying
Andrew, who <br>
would
be happy to help. <br>
<br>
Thanks, <br>
Konstantin <br>
<br>
<blockquote type="cite">
On 16 Aug 2023, at
21:50, Adam Hassick <br>
<a href="mailto:ahassick@iol.unh.edu" target="_blank"><ahassick@iol.unh.edu></a>
wrote: <br>
<br>
<br>
Greetings Konstantin, <br>
<br>
I am
in the process of
setting up the DPDK <br>
Poll
Mode Driver test suite
as an addition to <br>
our
testing coverage for
DPDK at the UNH lab. <br>
<br>
I
have some questions
about how to set the <br>
test
suite arguments. <br>
<br>
I
have been able to
configure the Test
Engine <br>
to
connect to the hosts in
the testbed. The <br>
RCF,
Configurator, and Tester
all begin to <br>
run;
however, the prelude of
the test suite <br>
fails to run. <br>
<br>
<a href="https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters" target="_blank">https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters</a>
<br>
<br>
The
documentation mentions
that there are <br>
several test parameters
for the test suite, <br>
such as
the IUT test link
MAC. These <br>
seem
like they would need to
be set somewhere <br>
to
run many of the tests. <br>
<br>
I
see that the Test Engine
documentation has <br>
instructions on how to
create new <br>
parameters for test
suites in the Tester <br>
configuration, but I can't find anything
in the <br>
user
guide or in the Tester
guide about how to <br>
set
the arguments for the
parameters when <br>
running the test suite.
I'm <br>
not
sure if I need to write
my own Tester <br>
config, or if I should
be setting these in <br>
some
other way. <br>
<br>
How should these values be set? <br>
<br>
I'm also not sure which environment variables/arguments are strictly necessary and which are optional. <br>
<br>
Regards, <br>
Adam
<br>
<br>
-- <br>
*Adam Hassick* <br>
Senior Developer <br>
UNH InterOperability Lab <br>
<a href="mailto:ahassick@iol.unh.edu" target="_blank">ahassick@iol.unh.edu</a> <br>
<a href="http://iol.unh.edu" target="_blank">iol.unh.edu</a> <br>
+1 (603) 475-8248 <br>
</blockquote>
</blockquote>
</blockquote>
<br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote></div></div></div>