<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">Hello Adam, <br>
<br>
On 8/24/23 23:54, Andrew Rybchenko wrote:<br>
</div>
<blockquote type="cite"
cite="mid:873c7972-3e5a-9e82-9449-4d12b2c96032@oktetlabs.ru">I'd
like to try to repeat the problem locally. Which Linux distro is
running on test engine and agents?
<br>
<br>
In fact, I know of one problem with Debian 12 and Fedora 38, and we
have a
<br>
patch in review to fix it. However, the behaviour is different in
<br>
this case, so it is unlikely to be the same problem.
<br>
</blockquote>
<br>
I've just published a new tag which fixes known test engine side
problems on Debian 12 and Fedora 38.<br>
<br>
<blockquote type="cite"
cite="mid:873c7972-3e5a-9e82-9449-4d12b2c96032@oktetlabs.ru">
<br>
One more idea is to install valgrind on the test engine host and
<br>
run with option --vg-rcf to check if something weird is happening.
<br>
<br>
What I don't understand right now is why I see just one failed
attempt
<br>
to connect in your log.txt and then Logger shutdown after 9
minutes.
<br>
<br>
Andrew.
<br>
<br>
On 8/24/23 23:29, Adam Hassick wrote:
<br>
<blockquote type="cite"> > Is there any firewall in the network
or on test hosts which could block incoming TCP connection to
the port 23571 from the
host where you run test engine?
<br>
<br>
Our test engine host and the testbed are on the same subnet. The
connection does work sometimes.
<br>
<br>
> If the behaviour is the same on the next try and you see that
the test agent is kept running, could you check using
<br>
>
<br>
> # netstat -tnlp
<br>
>
<br>
> that Test Agent is listening on the port and try to
establish TCP connection from test agent using
<br>
>
<br>
> $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
<br>
>
<br>
> and check if TCP connection could be established.
<br>
<br>
I was able to replicate the same behavior again, where it hangs
while RCF is trying to start.
<br>
Running this command, I see this in the output:
<br>
<br>
tcp 0 0 0.0.0.0:23571 0.0.0.0:*
LISTEN 18599/ta
<br>
<br>
So it seems like it is listening on the correct port.
<br>
Additionally, I was able to connect to the Tester machine from
our Test Engine host using telnet. It printed the PID of the
process once the connection was opened.
<br>
<br>
I tried running the "ta" application manually on the command
line, and it didn't print anything at all.
<br>
Maybe the issue is something on the Test Engine side.
<br>
<br>
On Thu, Aug 24, 2023 at 2:35 PM Andrew Rybchenko
&lt;<a class="moz-txt-link-abbreviated" href="mailto:andrew.rybchenko@oktetlabs.ru">andrew.rybchenko@oktetlabs.ru</a>&gt; wrote:
<br>
<br>
Hi Adam,
<br>
<br>
> On the tester host (which appears to be the Peer
agent), there
<br>
are four processes that I see running, which look like the
test
<br>
agent processes.
<br>
<br>
Before the next try I'd recommend killing these processes.
<br>
<br>
Is there any firewall in the network or on test hosts which
could
<br>
block incoming TCP connection to the port 23571 from
the host
<br>
where you run test engine?
<br>
<br>
If the behaviour is the same on the next try and you see that the test
agent is
<br>
kept running, could you check using
<br>
<br>
# netstat -tnlp
<br>
<br>
that Test Agent is listening on the port and try to
establish TCP
<br>
connection from test agent using
<br>
<br>
$ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
<br>
<br>
and check if TCP connection could be established.
<br>
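The TCP check described above can also be scripted. What follows is a minimal sketch in Python, assuming only the standard library. The helper name can_connect is hypothetical and not part of TE; the host and port in the comment are the ones mentioned in this thread.
<br>
<pre>
```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (host and port taken from this thread):
#   can_connect("iol-dts-tester.dpdklab.iol.unh.edu", 23571)
```
</pre>
A False result only says that the connection could not be opened within the timeout. It does not tell a dropped packet apart from a refused connection, so netstat on the agent host is still needed to confirm that the agent is listening.
<br>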
<br>
Another idea is to log in to the Tester as root, as the testing does,
<br>
take the TA start command from the log and run it by hand without -n,
<br>
with the extra escaping removed.
<br>
<br>
# sudo PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1
<br>
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1
/tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571
host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=<br>
<br>
Hopefully in this case the test agent directory remains in
/tmp and
<br>
you don't need to copy it as testing does.
<br>
Maybe the output could shed some light on what's going on.
<br>
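The last argument of the command above is a colon-separated list of key=value settings. To see at a glance which values the agent was started with (for instance copy_timeout), the string can be split as in the sketch below. This is only an illustration in Python: parse_confstr is a hypothetical helper, it is not how TE itself parses the string, and it assumes that no value contains a colon.
<br>
<pre>
```python
# Configuration string copied from the command in this thread.
CONFSTR = (
    "host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:"
    "key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:"
    "kill_timeout=15:sudo=:shell="
)


def parse_confstr(confstr):
    """Split a colon-separated key=value string into a dict of strings."""
    settings = {}
    for item in confstr.split(":"):
        if not item:
            continue  # skip empty fields such as a trailing colon
        # partition keeps everything after the first "=" as the value
        key, _, value = item.partition("=")
        settings[key] = value
    return settings


if __name__ == "__main__":
    for key, value in parse_confstr(CONFSTR).items():
        print(key, "=", repr(value))
```
</pre>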
<br>
Andrew.
<br>
<br>
On 8/24/23 17:30, Adam Hassick wrote:
<br>
<blockquote type="cite"> Hi Andrew,
<br>
<br>
This is the output that I see in the terminal when this
failure
<br>
occurs, after the test agent binaries build and the test
engine
<br>
starts:
<br>
<br>
Platform default build - pass
<br>
Simple RCF consistency check succeeded
<br>
--->>> Starting Logger...done
<br>
--->>> Starting RCF...rcf_net_engine_connect():
Connection timed
<br>
out iol-dts-tester.dpdklab.iol.unh.edu:23571
<br>
<br>
Then, it hangs here until I kill the "te_rcf" and "te_tee"
<br>
processes. I let it hang for around 9 minutes.
<br>
<br>
On the tester host (which appears to be the Peer agent),
there are
<br>
four processes that I see running, which look like the
test agent
<br>
processes.
<br>
<br>
ta.Peer is an empty file. I've attached the log.txt from
this run.
<br>
<br>
- Adam
<br>
<br>
On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko
<br>
&lt;<a class="moz-txt-link-abbreviated" href="mailto:andrew.rybchenko@oktetlabs.ru">andrew.rybchenko@oktetlabs.ru</a>&gt; wrote:
<br>
<br>
Hi Adam,
<br>
<br>
Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've
double-checked
<br>
that it goes to 'copy_timeout' in ts-conf/rcf.conf.
<br>
The description in
doc/sphinx/pages/group_te_engine_rcf.rst
<br>
says that copy_timeout is in seconds and
implementation in
<br>
lib/rcfunix/rcfunix.c passes the value to select()
tv_sec.
<br>
Theoretically select() could be interrupted by signal,
but I
<br>
think it is unlikely here.
<br>
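For illustration, a wait with a timeout given in seconds behaves as sketched below. This is Python rather than the C code in lib/rcfunix/rcfunix.c, and wait_readable and demo are hypothetical names. Python's select.select() takes the timeout in seconds and, since Python 3.5, restarts the call itself if a signal interrupts it. In C, select() instead returns -1 with errno set to EINTR and the caller has to decide whether to retry.
<br>
<pre>
```python
import select
import socket


def wait_readable(sock, timeout_sec):
    """Return True if sock becomes readable within timeout_sec seconds."""
    readable, _, _ = select.select([sock], [], [], timeout_sec)
    return bool(readable)


def demo():
    """Show one wait that times out and one that succeeds, on a local socket pair."""
    left, right = socket.socketpair()
    try:
        timed_out = wait_readable(left, 0.2)  # nothing sent yet, waits the full 0.2 s
        right.send(b"x")
        ready = wait_readable(left, 1.0)      # data is pending, returns immediately
        return timed_out, ready
    finally:
        left.close()
        right.close()
```
</pre>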
<br>
I'm not sure that I understand what you mean by RCF
<br>
connection timeout. Does it happen on TE startup when
RCF
<br>
starts test agents? If so, TE_RCFUNIX_TIMEOUT could
help. Or
<br>
does it happen when tests are in progress, e.g. in the
middle
<br>
of a test? If so, TE_RCFUNIX_TIMEOUT is unrelated and
most
<br>
likely either the host with the test agent dies or the test agent
itself
<br>
crashes. It would be easier for me to classify it if
you share
<br>
the text log (log.txt, full or just the corresponding fragment
with
<br>
some context). Also, the content of the ta.DPDK or ta.Peer file,
<br>
depending on which agent has problems, could shed some
light.
<br>
The corresponding files contain stdout/stderr of test
agents.
<br>
<br>
Andrew.
<br>
<br>
On 8/23/23 17:45, Adam Hassick wrote:
<br>
<blockquote type="cite"> Hi Andrew,
<br>
<br>
I've set up a test rig repository here, and have
created
<br>
configurations for our development testbed based off
of the
<br>
examples.
<br>
We've been able to get the test suite to run
manually on
<br>
Mellanox CX5 devices once.
<br>
However, we are running into an issue where, when
RCF starts,
<br>
the RCF connection times out very frequently. We
aren't sure
<br>
why this is the case.
<br>
It works sometimes, but most of the time when we try
to run
<br>
the test engine, it encounters this issue.
<br>
I've tried changing the RCF port by setting
<br>
"TE_RCF_PORT=&lt;some port number&gt;" and rebooting
the testbed
<br>
machines. Neither seems to fix the issue.
<br>
<br>
It also seems like the timeout takes far longer than
60
<br>
seconds, even when running "export
TE_RCFUNIX_TIMEOUT=60"
<br>
before I try to run the test suite.
<br>
I assume the unit for this variable is seconds?
<br>
<br>
Thanks,
<br>
Adam
<br>
<br>
On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick
<br>
&lt;<a class="moz-txt-link-abbreviated" href="mailto:ahassick@iol.unh.edu">ahassick@iol.unh.edu</a>&gt; wrote:
<br>
<br>
Hi Andrew,
<br>
<br>
Thanks, I've cloned the example repository and
will start
<br>
setting up a configuration for our development
testbed
<br>
today. I'll let you know if I run into any
difficulties
<br>
or have any questions.
<br>
<br>
- Adam
<br>
<br>
On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko
<br>
&lt;<a class="moz-txt-link-abbreviated" href="mailto:andrew.rybchenko@oktetlabs.ru">andrew.rybchenko@oktetlabs.ru</a>&gt;
wrote:
<br>
<br>
Hi Adam,
<br>
<br>
I've published
<br>
<a class="moz-txt-link-freetext" href="https://github.com/ts-factory/ts-rigs-sample">https://github.com/ts-factory/ts-rigs-sample</a>.
<br>
Hopefully it will help to define your test
rigs and
<br>
successfully run some tests manually. Feel
free to
<br>
ask any questions and I'll answer here and
try to
<br>
update documentation.
<br>
<br>
Meanwhile I'll prepare missing bits for
steps (2) and
<br>
(3).
<br>
Hopefully everything is in place for step
(4), but we
<br>
need to make steps (2) and (3) first.
<br>
<br>
Andrew.
<br>
<br>
On 8/18/23 21:40, Andrew Rybchenko wrote:
<br>
<blockquote type="cite"> Hi Adam,
<br>
<br>
> I've conferred with the rest of the
team, and we
<br>
think it would be best to move forward
with mainly
<br>
option B.
<br>
<br>
OK, I'll provide the sample on Monday for
you. It is
<br>
almost ready right now, but I need to
double-check
<br>
it before publishing.
<br>
<br>
Regards,
<br>
Andrew.
<br>
<br>
On 8/17/23 20:03, Adam Hassick wrote:
<br>
<blockquote type="cite"> Hi Andrew,
<br>
<br>
I'm adding the CI mailing list to this
<br>
conversation. Others in the community
might find
<br>
this conversation valuable.
<br>
<br>
We do want to run testing on a regular
basis. The
<br>
Jenkins integration will be very useful
for us, as
<br>
most of our CI is orchestrated by
Jenkins.
<br>
I've conferred with the rest of the
team, and we
<br>
think it would be best to move forward
with mainly
<br>
option B.
<br>
If you would like to know anything about
our
<br>
testbeds that would help you with
creating an
<br>
example ts-rigs repo, I'd be happy to
answer any
<br>
questions you have.
<br>
<br>
We have multiple test rigs (we call
these
<br>
"DUT-tester pairs") that we run our
existing
<br>
hardware testing on, with differing
network
<br>
hardware and CPU architecture. I figured
this might
<br>
be an important detail.
<br>
<br>
Thanks,
<br>
Adam
<br>
<br>
On Thu, Aug 17, 2023 at 11:44 AM Andrew
Rybchenko
<br>
&lt;<a class="moz-txt-link-abbreviated" href="mailto:andrew.rybchenko@oktetlabs.ru">andrew.rybchenko@oktetlabs.ru</a>&gt; wrote:
<br>
<br>
Greetings Adam,
<br>
<br>
I'm happy to hear that you're trying
to bring
<br>
it up.
<br>
<br>
As I understand, the final goal is to
run it on a
<br>
regular basis. So, we need to do
it properly
<br>
from the very beginning.
<br>
Bringing up all features consists of
4 steps:
<br>
<br>
1. Create site-specific repository
(we call it
<br>
ts-rigs) which contains information
about test
<br>
rigs and other site-specific
information like
<br>
where to send mails, where to store
logs etc.
<br>
It is required for manual execution
as well,
<br>
since test rigs description is
essential. I'll
<br>
return to the topic below.
<br>
<br>
2. Set up log storage for automated
runs.
<br>
Basically it is disk space plus an
apache2 web
<br>
server with a few CGI scripts which
help a lot to
<br>
save disk space.
<br>
<br>
3. Set up the Bublik web application,
which provides
<br>
a web interface to view testing
results. Same as
<br>
<a class="moz-txt-link-freetext" href="https://ts-factory.io/bublik">https://ts-factory.io/bublik</a>
<br>
<br>
4. Set up Jenkins to run tests
regularly,
<br>
save logs in log storage (2) and
import them to
<br>
Bublik (3).
<br>
<br>
We spent the last few months on our
homework to make
<br>
it simpler to bring up automated
execution
<br>
using Jenkins -
<br>
<a class="moz-txt-link-freetext" href="https://github.com/ts-factory/te-jenkins">https://github.com/ts-factory/te-jenkins</a>
<br>
Corresponding bits in dpdk-ethdev-ts
will be
<br>
available tomorrow.
<br>
<br>
Let's return to the step (1).
<br>
<br>
Unfortunately there is no publicly
available
<br>
example of the ts-rigs repository
since
<br>
sensitive site-specific information
is located
<br>
there. But I'm ready to help you to
create it
<br>
for UNH. I see two options here:
<br>
<br>
(A) I'll ask questions and based on
your
<br>
answers will create the first draft
with my
<br>
comments.
<br>
<br>
(B) I'll make a template/example
ts-rigs repo,
<br>
publish it and you'll create UNH
ts-rigs based
<br>
on it.
<br>
<br>
Of course, I'll help to debug and
finally bring
<br>
it up in any case.
<br>
<br>
(A) is a bit simpler for me and you,
but (B) is
<br>
a bit more generic and will help
other
<br>
potential users to bring it up.
<br>
We can combine (A)+(B). I.e. start
from (A).
<br>
What do you think?
<br>
<br>
Thanks,
<br>
Andrew.
<br>
<br>
On 8/17/23 15:18, Konstantin Ushakov
wrote:
<br>
<blockquote type="cite"> Greetings
Adam,
<br>
<br>
<br>
Thanks for contacting us. I'm copying
Andrew, who
<br>
would be happy to help.
<br>
<br>
Thanks,
<br>
Konstantin
<br>
<br>
<blockquote type="cite"> On 16 Aug
2023, at 21:50, Adam Hassick
<br>
&lt;<a class="moz-txt-link-abbreviated" href="mailto:ahassick@iol.unh.edu">ahassick@iol.unh.edu</a>&gt; wrote:
<br>
<br>
<br>
Greetings Konstantin,
<br>
<br>
I am in the process of setting
up the DPDK
<br>
Poll Mode Driver test suite as
an addition to
<br>
our testing coverage for DPDK at
the UNH lab.
<br>
<br>
I have some questions about how
to set the
<br>
test suite arguments.
<br>
<br>
I have been able to configure
the Test Engine
<br>
to connect to the hosts in the
testbed. The
<br>
RCF, Configurator, and Tester
all begin to
<br>
run, however the prelude of the
test suite
<br>
fails to run.
<br>
<br>
<a class="moz-txt-link-freetext" href="https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters">https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters</a>
<br>
<br>
The documentation mentions that
there are
<br>
several test parameters for the
test suite,
<br>
like for the IUT test link MAC,
etc. These
<br>
seem like they would need to be
set somewhere
<br>
to run many of the tests.
<br>
<br>
I see in the Test Engine
documentation, there
<br>
are instructions on how to
create new
<br>
parameters for test suites in
the Tester
<br>
configuration, but there is
nothing in the
<br>
user guide or in the Tester
guide for how to
<br>
set the arguments for the
parameters when
<br>
running the test suite that I
can find. I'm
<br>
not sure if I need to write my
own Tester
<br>
config, or if I should be
setting these in
<br>
some other way.
<br>
<br>
How should these values be set?
<br>
<br>
I'm also not sure what
environment
<br>
variables/arguments are strictly
necessary or
<br>
which are optional.
<br>
<br>
Regards,
<br>
Adam
<br>
<br>
-- *Adam
Hassick*
<br>
Senior Developer
<br>
UNH InterOperability Lab
<br>
<a class="moz-txt-link-abbreviated" href="mailto:ahassick@iol.unh.edu">ahassick@iol.unh.edu</a>
<br>
iol.unh.edu
<a class="moz-txt-link-rfc2396E" href="https://www.iol.unh.edu/"><https://www.iol.unh.edu/></a>
<br>
+1 (603) 475-8248
<br>
</blockquote>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
<br>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>