Setting up DPDK PMD Test Suite

Adam Hassick ahassick at iol.unh.edu
Fri Aug 25 16:06:00 CEST 2023


Hi Andrew,

Two of our systems (the Test Engine runner and the DUT host) are running
Ubuntu 20.04 LTS; however, this morning I noticed that the tester system
(the one having issues) is running Ubuntu 22.04 LTS.
This could be the source of the problem. I encountered a dependency issue
trying to run the Test Engine on 22.04 LTS, so I downgraded the system.
Since the tester is also the host having connection issues, I will try
downgrading that system to 20.04, and see if that changes anything.
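
To spot this kind of distro mismatch across the hosts more quickly, something
like the following could be used. It is only a sketch: it assumes the contents
of /etc/os-release have already been collected from each host (for example with
"cat /etc/os-release" over ssh), and the host names and sample text in the
usage note are hypothetical placeholders, not our real configuration.

```python
from collections import Counter


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def distro_label(info):
    """Return a short 'ID VERSION_ID' label, e.g. 'ubuntu 22.04'."""
    return "{} {}".format(info.get("ID", "unknown"),
                          info.get("VERSION_ID", "unknown"))


def mismatched_hosts(releases):
    """Given {host: os-release text}, return the sorted list of hosts whose
    distro label differs from the most common label."""
    labels = {host: distro_label(parse_os_release(text))
              for host, text in releases.items()}
    if not labels:
        return []
    majority, _ = Counter(labels.values()).most_common(1)[0]
    return sorted(host for host, label in labels.items() if label != majority)
```

Calling mismatched_hosts({"te-runner": text_a, "dut": text_b, "tester": text_c})
with two 20.04 hosts and one 22.04 host would return only the odd one out.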

I did try passing in the "--vg-rcf" argument to the run.sh script of the
test suite after installing valgrind, but there was no additional output
that I saw.

I will try pulling in the changes you've pushed up, and will see if that
fixes anything.
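
Since the connection only fails some of the time, repeating the port check from
earlier in the thread several times may be more telling than a single telnet
attempt. Below is a minimal sketch of such a probe, using only the Python
standard library. The function names are my own, and the attempt count and
timeout are arbitrary assumptions rather than values taken from the Test Engine.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within
    `timeout` seconds, False on refusal, timeout, or resolution failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS lookup errors.
        return False


def probe_repeatedly(host, port, attempts=5, timeout=5.0):
    """Probe the port several times and return the per-attempt results."""
    return [tcp_port_open(host, port, timeout) for _ in range(attempts)]
```

Run from the Test Engine host, for example as
probe_repeatedly("iol-dts-tester.dpdklab.iol.unh.edu", 23571), a mix of True and
False results would point at an intermittent network or agent-side problem
rather than a firewall rule.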

Thanks,
Adam

On Fri, Aug 25, 2023 at 9:57 AM Andrew Rybchenko <
andrew.rybchenko at oktetlabs.ru> wrote:

> Hello Adam,
>
> On 8/24/23 23:54, Andrew Rybchenko wrote:
>
> I'd like to try to repeat the problem locally. Which Linux distro is
> running on test engine and agents?
>
> In fact, I know of one problem with Debian 12 and Fedora 38, and we have
> a patch in review to fix it. However, the behaviour is different in
> this case, so it is unlikely to be the same problem.
>
>
> I've just published a new tag which fixes known test engine side problems
> on Debian 12 and Fedora 38.
>
>
> One more idea is to install valgrind on the test engine host and
> run with option --vg-rcf to check if something weird is happening.
>
> What I don't understand right now is why I see just one failed attempt
> to connect in your log.txt and then Logger shutdown after 9 minutes.
>
> Andrew.
>
> On 8/24/23 23:29, Adam Hassick wrote:
>
>  > Is there any firewall in the network or on test hosts which could block
> incoming TCP connection to the port 23571 from the host where you
> run test engine?
>
> Our test engine host and the testbed are on the same subnet. The
> connection does work sometimes.
>
>  > If the behaviour is the same on the next try and you see that the test
> agent is kept running, could you check using
>  >
>  > # netstat -tnlp
>  >
>  > that Test Agent is listening on the port and try to establish TCP
> connection from test agent using
>  >
>  > $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>  >
>  > and check if TCP connection could be established.
>
> I was able to replicate the same behavior again, where it hangs while RCF
> is trying to start.
> Running this command, I see this in the output:
>
> tcp        0      0 0.0.0.0:23571           0.0.0.0:*               LISTEN      18599/ta
>
> So it seems like it is listening on the correct port.
> Additionally, I was able to connect to the Tester machine from our Test
> Engine host using telnet. It printed the PID of the process once the
> connection was opened.
>
> I tried running the "ta" application manually on the command line, and it
> didn't print anything at all.
> Maybe the issue is something on the Test Engine side.
>
> On Thu, Aug 24, 2023 at 2:35 PM Andrew Rybchenko <
> andrew.rybchenko at oktetlabs.ru> wrote:
>
>     Hi Adam,
>
>      > On the tester host (which appears to be the Peer agent), there
>     are four processes that I see running, which look like the test
>     agent processes.
>
>     Before the next try I'd recommend killing these processes.
>
>     Is there any firewall in the network or on test hosts which could
>     block incoming TCP connection to the port 23571 from the host
>     where you run test engine?
>
>     If the behaviour is the same on the next try and you see that the
>     test agent is kept running, could you check using
>
>     # netstat -tnlp
>
>     that Test Agent is listening on the port and try to establish TCP
>     connection from test agent using
>
>     $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>
>     and check if TCP connection could be established.
>
>     Another idea is to log in to the Tester as root, as testing does, take
>     the TA start command from the log and try it by hand without -n and
>     with the extra escaping removed.
>
>     # sudo PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1 \
>         LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1 \
>         /tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571 \
>         host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=
>
>     Hopefully in this case the test agent directory remains in /tmp and
>     you don't need to copy it as testing does.
>     Maybe the output could shed some light on what's going on.
>
>     Andrew.
>
>     On 8/24/23 17:30, Adam Hassick wrote:
>
>     Hi Andrew,
>
>     This is the output that I see in the terminal when this failure
>     occurs, after the test agent binaries build and the test engine
>     starts:
>
>     Platform default build - pass
>     Simple RCF consistency check succeeded
>     --->>> Starting Logger...done
>     --->>> Starting RCF...rcf_net_engine_connect(): Connection timed out iol-dts-tester.dpdklab.iol.unh.edu:23571
>
>     Then, it hangs here until I kill the "te_rcf" and "te_tee"
>     processes. I let it hang for around 9 minutes.
>
>     On the tester host (which appears to be the Peer agent), there are
>     four processes that I see running, which look like the test agent
>     processes.
>
>     ta.Peer is an empty file. I've attached the log.txt from this run.
>
>      - Adam
>
>     On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko
>     <andrew.rybchenko at oktetlabs.ru> wrote:
>
>         Hi Adam,
>
>         Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've double-checked
>         that it goes to 'copy_timeout' in ts-conf/rcf.conf.
>         The description in doc/sphinx/pages/group_te_engine_rcf.rst
>         says that copy_timeout is in seconds, and the implementation in
>         lib/rcfunix/rcfunix.c passes the value to select() tv_sec.
>         Theoretically select() could be interrupted by a signal, but I
>         think it is unlikely here.
>
>         I'm not sure that I understand what you mean by RCF
>         connection timeout. Does it happen on TE startup when RCF
>         starts test agents? If so, TE_RCFUNIX_TIMEOUT could help. Or
>         does it happen when tests are in progress, e.g. in the middle
>         of a test? If so, TE_RCFUNIX_TIMEOUT is unrelated and most
>         likely either the host with the test agent dies or the test agent
>         itself crashes. It would be easier for me to classify it if you
>         share the text log (log.txt, full or just the corresponding
>         fragment with some context). Also, the content of the ta.DPDK or
>         ta.Peer file, depending on which agent has problems, could shed
>         some light. These files contain stdout/stderr of test agents.
>
>         Andrew.
>
>         On 8/23/23 17:45, Adam Hassick wrote:
>
>         Hi Andrew,
>
>         I've set up a test rig repository here, and have created
>         configurations for our development testbed based off of the
>         examples.
>         We've been able to get the test suite to run manually on
>         Mellanox CX5 devices once.
>         However, we are running into an issue where, when RCF starts,
>         the RCF connection times out very frequently. We aren't sure
>         why this is the case.
>         It works sometimes, but most of the time when we try to run
>         the test engine, it encounters this issue.
>         I've tried changing the RCF port by setting
>         "TE_RCF_PORT=<some port number>" and rebooting the testbed
>         machines. Neither seems to fix the issue.
>
>         It also seems like the timeout takes far longer than 60
>         seconds, even when running "export TE_RCFUNIX_TIMEOUT=60"
>         before I try to run the test suite.
>         I assume the unit for this variable is seconds?
>
>         Thanks,
>         Adam
>
>         On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick
>         <ahassick at iol.unh.edu> wrote:
>
>             Hi Andrew,
>
>             Thanks, I've cloned the example repository and will start
>             setting up a configuration for our development testbed
>             today. I'll let you know if I run into any difficulties
>             or have any questions.
>
>              - Adam
>
>             On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko
>             <andrew.rybchenko at oktetlabs.ru> wrote:
>
>                 Hi Adam,
>
>                 I've published
>                 https://github.com/ts-factory/ts-rigs-sample.
>                 Hopefully it will help to define your test rigs and
>                 successfully run some tests manually. Feel free to
>                 ask any questions and I'll answer here and try to
>                 update documentation.
>
>                 Meanwhile I'll prepare missing bits for steps (2) and
>                 (3).
>                 Hopefully everything is in place for step (4), but we
>                 need to make steps (2) and (3) first.
>
>                 Andrew.
>
>                 On 8/18/23 21:40, Andrew Rybchenko wrote:
>
>                 Hi Adam,
>
>                 > I've conferred with the rest of the team, and we
>                 think it would be best to move forward with mainly
>                 option B.
>
>                 OK, I'll provide the sample on Monday for you. It is
>                 almost ready right now, but I need to double-check
>                 it before publishing.
>
>                 Regards,
>                 Andrew.
>
>                 On 8/17/23 20:03, Adam Hassick wrote:
>
>                 Hi Andrew,
>
>                 I'm adding the CI mailing list to this
>                 conversation. Others in the community might find
>                 this conversation valuable.
>
>                 We do want to run testing on a regular basis. The
>                 Jenkins integration will be very useful for us, as
>                 most of our CI is orchestrated by Jenkins.
>                 I've conferred with the rest of the team, and we
>                 think it would be best to move forward with mainly
>                 option B.
>                 If you would like to know anything about our
>                 testbeds that would help you with creating an
>                 example ts-rigs repo, I'd be happy to answer any
>                 questions you have.
>
>                 We have multiple test rigs (we call these
>                 "DUT-tester pairs") that we run our existing
>                 hardware testing on, with differing network
>                 hardware and CPU architecture. I figured this might
>                 be an important detail.
>
>                 Thanks,
>                 Adam
>
>                 On Thu, Aug 17, 2023 at 11:44 AM Andrew Rybchenko
>                 <andrew.rybchenko at oktetlabs.ru> wrote:
>
>                     Greetings Adam,
>
>                     I'm happy to hear that you're trying to bring
>                     it up.
>
>                     As I understand, the final goal is to run it on
>                     a regular basis. So, we need to do it properly
>                     from the very beginning.
>                     Bringing up all features consists of 4 steps:
>
>                     1. Create site-specific repository (we call it
>                     ts-rigs) which contains information about test
>                     rigs and other site-specific information like
>                     where to send mails, where to store logs etc.
>                     It is required for manual execution as well,
>                     since test rigs description is essential. I'll
>                     return to the topic below.
>
>                     2. Set up log storage for automated runs.
>                     Basically it is disk space plus an apache2 web
>                     server with a few CGI scripts which help a lot to
>                     save disk space.
>
>                     3. Set up the Bublik web application, which provides
>                     a web interface to view testing results. Same as
>                     https://ts-factory.io/bublik
>
>                     4. Set up Jenkins to run tests regularly,
>                     save logs in log storage (2) and import them to
>                     Bublik (3).
>
>                     We spent the last few months on our homework to make
>                     it simpler to bring up automated execution
>                     using Jenkins -
>                     https://github.com/ts-factory/te-jenkins
>                     Corresponding bits in dpdk-ethdev-ts will be
>                     available tomorrow.
>
>                     Let's return to the step (1).
>
>                     Unfortunately there is no publicly available
>                     example of the ts-rigs repository since
>                     sensitive site-specific information is located
>                     there. But I'm ready to help you to create it
>                     for UNH. I see two options here:
>
>                     (A) I'll ask questions and based on your
>                     answers will create the first draft with my
>                     comments.
>
>                     (B) I'll make a template/example ts-rigs repo,
>                     publish it and you'll create UNH ts-rigs based
>                     on it.
>
>                     Of course, I'll help to debug and finally bring
>                     it up in any case.
>
>                     (A) is a bit simpler for me and you, but (B) is
>                     a bit more generic and will help other
>                     potential users to bring it up.
>                     We can combine (A)+(B). I.e. start from (A).
>                     What do you think?
>
>                     Thanks,
>                     Andrew.
>
>                     On 8/17/23 15:18, Konstantin Ushakov wrote:
>
>                     Greetings Adam,
>
>
>                     Thanks for contacting us. I've copied Andrew, who
>                     would be happy to help.
>
>                     Thanks,
>                     Konstantin
>
>                     On 16 Aug 2023, at 21:50, Adam Hassick
>                     <ahassick at iol.unh.edu> wrote:
>
>                     Greetings Konstantin,
>
>                     I am in the process of setting up the DPDK
>                     Poll Mode Driver test suite as an addition to
>                     our testing coverage for DPDK at the UNH lab.
>
>                     I have some questions about how to set the
>                     test suite arguments.
>
>                     I have been able to configure the Test Engine
>                     to connect to the hosts in the testbed. The
>                     RCF, Configurator, and Tester all begin to
>                     run, however the prelude of the test suite
>                     fails to run.
>
>
>                     https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters
>
>                     The documentation mentions that there are
>                     several test parameters for the test suite,
>                     like for the IUT test link MAC, etc. These
>                     seem like they would need to be set somewhere
>                     to run many of the tests.
>
>                     I see in the Test Engine documentation, there
>                     are instructions on how to create new
>                     parameters for test suites in the Tester
>                     configuration, but there is nothing in the
>                     user guide or in the Tester guide for how to
>                     set the arguments for the parameters when
>                     running the test suite that I can find. I'm
>                     not sure if I need to write my own Tester
>                     config, or if I should be setting these in
>                     some other way.
>
>                     How should these values be set?
>
>                     I'm also not sure what environment
>                     variables/arguments are strictly necessary or
>                     which are optional.
>
>                     Regards,
>                     Adam
>
>                     --
>                     *Adam Hassick*
>                     Senior Developer
>                     UNH InterOperability Lab
>                     ahassick at iol.unh.edu
>                     iol.unh.edu <https://www.iol.unh.edu/>
>                     +1 (603) 475-8248

-- 
*Adam Hassick*
Senior Developer
UNH InterOperability Lab
ahassick at iol.unh.edu
iol.unh.edu <https://www.iol.unh.edu/>
+1 (603) 475-8248