Setting up DPDK PMD Test Suite
Andrew Rybchenko
andrew.rybchenko at oktetlabs.ru
Fri Aug 25 15:57:33 CEST 2023
Hello Adam,
On 8/24/23 23:54, Andrew Rybchenko wrote:
> I'd like to try to repeat the problem locally. Which Linux distro is
> running on test engine and agents?
>
> In fact, I know of one problem with Debian 12 and Fedora 38, and we
> have a patch in review to fix it. However, the behaviour is different
> in this case, so it is unlikely to be the same problem.
I've just published a new tag which fixes known test engine side
problems on Debian 12 and Fedora 38.
>
> One more idea is to install valgrind on the test engine host and
> run with option --vg-rcf to check if something weird is happening.
>
> What I don't understand right now is why I see just one failed attempt
> to connect in your log.txt and then Logger shutdown after 9 minutes.
>
> Andrew.
>
> On 8/24/23 23:29, Adam Hassick wrote:
>> > Is there any firewall in the network or on test hosts which could
>> > block incoming TCP connections to port 23571 from the host where
>> > you run the test engine?
>>
>> Our test engine host and the testbed are on the same subnet. The
>> connection does work sometimes.
>>
>> > If the behaviour is the same on the next try and you see that the
>> > test agent is kept running, could you check using
>> >
>> > # netstat -tnlp
>> >
>> > that the Test Agent is listening on the port, and try to establish
>> > a TCP connection from test agent using
>> >
>> > $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>> >
>> > and check if the TCP connection could be established.
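
The netstat/telnet check above can also be scripted, which may help when the failure is intermittent. The following is a minimal sketch and not part of TE: the helper names tcp_port_open and count_successes are hypothetical, and it only tests whether the TCP handshake completes.

```python
import socket
import time


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection() resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def count_successes(host, port, attempts=10, timeout=5.0, pause=0.0):
    """Probe the port several times and return how many attempts connected."""
    ok = 0
    for _ in range(attempts):
        if tcp_port_open(host, port, timeout):
            ok += 1
        time.sleep(pause)
    return ok
```

For example, count_successes("iol-dts-tester.dpdklab.iol.unh.edu", 23571, attempts=20) run from the test engine host would show how often the connection attempt fails. Note that each probe opens a real connection to the agent, so it is best run while no test session is in progress.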
>>
>> I was able to replicate the same behavior again, where it hangs while
>> RCF is trying to start.
>> Running this command, I see this in the output:
>>
>> tcp 0 0 0.0.0.0:23571 0.0.0.0:* LISTEN 18599/ta
>>
>> So it seems like it is listening on the correct port.
>> Additionally, I was able to connect to the Tester machine from our
>> Test Engine host using telnet. It printed the PID of the process once
>> the connection was opened.
>>
>> I tried running the "ta" application manually on the command line,
>> and it didn't print anything at all.
>> Maybe the issue is something on the Test Engine side.
>>
>> On Thu, Aug 24, 2023 at 2:35 PM Andrew Rybchenko
>> <andrew.rybchenko at oktetlabs.ru> wrote:
>>
>> Hi Adam,
>>
>> > On the tester host (which appears to be the Peer agent), there
>> are four processes that I see running, which look like the test
>> agent processes.
>>
>> Before the next try, I'd recommend killing these processes.
>>
>> Is there any firewall in the network or on test hosts which could
>> block incoming TCP connections to port 23571 from the host where
>> you run the test engine?
>>
>> If the behaviour is the same on the next try and you see that the
>> test agent is kept running, could you check using
>>
>> # netstat -tnlp
>>
>> that the Test Agent is listening on the port, and try to establish
>> a TCP connection from test agent using
>>
>> $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>>
>> and check if the TCP connection could be established.
>>
>> Another idea is to log in to the Tester as root, as testing does,
>> take the TA start command from the log and run it by hand without
>> -n and with the extra escaping removed.
>>
>> # sudo PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1
>> LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1
>> /tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571
>> host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=
>>
>> Hopefully in this case the test agent directory remains in /tmp and
>> you don't need to copy it as testing does.
>> Maybe the output could shed some light on what's going on.
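
The last argument of the command above is a colon-separated list of key=value pairs. As a rough illustration (not TE code), the sketch below splits such a string so that individual values like copy_timeout are easy to inspect. The helper name parse_confstr is hypothetical, and it assumes that values never contain a colon.

```python
def parse_confstr(confstr):
    """Split a colon-separated list of key=value pairs into a dict.

    Assumption: values never contain ':'; a key without '=' maps to ''.
    """
    params = {}
    for item in confstr.split(":"):
        if not item:
            continue
        key, _, value = item.partition("=")
        params[key] = value
    return params


# The string is copied from the TA start command quoted above.
confstr = (
    "host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:"
    "key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:"
    "kill_timeout=15:sudo=:shell="
)
params = parse_confstr(confstr)
print(params["copy_timeout"], params["kill_timeout"])  # prints: 15 15
```

This makes it easy to compare the copy_timeout that actually reached the agent start command with the value exported in TE_RCFUNIX_TIMEOUT.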
>>
>> Andrew.
>>
>> On 8/24/23 17:30, Adam Hassick wrote:
>>> Hi Andrew,
>>>
>>> This is the output that I see in the terminal when this failure
>>> occurs, after the test agent binaries build and the test engine
>>> starts:
>>>
>>> Platform default build - pass
>>> Simple RCF consistency check succeeded
>>> --->>> Starting Logger...done
>>> --->>> Starting RCF...rcf_net_engine_connect(): Connection timed out iol-dts-tester.dpdklab.iol.unh.edu:23571
>>>
>>> Then, it hangs here until I kill the "te_rcf" and "te_tee"
>>> processes. I let it hang for around 9 minutes.
>>>
>>> On the tester host (which appears to be the Peer agent), there are
>>> four processes that I see running, which look like the test agent
>>> processes.
>>>
>>> ta.Peer is an empty file. I've attached the log.txt from this run.
>>>
>>> - Adam
>>>
>>> On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko
>>> <andrew.rybchenko at oktetlabs.ru> wrote:
>>>
>>> Hi Adam,
>>>
>>> Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've double-checked
>>> that it goes to 'copy_timeout' in ts-conf/rcf.conf.
>>> The description in doc/sphinx/pages/group_te_engine_rcf.rst
>>> says that copy_timeout is in seconds, and the implementation in
>>> lib/rcfunix/rcfunix.c passes the value to select() tv_sec.
>>> Theoretically select() could be interrupted by a signal, but I
>>> think it is unlikely here.
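
To illustrate the unit, here is a minimal sketch of waiting on a descriptor with a timeout given in seconds. It is written in Python rather than C and is not taken from lib/rcfunix/rcfunix.c; wait_readable is a hypothetical helper. Python's select.select() takes its timeout in seconds, which corresponds to the tv_sec field mentioned above.

```python
import select
import socket


def wait_readable(sock, timeout_sec):
    """Wait up to timeout_sec seconds for sock to become readable.

    Returns True if data is ready, False if the timeout expired first.
    """
    readable, _, _ = select.select([sock], [], [], timeout_sec)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(wait_readable(a, 0.5))   # nothing sent yet, so the wait times out
    b.send(b"ping")
    print(wait_readable(a, 0.5))   # data is pending, so it returns at once
    a.close()
    b.close()
```

One difference from C is worth keeping in mind: Python 3.5 and later restart the call automatically after a signal, whereas a C caller of select() has to handle EINTR itself.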
>>>
>>> I'm not sure that I understand what you mean by RCF
>>> connection timeout. Does it happen on TE startup, when RCF
>>> starts test agents? If so, TE_RCFUNIX_TIMEOUT could help. Or
>>> does it happen when tests are in progress, e.g. in the middle
>>> of a test? If so, TE_RCFUNIX_TIMEOUT is unrelated and most
>>> likely either the host with the test agent dies or the test
>>> agent itself crashes. It would be easier for me to classify it
>>> if you share the text log (log.txt, full or just the
>>> corresponding fragment with some context). The content of the
>>> ta.DPDK or ta.Peer file, depending on which agent has problems,
>>> could also shed some light. These files contain the
>>> stdout/stderr of test agents.
>>>
>>> Andrew.
>>>
>>> On 8/23/23 17:45, Adam Hassick wrote:
>>>> Hi Andrew,
>>>>
>>>> I've set up a test rig repository here, and have created
>>>> configurations for our development testbed based off of the
>>>> examples.
>>>> We've been able to get the test suite to run manually on
>>>> Mellanox CX5 devices once.
>>>> However, we are running into an issue where, when RCF starts,
>>>> the RCF connection times out very frequently. We aren't sure
>>>> why this is the case.
>>>> It works sometimes, but most of the time when we try to run
>>>> the test engine, it encounters this issue.
>>>> I've tried changing the RCF port by setting
>>>> "TE_RCF_PORT=<some port number>" and rebooting the testbed
>>>> machines. Neither seems to fix the issue.
>>>>
>>>> It also seems like the timeout takes far longer than 60
>>>> seconds, even when running "export TE_RCFUNIX_TIMEOUT=60"
>>>> before I try to run the test suite.
>>>> I assume the unit for this variable is seconds?
>>>>
>>>> Thanks,
>>>> Adam
>>>>
>>>> On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick
>>>> <ahassick at iol.unh.edu> wrote:
>>>>
>>>> Hi Andrew,
>>>>
>>>> Thanks, I've cloned the example repository and will start
>>>> setting up a configuration for our development testbed
>>>> today. I'll let you know if I run into any difficulties
>>>> or have any questions.
>>>>
>>>> - Adam
>>>>
>>>> On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko
>>>> <andrew.rybchenko at oktetlabs.ru> wrote:
>>>>
>>>> Hi Adam,
>>>>
>>>> I've published
>>>> https://github.com/ts-factory/ts-rigs-sample.
>>>> Hopefully it will help to define your test rigs and
>>>> successfully run some tests manually. Feel free to
>>>> ask any questions and I'll answer here and try to
>>>> update documentation.
>>>>
>>>> Meanwhile I'll prepare missing bits for steps (2) and
>>>> (3).
>>>> Hopefully everything is in place for step (4), but we
>>>> need to make steps (2) and (3) first.
>>>>
>>>> Andrew.
>>>>
>>>> On 8/18/23 21:40, Andrew Rybchenko wrote:
>>>>> Hi Adam,
>>>>>
>>>>> > I've conferred with the rest of the team, and we
>>>>> think it would be best to move forward with mainly
>>>>> option B.
>>>>>
>>>>> OK, I'll provide the sample on Monday for you. It is
>>>>> almost ready right now, but I need to double-check
>>>>> it before publishing.
>>>>>
>>>>> Regards,
>>>>> Andrew.
>>>>>
>>>>> On 8/17/23 20:03, Adam Hassick wrote:
>>>>>> Hi Andrew,
>>>>>>
>>>>>> I'm adding the CI mailing list to this
>>>>>> conversation. Others in the community might find
>>>>>> this conversation valuable.
>>>>>>
>>>>>> We do want to run testing on a regular basis. The
>>>>>> Jenkins integration will be very useful for us, as
>>>>>> most of our CI is orchestrated by Jenkins.
>>>>>> I've conferred with the rest of the team, and we
>>>>>> think it would be best to move forward with mainly
>>>>>> option B.
>>>>>> If you would like to know anything about our
>>>>>> testbeds that would help you with creating an
>>>>>> example ts-rigs repo, I'd be happy to answer any
>>>>>> questions you have.
>>>>>>
>>>>>> We have multiple test rigs (we call these
>>>>>> "DUT-tester pairs") that we run our existing
>>>>>> hardware testing on, with differing network
>>>>>> hardware and CPU architecture. I figured this might
>>>>>> be an important detail.
>>>>>>
>>>>>> Thanks,
>>>>>> Adam
>>>>>>
>>>>>> On Thu, Aug 17, 2023 at 11:44 AM Andrew Rybchenko
>>>>>> <andrew.rybchenko at oktetlabs.ru> wrote:
>>>>>>
>>>>>> Greetings Adam,
>>>>>>
>>>>>> I'm happy to hear that you're trying to bring
>>>>>> it up.
>>>>>>
>>>>>> As I understand it, the final goal is to run it on a
>>>>>> regular basis. So, we need to do it properly
>>>>>> from the very beginning.
>>>>>> Bringing up all features consists of 4 steps:
>>>>>>
>>>>>> 1. Create a site-specific repository (we call it
>>>>>> ts-rigs) which contains information about test
>>>>>> rigs and other site-specific information like
>>>>>> where to send mails, where to store logs etc.
>>>>>> It is required for manual execution as well,
>>>>>> since the test rigs description is essential. I'll
>>>>>> return to the topic below.
>>>>>>
>>>>>> 2. Set up log storage for automated runs.
>>>>>> Basically it is disk space plus an apache2 web
>>>>>> server with a few CGI scripts which help a lot to
>>>>>> save disk space.
>>>>>>
>>>>>> 3. Set up the Bublik web application, which provides
>>>>>> a web interface to view testing results. Same as
>>>>>> https://ts-factory.io/bublik
>>>>>>
>>>>>> 4. Set up Jenkins to run tests regularly,
>>>>>> save logs in log storage (2) and import them into
>>>>>> Bublik (3).
>>>>>>
>>>>>> We spent the last few months on our homework to make
>>>>>> it simpler to bring up automated execution
>>>>>> using Jenkins -
>>>>>> https://github.com/ts-factory/te-jenkins
>>>>>> Corresponding bits in dpdk-ethdev-ts will be
>>>>>> available tomorrow.
>>>>>>
>>>>>> Let's return to the step (1).
>>>>>>
>>>>>> Unfortunately there is no publicly available
>>>>>> example of the ts-rigs repository since
>>>>>> sensitive site-specific information is located
>>>>>> there. But I'm ready to help you to create it
>>>>>> for UNH. I see two options here:
>>>>>>
>>>>>> (A) I'll ask questions and based on your
>>>>>> answers will create the first draft with my
>>>>>> comments.
>>>>>>
>>>>>> (B) I'll make a template/example ts-rigs repo,
>>>>>> publish it and you'll create UNH ts-rigs based
>>>>>> on it.
>>>>>>
>>>>>> Of course, I'll help to debug and finally bring
>>>>>> it up in any case.
>>>>>>
>>>>>> (A) is a bit simpler for me and you, but (B) is
>>>>>> a bit more generic and will help other
>>>>>> potential users to bring it up.
>>>>>> We can combine (A)+(B). I.e. start from (A).
>>>>>> What do you think?
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew.
>>>>>>
>>>>>> On 8/17/23 15:18, Konstantin Ushakov wrote:
>>>>>>> Greetings Adam,
>>>>>>>
>>>>>>>
>>>>>>> Thanks for contacting us. I'm copying Andrew, who
>>>>>>> would be happy to help.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Konstantin
>>>>>>>
>>>>>>>> On 16 Aug 2023, at 21:50, Adam Hassick
>>>>>>>> <ahassick at iol.unh.edu> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Greetings Konstantin,
>>>>>>>>
>>>>>>>> I am in the process of setting up the DPDK
>>>>>>>> Poll Mode Driver test suite as an addition to
>>>>>>>> our testing coverage for DPDK at the UNH lab.
>>>>>>>>
>>>>>>>> I have some questions about how to set the
>>>>>>>> test suite arguments.
>>>>>>>>
>>>>>>>> I have been able to configure the Test Engine
>>>>>>>> to connect to the hosts in the testbed. The
>>>>>>>> RCF, Configurator, and Tester all begin to
>>>>>>>> run, however the prelude of the test suite
>>>>>>>> fails to run.
>>>>>>>>
>>>>>>>> https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters
>>>>>>>>
>>>>>>>>
>>>>>>>> The documentation mentions that there are
>>>>>>>> several test parameters for the test suite,
>>>>>>>> like for the IUT test link MAC, etc. These
>>>>>>>> seem like they would need to be set somewhere
>>>>>>>> to run many of the tests.
>>>>>>>>
>>>>>>>> I see in the Test Engine documentation, there
>>>>>>>> are instructions on how to create new
>>>>>>>> parameters for test suites in the Tester
>>>>>>>> configuration, but there is nothing in the
>>>>>>>> user guide or in the Tester guide for how to
>>>>>>>> set the arguments for the parameters when
>>>>>>>> running the test suite that I can find. I'm
>>>>>>>> not sure if I need to write my own Tester
>>>>>>>> config, or if I should be setting these in
>>>>>>>> some other way.
>>>>>>>>
>>>>>>>> How should these values be set?
>>>>>>>>
>>>>>>>> I'm also not sure what environment
>>>>>>>> variables/arguments are strictly necessary or
>>>>>>>> which are optional.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Adam
>>>>>>>>
>>>>>>>> -- *Adam Hassick*
>>>>>>>> Senior Developer
>>>>>>>> UNH InterOperability Lab
>>>>>>>> ahassick at iol.unh.edu
>>>>>>>> <mailto:ahassick at iol.unh.edu>
>>>>>>>> iol.unh.edu <https://www.iol.unh.edu/>
>>>>>>>> +1 (603) 475-8248
>>>>>
>