[dpdk-dev] [Bug 826] red_autotest random failures
Liguzinski, WojciechX
wojciechx.liguzinski at intel.com
Wed Nov 24 08:48:16 CET 2021
Hi,
Thanks Lincoln, I will also give such a script a try.
Cheers,
Wojciech
From: Lincoln Lavoie <lylavoie at iol.unh.edu>
Sent: Friday, November 19, 2021 6:26 PM
To: Dumitrescu, Cristian <cristian.dumitrescu at intel.com>
Cc: Thomas Monjalon <thomas at monjalon.net>; David Marchand <david.marchand at redhat.com>; Ajmera, Megha <megha.ajmera at intel.com>; Singh, Jasvinder <jasvinder.singh at intel.com>; Liguzinski, WojciechX <wojciechx.liguzinski at intel.com>; dev <dev at dpdk.org>; Aaron Conole <aconole at redhat.com>; Yigit, Ferruh <ferruh.yigit at intel.com>; ci at dpdk.org; Zegota, AnnaX <annax.zegota at intel.com>
Subject: Re: [dpdk-dev] [Bug 826] red_autotest random failures
Hi All,
I'm not sure if it will help, but this is an example of a failing case in the CI: https://lab.dpdk.org/results/dashboard/patchsets/20222/
The test is running within a Docker container. The CI is set up to allow only one active unit test at a time, so the host might be running compile jobs, but not other unit tests. This ensures there isn't "competition" for resources like hugepages between two running unit test jobs. The host is actually a VM running on VMware vCenter, not a bare-metal host; the VM's sole purpose is running the Docker jobs.
The command to start the unit test run is pretty generic (script is below).
#!/bin/bash
####################################################
# $1 argument: extra arguments to send to meson test
####################################################
# Exit on first command failure
set -e
# Extract dpdk.tar.gz
tar xzfm dpdk.tar.gz
# Compile DPDK
cd dpdk
meson build --werror
ninja -C build install
# Unit test
cd build
meson test --suite fast-tests -t 60 $1
I think a starting point is to understand whether the unit test makes assumptions about the system / environment, e.g. that it has sole access to a CPU core, a minimum number of hugepages, etc. If it would help, I can also give you the Dockerfile to build the container (note the RHEL images have to be built on a licensed Red Hat server, so that the required packages can be installed).
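For reference, the environment assumptions mentioned above can be checked from inside the container with standard Linux interfaces (a minimal sketch; the actual thresholds a given test needs would be test-specific):

```shell
#!/bin/bash
# Inspect the resources a unit test might assume are available.
grep -i huge /proc/meminfo   # hugepage sizes, totals, and free pages
nproc                        # CPU cores visible inside the container
ulimit -l                    # max locked memory (relevant for DMA mappings)
```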
Cheers,
Lincoln
On Fri, Nov 19, 2021 at 11:54 AM Dumitrescu, Cristian <cristian.dumitrescu at intel.com<mailto:cristian.dumitrescu at intel.com>> wrote:
> -----Original Message-----
> From: Thomas Monjalon <thomas at monjalon.net<mailto:thomas at monjalon.net>>
> Sent: Friday, November 19, 2021 7:26 AM
> To: Dumitrescu, Cristian <cristian.dumitrescu at intel.com<mailto:cristian.dumitrescu at intel.com>>; David Marchand
> <david.marchand at redhat.com<mailto:david.marchand at redhat.com>>; Lincoln Lavoie <lylavoie at iol.unh.edu<mailto:lylavoie at iol.unh.edu>>;
> Ajmera, Megha <megha.ajmera at intel.com<mailto:megha.ajmera at intel.com>>; Singh, Jasvinder
> <jasvinder.singh at intel.com<mailto:jasvinder.singh at intel.com>>; Liguzinski, WojciechX
> <wojciechx.liguzinski at intel.com<mailto:wojciechx.liguzinski at intel.com>>
> Cc: dev <dev at dpdk.org<mailto:dev at dpdk.org>>; Aaron Conole <aconole at redhat.com<mailto:aconole at redhat.com>>; Yigit,
> Ferruh <ferruh.yigit at intel.com<mailto:ferruh.yigit at intel.com>>; ci at dpdk.org<mailto:ci at dpdk.org>; Zegota, AnnaX
> <annax.zegota at intel.com<mailto:annax.zegota at intel.com>>
> Subject: Re: [dpdk-dev] [Bug 826] red_autotest random failures
>
> 18/11/2021 23:10, Liguzinski, WojciechX:
> > Hi,
> >
> > I was trying to reproduce this test failure, but for me the RED tests are passing.
> > I was running the exact test command described in Bug 826,
> 'red_autotest', on the current main branch.
>
> The test is not always failing.
> There are some failing conditions; please find them.
> I think you should try in a container with more limited resources.
>
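Thomas's suggestion above might be tried along the following lines (a hypothetical sketch; the image name, mounts, and resource limits are illustrative, not the CI's actual configuration):

```shell
# Re-run the failing test in a container with deliberately constrained
# resources, to see whether the random failures correlate with memory
# or CPU pressure.
docker run --rm \
    --memory=512m --cpus=1 \
    -v /dev/hugepages:/dev/hugepages \
    dpdk-build-image \
    meson test -C build --suite fast-tests -t 60 red_autotest
```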
Hi Thomas,
This is not a fair request IMO. We want to avoid wasting everybody's time, including Wojciech's time. Can the bug originator provide the details on the setup to reproduce the failure, please? Thank you!
On a different point, we should probably tweak our autotests to differentiate between logical failures and failures caused by resources not being available, and flag the test result accordingly in the report. For example, if a memory allocation fails, the test should be flagged as "Not enough resources" instead of simply "Failed". In the first case the next step is fixing the test setup, while in the second case it is fixing the code. What do people think of this?
Regards,
Cristian
--
Lincoln Lavoie
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie at iol.unh.edu<mailto:lylavoie at iol.unh.edu>
https://www.iol.unh.edu
+1-603-674-2755 (m)