[dpdk-ci] CI reliability

Thomas Monjalon thomas at monjalon.net
Tue May 26 23:10:10 CEST 2020

26/05/2020 22:27, Lincoln Lavoie:
> On Sun, May 24, 2020 at 5:50 AM Thomas Monjalon <thomas at monjalon.net> wrote:
> > Hi all,
> >
> > I think we have a CI reliability issue in general.
> > Perhaps we lack some alert mechanism warning test platform maintainers
> > when too many tests are failing.
> >
> > Recent example: the community lab compilation test is failing on
> > Fedora 31 for at least 2 weeks, and I don't see any action to fix it:
> >         https://lab.dpdk.org/results/dashboard/patchsets/11040/
> >
> > Because of such recurring errors, the whole CI becomes irrelevant.
> This has been fixed as of yesterday.  The failure was caused by a commit to
> the SPDK repos that changed how they pull in their dependencies, in a way
> that is not compatible with docker.  The team created a workaround so
> that case is fixed, but there is always a risk that other commits of
> this type could cause a failure in the containers.

Thanks for fixing it.

> I asked Brandon to change the scripts that run the testing in the
> containers to try and catch failures from docker separately, so they can be
> flagged as infrastructure, compared to failures of the build.

Yes, good idea.
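
A minimal sketch of how such a split could look, assuming the container
script records each step with a phase label. The names StepResult,
classify and the "setup"/"build" phases are hypothetical and not taken
from the lab's scripts. The only external fact used is that `docker run`
itself exits with 125 when the Docker daemon fails, as opposed to the
contained command failing.

```python
from dataclasses import dataclass
from typing import List

# `docker run` returns 125 when the error is with Docker itself,
# not with the command executed inside the container.
DOCKER_DAEMON_ERROR = 125


@dataclass
class StepResult:
    # Hypothetical record: "setup" covers image and dependency preparation,
    # "build" covers compiling DPDK with the patch applied.
    phase: str
    exit_code: int


def classify(steps: List[StepResult]) -> str:
    """Return PASS, INFRASTRUCTURE or BUILD_FAILURE for one container run."""
    for step in steps:
        if step.exit_code == 0:
            continue
        # A failure before the build starts, or a Docker daemon error,
        # is not attributable to the patch under test.
        if step.phase == "setup" or step.exit_code == DOCKER_DAEMON_ERROR:
            return "INFRASTRUCTURE"
        return "BUILD_FAILURE"
    return "PASS"
```

With this split, a dependency fetch that breaks during "setup" would be
reported to the lab maintainers instead of being shown as a failure of
the patch.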

When compiling external projects, we can see some errors which
are not due to the DPDK patch.
I guess we validate any upgrade of the external projects
before making them live?

> I'm also very surprised this was not raised during the CI meeting, or by
> anyone else.  I'm wondering if this is caused by the actual error logs
> being a little abstracted from the emails, i.e. they are a link and a zip
> file away from the actual email text, so maybe folks are not really looking
> into the output as closely as they should be.  Is this something we can
> make better by including more detail in the email text, so issues are
> caught more quickly?

I think the table in the report is already quite expressive.

As I proposed above, I think we need better monitoring.
If the same test is failing on many DPDK patches, it should raise an alarm.
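
A rough sketch of such an alarm, assuming the results are available as
(patchset id, test name, passed) tuples in chronological order, with one
result per test and patchset. The function name, the window of 10
patchsets and the 80% threshold are illustrative assumptions, not
existing lab settings.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def failing_tests(
    results: Iterable[Tuple[int, str, bool]],
    window: int = 10,
    threshold: float = 0.8,
) -> List[str]:
    """Return the tests that failed on most of the last `window` patchsets."""
    # Keep only the most recent `window` outcomes for each test.
    recent: Dict[str, Deque[bool]] = defaultdict(lambda: deque(maxlen=window))
    for _patchset, test, passed in results:
        recent[test].append(passed)

    alarms = []
    for test, outcomes in recent.items():
        # Do not raise an alarm until there is a full window of results.
        if len(outcomes) < window:
            continue
        failure_rate = outcomes.count(False) / len(outcomes)
        if failure_rate >= threshold:
            alarms.append(test)
    return sorted(alarms)
```

A test failing on nearly every unrelated patch is much more likely to be
broken itself than to be exposing a bug in each of those patches, so the
returned list is what should be sent to the platform maintainers.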
