[dpdk-dev] [PATCH v6 2/2] test/service: fix race condition on stopping lcore

Van Haaren, Harry harry.van.haaren at intel.com
Thu Oct 15 10:11:57 CEST 2020


> -----Original Message-----
> From: David Marchand <david.marchand at redhat.com>
> Sent: Tuesday, October 13, 2020 8:45 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>
> Cc: dev <dev at dpdk.org>; Lukasz Wojciechowski
> <l.wojciechow at partner.samsung.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; Phil Yang <phil.yang at arm.com>; Aaron
> Conole <aconole at redhat.com>
> Subject: Re: [PATCH v6 2/2] test/service: fix race condition on stopping lcore
> 
> Hello Harry,
> 
> Long time no see :-)
> 
> On Mon, Sep 21, 2020 at 4:51 PM David Marchand
> <david.marchand at redhat.com> wrote:
> >
> > On Mon, Sep 14, 2020 at 4:30 PM Harry van Haaren
> > <harry.van.haaren at intel.com> wrote:
> > >
> > > This commit fixes a potential race condition in the tests
> > > where the lcore running a service would increment a counter
> > > that was already reset by the test-suite thread. The resulting
> > > race-condition incremented value could cause CI failures, as
> > > indicated by DPDK's CI.
> > >
> > > This patch fixes the race-condition by making use of the
> > > added rte_service_lcore_active() API, which indicates when
> > > a service-core is no longer in the service-core polling loop.
> > >
> > > The unit test makes use of the above function to detect when
> > > all statistics increments are done in the service-core thread,
> > > and then the unit test continues finalizing and checking state.
> > >
> > > Fixes: f28f3594ded2 ("service: add attribute API")
> > >
> > > Reported-by: David Marchand <david.marchand at redhat.com>
> > > Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> > > Reviewed-by: Phil Yang <phil.yang at arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> >
> 
> We probably need a followup fix for:
> https://travis-ci.com/github/DPDK/dpdk/jobs/398954463#L10088
> 
> 
> The race is in service_attr_get where we look at/reset spent cycles
> while a service lcore is still running.
> Quoting this test code:
> 
> rte_service_lcore_stop(slcore_id);

/* TODO: implement wait for slcore_id to stop polling here */
 
> TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
> "Valid attr_get() call didn't return success");
> TEST_ASSERT_EQUAL(1, (attr_value > 0),
> "attr_get() call didn't get call count (zero)");
> 
> TEST_ASSERT_EQUAL(0, rte_service_attr_reset_all(id),
> "Valid attr_reset_all() return success");
> 
> TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_id, &attr_value),
> "Valid attr_get() call didn't return success");

Based on the output you provided ( https://travis-ci.com/github/DPDK/dpdk/jobs/398954463#L10088 )
and the above, indeed it seems that the core may not have stopped yet (race-cond).

Will send a patch - thanks for reporting with details, -Harry 


More information about the dev mailing list