[dpdk-dev] [PATCH v4 2/8] test/distributor: synchronize lcores statistics
Lukasz Wojciechowski
l.wojciechow at partner.samsung.com
Thu Oct 8 22:47:48 CEST 2020
Hi Honnappa,
I pushed v5 of the patches today.
However, all I fixed in this patch is replacing the memset with a loop of
atomic operations. I didn't move the clearing and checking of the statistics
out of the tests yet (to make sure no worker is running while they happen).
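Roughly, the change has this shape (a sketch only, not the exact v5 code;
it assumes the existing worker_stats[] array and RTE_MAX_LCORE as the
loop bound):

static inline void
clear_packet_count(void)
{
	unsigned int i;

	/* clear each counter with an atomic store instead of memset + fence */
	for (i = 0; i < RTE_MAX_LCORE; i++)
		__atomic_store_n(&worker_stats[i].handled_packets, 0,
			__ATOMIC_RELEASE);
}

Whether the stores need RELEASE or just RELAXED ordering is part of the
same question discussed below.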
After fixing a few more distributor issues, I was not able to catch even
a single failure using the ACQ/REL memory model.
I'll test it, maybe tomorrow, with RELAXED to see if that works as well.
Best regards
Lukasz
On 02.10.2020 at 13:25, Lukasz Wojciechowski wrote:
> Hi Honnappa,
>
> Many thanks for the review!
>
> I'll write my answers here rather than inline, as I think it will be
> easier to read them in one place.
> So first of all, I agree with you on 2 things:
> 1) all accesses to the statistics must be atomic, and the lack of that
> caused most of the problems
> 2) it would be better to replace the barrier and memset in
> clear_packet_count() with atomic stores, as you suggested
>
> So I will apply both of the above.
>
> However, I wasn't fully convinced about changing acquire/release to
> relaxed. It would be perfectly ok if it looked like this Herb Sutter
> example:
> https://youtu.be/KeLBd2EJLOU?t=4170
> But in his case the counters are cleared before the worker threads start
> and are printed out after they have completed.
>
> In the case of the dpdk distributor tests, both the worker and main cores
> are running at the same time. In sanity_test, the statistics are cleared
> and verified a few times for different packet hashes. The worker cores
> are not stopped during this time and keep looping in their handle
> procedures. The verification done on the main core is effectively an
> exchange of data, as the current statistics determine the test result.
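>
> To make that concrete, the main-core side of a single check looks roughly
> like this (a simplified sketch, not the exact test code; check_one_burst()
> is just an illustrative name):
>
> static int
> check_one_burst(struct rte_distributor *db, struct rte_mbuf **bufs)
> {
> 	clear_packet_count();                     /* main core resets counters */
> 	rte_distributor_process(db, bufs, BURST); /* workers handle the burst */
> 	rte_distributor_flush(db);
>
> 	/* counters are read while the workers still spin in handle_work() */
> 	return total_packet_count() == BURST ? 0 : -1;
> }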
>
> So, as I wasn't convinced, I ran some tests with both the relaxed and
> acquire/release modes and they both fail :(
> The ratio of failures caused by statistics errors to the number of test
> runs, over 200000 runs, was:
> for relaxed: 0.000790562
> for acq/rel: 0.000091321
>
>
> That's why I'm going to modify the tests in such a way that they:
> 1) clear statistics
> 2) launch worker threads
> 3) run test
> 4) wait for the worker procedures to complete
> 5) check stats, verify results and print them out
>
> This way the main core will access (clear or verify) the stats only when
> no worker threads are running. This would make things simpler and allow
> focusing on testing the distributor, not the tests. And of course relaxed
> mode would be enough!
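>
> A rough sketch of that structure (placeholder names: run_single_test() and
> verify_stats() don't exist yet, while handle_work(), clear_packet_count()
> and quit_workers() are the existing helpers in test_distributor.c):
>
> static int
> run_and_check(struct worker_params *wp, struct rte_mempool *p)
> {
> 	clear_packet_count();                   /* 1) clear statistics */
> 	rte_eal_mp_remote_launch(handle_work, wp, SKIP_MAIN);
> 						/* 2) launch worker threads */
> 	run_single_test(wp, p);                 /* 3) run the test */
> 	quit_workers(wp, p);                    /* 4) wait for workers to finish */
> 	return verify_stats();                  /* 5) check stats, verify, print */
> }
>
> With that, all accesses to worker_stats[] from the main core happen while
> no worker lcore is running, so RELAXED increments in the workers are all
> that is needed.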
>
>
> Best regards
> Lukasz
>
>
On 29.09.2020 at 07:49, Honnappa Nagarahalli wrote:
>> <snip>
>>
>>> Statistics of handled packets are cleared and read on the main lcore, while
>>> they are incremented in the workers' handlers on different lcores.
>>>
>>> Without synchronization, they occasionally showed invalid values.
>>> This patch uses atomic acquire/release mechanisms to synchronize.
>> In general, load-acquire and store-release memory orderings are required when synchronizing data (that cannot be updated atomically) between threads. In this situation, making the counters atomic is enough.
>>
>>> Fixes: c3eabff124e6 ("distributor: add unit tests")
>>> Cc: bruce.richardson at intel.com
>>> Cc: stable at dpdk.org
>>>
>>> Signed-off-by: Lukasz Wojciechowski <l.wojciechow at partner.samsung.com>
>>> Acked-by: David Hunt <david.hunt at intel.com>
>>> ---
>>> app/test/test_distributor.c | 39 ++++++++++++++++++++++++-------------
>>> 1 file changed, 26 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
>>> index 35b25463a..0e49e3714 100644
>>> --- a/app/test/test_distributor.c
>>> +++ b/app/test/test_distributor.c
>>> @@ -43,7 +43,8 @@ total_packet_count(void)
>>> {
>>> unsigned i, count = 0;
>>> for (i = 0; i < worker_idx; i++)
>>> - count += worker_stats[i].handled_packets;
>>> + count += __atomic_load_n(&worker_stats[i].handled_packets,
>>> + __ATOMIC_ACQUIRE);
>> RELAXED memory order is sufficient. For example, the worker threads are not 'releasing' to the main thread any data that is not already updated atomically.
>>
>>> return count;
>>> }
>>>
>>> @@ -52,6 +53,7 @@ static inline void
>>> clear_packet_count(void)
>>> {
>>> memset(&worker_stats, 0, sizeof(worker_stats));
>>> + rte_atomic_thread_fence(__ATOMIC_RELEASE);
>> Ideally, the counters should be set to 0 atomically rather than using a memset.
>>
>>> }
>>>
>>> /* this is the basic worker function for sanity test
>>> @@ -72,13 +74,13 @@ handle_work(void *arg)
>>> num = rte_distributor_get_pkt(db, id, buf, buf, num);
>>> while (!quit) {
>>> __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>> - __ATOMIC_RELAXED);
>>> + __ATOMIC_ACQ_REL);
>> Using the __ATOMIC_ACQ_REL order does not mean anything to the main thread. The main thread might still see the updates from different threads in a different order.
>>
>>> count += num;
>>> num = rte_distributor_get_pkt(db, id,
>>> buf, buf, num);
>>> }
>>> __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>> - __ATOMIC_RELAXED);
>>> + __ATOMIC_ACQ_REL);
>> Same here, I do not see why this change is required.
>>
>>> count += num;
>>> rte_distributor_return_pkt(db, id, buf, num);
>>> return 0;
>>> @@ -134,7 +136,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>
>>> for (i = 0; i < rte_lcore_count() - 1; i++)
>>> printf("Worker %u handled %u packets\n", i,
>>> - worker_stats[i].handled_packets);
>>> + __atomic_load_n(&worker_stats[i].handled_packets,
>>> + __ATOMIC_ACQUIRE));
>> __ATOMIC_RELAXED is enough.
>>
>>> printf("Sanity test with all zero hashes done.\n");
>>>
>>> /* pick two flows and check they go correctly */
>>> @@ -159,7 +162,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>
>>> for (i = 0; i < rte_lcore_count() - 1; i++)
>>> printf("Worker %u handled %u packets\n", i,
>>> - worker_stats[i].handled_packets);
>>> + __atomic_load_n(
>>> + &worker_stats[i].handled_packets,
>>> + __ATOMIC_ACQUIRE));
>> __ATOMIC_RELAXED is enough
>>
>>> printf("Sanity test with two hash values done\n");
>>> }
>>>
>>> @@ -185,7 +190,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>
>>> for (i = 0; i < rte_lcore_count() - 1; i++)
>>> printf("Worker %u handled %u packets\n", i,
>>> - worker_stats[i].handled_packets);
>>> + __atomic_load_n(&worker_stats[i].handled_packets,
>>> + __ATOMIC_ACQUIRE));
>> __ATOMIC_RELAXED is enough
>>
>>> printf("Sanity test with non-zero hashes done\n");
>>>
>>> rte_mempool_put_bulk(p, (void *)bufs, BURST);
>>> @@ -280,15 +286,17 @@ handle_work_with_free_mbufs(void *arg)
>>> buf[i] = NULL;
>>> num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>> while (!quit) {
>>> - worker_stats[id].handled_packets += num;
>>> count += num;
>>> + __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>> + __ATOMIC_ACQ_REL);
>> IMO, the problem would be the non-atomic update of the statistics. So, __ATOMIC_RELAXED is enough
>>
>>> for (i = 0; i < num; i++)
>>> rte_pktmbuf_free(buf[i]);
>>> num = rte_distributor_get_pkt(d,
>>> id, buf, buf, num);
>>> }
>>> - worker_stats[id].handled_packets += num;
>>> count += num;
>>> + __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>> + __ATOMIC_ACQ_REL);
>> Same here, the problem is non-atomic update of the statistics, __ATOMIC_RELAXED is enough.
>> Similarly, for changes below, __ATOMIC_RELAXED is enough.
>>
>>> rte_distributor_return_pkt(d, id, buf, num);
>>> return 0;
>>> }
>>> @@ -363,8 +371,9 @@ handle_work_for_shutdown_test(void *arg)
>>> /* wait for quit single globally, or for worker zero, wait
>>> * for zero_quit */
>>> while (!quit && !(id == zero_id && zero_quit)) {
>>> - worker_stats[id].handled_packets += num;
>>> count += num;
>>> + __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>> + __ATOMIC_ACQ_REL);
>>> for (i = 0; i < num; i++)
>>> rte_pktmbuf_free(buf[i]);
>>> num = rte_distributor_get_pkt(d,
>>> @@ -379,10 +388,11 @@ handle_work_for_shutdown_test(void *arg)
>>>
>>> total += num;
>>> }
>>> - worker_stats[id].handled_packets += num;
>>> count += num;
>>> returned = rte_distributor_return_pkt(d, id, buf, num);
>>>
>>> + __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>> + __ATOMIC_ACQ_REL);
>>> if (id == zero_id) {
>>> /* for worker zero, allow it to restart to pick up last packet
>>> * when all workers are shutting down.
>>> @@ -394,10 +404,11 @@ handle_work_for_shutdown_test(void *arg)
>>> id, buf, buf, num);
>>>
>>> while (!quit) {
>>> - worker_stats[id].handled_packets += num;
>>> count += num;
>>> rte_pktmbuf_free(pkt);
>>> num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>> + __atomic_fetch_add(&worker_stats[id].handled_packets,
>>> + num, __ATOMIC_ACQ_REL);
>>> }
>>> returned = rte_distributor_return_pkt(d,
>>> id, buf, num);
>>> @@ -461,7 +472,8 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
>>>
>>> for (i = 0; i < rte_lcore_count() - 1; i++)
>>> printf("Worker %u handled %u packets\n", i,
>>> - worker_stats[i].handled_packets);
>>> + __atomic_load_n(&worker_stats[i].handled_packets,
>>> + __ATOMIC_ACQUIRE));
>>>
>>> if (total_packet_count() != BURST * 2) {
>>> printf("Line %d: Error, not all packets flushed. "
>>> @@ -514,7 +526,8 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
>>> zero_quit = 0;
>>> for (i = 0; i < rte_lcore_count() - 1; i++)
>>> printf("Worker %u handled %u packets\n", i,
>>> - worker_stats[i].handled_packets);
>>> + __atomic_load_n(&worker_stats[i].handled_packets,
>>> + __ATOMIC_ACQUIRE));
>>>
>>> if (total_packet_count() != BURST) {
>>> printf("Line %d: Error, not all packets flushed. "
>>> --
>>> 2.17.1
--
Lukasz Wojciechowski
Principal Software Engineer
Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow at partner.samsung.com