[dpdk-dev] [PATCH v4 2/8] test/distributor: synchronize lcores statistics

Lukasz Wojciechowski l.wojciechow at partner.samsung.com
Fri Oct 16 14:58:39 CEST 2020


On 16.10.2020 at 14:43, Lukasz Wojciechowski wrote:
> Hi Honnappa,
>
> Thank you for your answer.
> In the current v7 version I followed your advice and used the RELAXED memory model.
> It works without any issues. I guess that, after fixing the other issues found since v4, the distributor works more stably.
> I didn't have time to rearrange all the tests in the way I proposed, but I guess that if they work like this, it's not a top priority.
>
> Can you give an ack on the series? I believe David Marchand is waiting for your opinion to process it.
I'm sorry, I didn't see your other comments. I'll try to fix them today.
>
> Best regards
> Lukasz
>
> W dniu 16.10.2020 o 07:43, Honnappa Nagarahalli pisze: (On 16.10.2020 at 07:43, Honnappa Nagarahalli wrote:)
>> <snip>
>>
>>> Hi Honnappa,
>>>
>>> Many thanks for the review!
>>>
>>> I'll write my answers here rather than inline, as I think it will be easier
>>> to read them in one place.
>>> So first of all, I agree with you on 2 things:
>>> 1) all uses of the statistics must be atomic, and the lack of that caused
>>> most of the problems
>>> 2) it would be better to replace the barrier and memset in
>>> clear_packet_count() with atomic stores, as you suggested
>>>
>>> So I will apply both of the above.
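>>>
>>> A minimal sketch of what the second point could look like (an assumption
>>> on my side, reusing the worker_stats[RTE_MAX_LCORE] array from
>>> app/test/test_distributor.c and relaxed ordering, in line with the rest of
>>> the discussion):
>>>
>>> static inline void
>>> clear_packet_count(void)
>>> {
>>> 	unsigned int i;
>>>
>>> 	/* zero each counter with an atomic store instead of memset() plus
>>> 	 * a release fence, so a concurrent reader never sees a torn value
>>> 	 */
>>> 	for (i = 0; i < RTE_MAX_LCORE; i++)
>>> 		__atomic_store_n(&worker_stats[i].handled_packets, 0,
>>> 				__ATOMIC_RELAXED);
>>> }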
>>>
>>> However, I wasn't fully convinced about changing acquire/release to relaxed.
>>> It would be perfectly OK if it looked like this example from Herb Sutter:
>>> https://youtu.be/KeLBd2EJLOU?t=4170
>>> But in his case the counters are cleared before the worker threads start and
>>> are printed out after they have completed.
>>>
>>> In the case of the DPDK distributor tests, both the worker and main cores
>>> are running at the same time. In the sanity_test, the statistics are cleared
>>> and verified a few times for different packet hashes. The worker cores are
>>> not stopped at this time and they continue their loops in the handle
>>> procedure. The verification done on the main core is itself an exchange of
>>> data, as the current statistics determine the result of the test.
>> Agree. The key point we have to note is that the data that is exchanged between the two threads is already atomic (handled_packets is atomic).
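>>
>> For example, a relaxed increment on the worker side paired with a relaxed
>> load on the main side would be sufficient (just a sketch, reusing the
>> worker_stats/worker_idx names from the test):
>>
>> /* worker lcore: count the packets it handled */
>> __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>> 		__ATOMIC_RELAXED);
>>
>> /* main lcore: sum up the per-worker counters */
>> unsigned int i, count = 0;
>> for (i = 0; i < worker_idx; i++)
>> 	count += __atomic_load_n(&worker_stats[i].handled_packets,
>> 			__ATOMIC_RELAXED);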
>>
>>> So, as I wasn't convinced, I ran some tests with both the relaxed and
>>> acquire/release modes, and they both fail :( The ratio of failures caused
>>> by statistics errors to the number of tests, over 200000 runs, was:
>>> for relaxed: 0.000790562
>>> for acq/rel: 0.000091321
>>>
>>>
>>> That's why I'm going to modify the tests in such a way that they:
>>> 1) clear statistics
>>> 2) launch worker threads
>>> 3) run the test
>>> 4) wait for the worker procedures to complete
>>> 5) check stats, verify results and print them out
>>>
>>> This way the main core will use (clear or verify) the stats only when there
>>> are no running worker threads (see the sketch below). This makes things
>>> simpler and allows focusing on testing the distributor, not the tests. And
>>> of course relaxed mode would be enough!
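>>>
>>> In pseudo-code, each test would then roughly follow this shape (a sketch
>>> only; launch_workers(), send_bursts() and verify_and_print_stats() are
>>> placeholders for whatever the reworked tests end up using, while
>>> quit_workers() is the existing helper):
>>>
>>> 	clear_packet_count();            /* 1) clear statistics           */
>>> 	launch_workers(handle_work, wp); /* 2) launch worker threads      */
>>> 	send_bursts(wp, p);              /* 3) run the test               */
>>> 	quit_workers(wp, p);             /* 4) wait for workers to finish */
>>> 	verify_and_print_stats();        /* 5) check stats, print results */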
>> Agree, this would be the only way to ensure that the main thread sees the correct statistics (just like in the video)
>>
>>> Best regards
>>> Lukasz
>>>
>>>
>>> W dniu 29.09.2020 o 07:49, Honnappa Nagarahalli pisze: (On 29.09.2020 at 07:49, Honnappa Nagarahalli wrote:)
>>>> <snip>
>>>>
>>>>> Statistics of handled packets are cleared and read on main lcore,
>>>>> while they are increased in workers handlers on different lcores.
>>>>>
>>>>> Without synchronization, they occasionally showed invalid values.
>>>>> This patch uses atomic acquire/release mechanisms to synchronize.
>>>> In general, load-acquire and store-release memory orderings are required
>>>> when synchronizing data (that cannot be updated atomically) between
>>>> threads. In this situation, making the counters atomic is enough.
>>>>> Fixes: c3eabff124e6 ("distributor: add unit tests")
>>>>> Cc: bruce.richardson at intel.com
>>>>> Cc: stable at dpdk.org
>>>>>
>>>>> Signed-off-by: Lukasz Wojciechowski
>>>>> <l.wojciechow at partner.samsung.com>
>>>>> Acked-by: David Hunt <david.hunt at intel.com>
>>>>> ---
>>>>>     app/test/test_distributor.c | 39 ++++++++++++++++++++++++++-------------
>>>>>     1 file changed, 26 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/app/test/test_distributor.c
>>>>> b/app/test/test_distributor.c index
>>>>> 35b25463a..0e49e3714 100644
>>>>> --- a/app/test/test_distributor.c
>>>>> +++ b/app/test/test_distributor.c
>>>>> @@ -43,7 +43,8 @@ total_packet_count(void)  {
>>>>>     	unsigned i, count = 0;
>>>>>     	for (i = 0; i < worker_idx; i++)
>>>>> -		count += worker_stats[i].handled_packets;
>>>>> +		count += __atomic_load_n(&worker_stats[i].handled_packets,
>>>>> +				__ATOMIC_ACQUIRE);
>>>> RELAXED memory order is sufficient. For example, the worker threads are not
>>>> 'releasing' any data that is not atomically updated to the main thread.
>>>>>     	return count;
>>>>>     }
>>>>>
>>>>> @@ -52,6 +53,7 @@ static inline void
>>>>>     clear_packet_count(void)
>>>>>     {
>>>>>     	memset(&worker_stats, 0, sizeof(worker_stats));
>>>>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>>>> Ideally, the counters should be set to 0 atomically rather than using a
>>>> memset.
>>>>>     }
>>>>>
>>>>>     /* this is the basic worker function for sanity test @@ -72,13
>>>>> +74,13 @@ handle_work(void *arg)
>>>>>     	num = rte_distributor_get_pkt(db, id, buf, buf, num);
>>>>>     	while (!quit) {
>>>>>     		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>> -				__ATOMIC_RELAXED);
>>>>> +				__ATOMIC_ACQ_REL);
>>>> Using the __ATOMIC_ACQ_REL order does not mean anything to the main
>>>> thread. The main thread might still see the updates from different threads
>>>> in a different order.
>>>>>     		count += num;
>>>>>     		num = rte_distributor_get_pkt(db, id,
>>>>>     				buf, buf, num);
>>>>>     	}
>>>>>     	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>> -			__ATOMIC_RELAXED);
>>>>> +			__ATOMIC_ACQ_REL);
>>>> Same here, do not see why this change is required.
>>>>
>>>>>     	count += num;
>>>>>     	rte_distributor_return_pkt(db, id, buf, num);
>>>>>     	return 0;
>>>>> @@ -134,7 +136,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>
>>>>>     	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>     		printf("Worker %u handled %u packets\n", i,
>>>>> -				worker_stats[i].handled_packets);
>>>>> +			__atomic_load_n(&worker_stats[i].handled_packets,
>>>>> +					__ATOMIC_ACQUIRE));
>>>> __ATOMIC_RELAXED is enough.
>>>>
>>>>>     	printf("Sanity test with all zero hashes done.\n");
>>>>>
>>>>>     	/* pick two flows and check they go correctly */
>>>>> @@ -159,7 +162,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>
>>>>>     		for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>     			printf("Worker %u handled %u packets\n", i,
>>>>> -					worker_stats[i].handled_packets);
>>>>> +				__atomic_load_n(
>>>>> +					&worker_stats[i].handled_packets,
>>>>> +					__ATOMIC_ACQUIRE));
>>>> __ATOMIC_RELAXED is enough
>>>>
>>>>>     		printf("Sanity test with two hash values done\n");
>>>>>     	}
>>>>>
>>>>> @@ -185,7 +190,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>
>>>>>     	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>     		printf("Worker %u handled %u packets\n", i,
>>>>> -				worker_stats[i].handled_packets);
>>>>> +			__atomic_load_n(&worker_stats[i].handled_packets,
>>>>> +					__ATOMIC_ACQUIRE));
>>>> __ATOMIC_RELAXED is enough
>>>>
>>>>>     	printf("Sanity test with non-zero hashes done\n");
>>>>>
>>>>>     	rte_mempool_put_bulk(p, (void *)bufs, BURST);
>>>>> @@ -280,15 +286,17 @@ handle_work_with_free_mbufs(void *arg)
>>>>>     		buf[i] = NULL;
>>>>>     	num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>>>>     	while (!quit) {
>>>>> -		worker_stats[id].handled_packets += num;
>>>>>     		count += num;
>>>>> +		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>> +				__ATOMIC_ACQ_REL);
>>>> IMO, the problem would be the non-atomic update of the statistics. So,
>>>> __ATOMIC_RELAXED is enough
>>>>
>>>>>     		for (i = 0; i < num; i++)
>>>>>     			rte_pktmbuf_free(buf[i]);
>>>>>     		num = rte_distributor_get_pkt(d,
>>>>>     				id, buf, buf, num);
>>>>>     	}
>>>>> -	worker_stats[id].handled_packets += num;
>>>>>     	count += num;
>>>>> +	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>> +			__ATOMIC_ACQ_REL);
>>>> Same here, the problem is non-atomic update of the statistics,
>>>> __ATOMIC_RELAXED is enough.
>>>> Similarly, for changes below, __ATOMIC_RELAXED is enough.
>>>>
>>>>>     	rte_distributor_return_pkt(d, id, buf, num);
>>>>>     	return 0;
>>>>>     }
>>>>> @@ -363,8 +371,9 @@ handle_work_for_shutdown_test(void *arg)
>>>>>     	/* wait for quit single globally, or for worker zero, wait
>>>>>     	 * for zero_quit */
>>>>>     	while (!quit && !(id == zero_id && zero_quit)) {
>>>>> -		worker_stats[id].handled_packets += num;
>>>>>     		count += num;
>>>>> +		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>> +				__ATOMIC_ACQ_REL);
>>>>>     		for (i = 0; i < num; i++)
>>>>>     			rte_pktmbuf_free(buf[i]);
>>>>>     		num = rte_distributor_get_pkt(d,
>>>>> @@ -379,10 +388,11 @@ handle_work_for_shutdown_test(void *arg)
>>>>>
>>>>>     		total += num;
>>>>>     	}
>>>>> -	worker_stats[id].handled_packets += num;
>>>>>     	count += num;
>>>>>     	returned = rte_distributor_return_pkt(d, id, buf, num);
>>>>>
>>>>> +	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>> +			__ATOMIC_ACQ_REL);
>>>>>     	if (id == zero_id) {
>>>>>     		/* for worker zero, allow it to restart to pick up last packet
>>>>>     		 * when all workers are shutting down.
>>>>> @@ -394,10 +404,11 @@ handle_work_for_shutdown_test(void *arg)
>>>>>     				id, buf, buf, num);
>>>>>
>>>>>     		while (!quit) {
>>>>> -			worker_stats[id].handled_packets += num;
>>>>>     			count += num;
>>>>>     			rte_pktmbuf_free(pkt);
>>>>>     			num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>>>> +			__atomic_fetch_add(&worker_stats[id].handled_packets,
>>>>> +					num, __ATOMIC_ACQ_REL);
>>>>>     		}
>>>>>     		returned = rte_distributor_return_pkt(d,
>>>>>     				id, buf, num);
>>>>> @@ -461,7 +472,8 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
>>>>>
>>>>>     	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>     		printf("Worker %u handled %u packets\n", i,
>>>>> -				worker_stats[i].handled_packets);
>>>>> +			__atomic_load_n(&worker_stats[i].handled_packets,
>>>>> +					__ATOMIC_ACQUIRE));
>>>>>
>>>>>     	if (total_packet_count() != BURST * 2) {
>>>>>     		printf("Line %d: Error, not all packets flushed. "
>>>>> @@ -514,7 +526,8 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
>>>>>     	zero_quit = 0;
>>>>>     	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>     		printf("Worker %u handled %u packets\n", i,
>>>>> -				worker_stats[i].handled_packets);
>>>>> +			__atomic_load_n(&worker_stats[i].handled_packets,
>>>>> +					__ATOMIC_ACQUIRE));
>>>>>
>>>>>     	if (total_packet_count() != BURST) {
>>>>>     		printf("Line %d: Error, not all packets flushed. "
>>>>> --
>>>>> 2.17.1
>>> --
>>> Lukasz Wojciechowski
>>> Principal Software Engineer
>>>
>>> Samsung R&D Institute Poland
>>> Samsung Electronics
>>> Office +48 22 377 88 25
>>> l.wojciechow at partner.samsung.com

-- 
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow at partner.samsung.com


