[dpdk-dev] [PATCH v4 2/8] test/distributor: synchronize lcores statistics

Lukasz Wojciechowski l.wojciechow at partner.samsung.com
Sat Oct 17 05:34:26 CEST 2020


On 16.10.2020 at 17:42, Honnappa Nagarahalli wrote:
> <snip>
>
>> On 16.10.2020 at 14:43, Lukasz Wojciechowski wrote:
>>> Hi Honnappa,
>>>
>>> Thank you for your answer.
>>> In the current v7 version I followed your advice and used the RELAXED
>>> memory model.
>>> And it works without any issues. I guess after fixing other issues found
>>> since v4 the distributor works more stably.
>>> I didn't have time to rearrange all tests in the way I proposed, but I guess
>>> if they work like this it's not a top priority.
> Agree, not a top priority.
>
>>> Can you give an ack on the series? I believe David Marchand is waiting for
>>> your opinion to process it.
>> I'm sorry I didn't see your other comments. I'll try to fix them today.
> No problem, I can review the next series quickly.
So it's already there, the v8.
But you have one more patch to look at :)
I was ready to submit the new version, but I ran some tests ... and one of
the executions hung, so I attached gdb and found one more issue.
>
>>> Best regards
>>> Lukasz
>>>
>>> On 16.10.2020 at 07:43, Honnappa Nagarahalli wrote:
>>>> <snip>
>>>>
>>>>> Hi Honnappa,
>>>>>
>>>>> Many thanks for the review!
>>>>>
>>>>> I'll write my answers here not inline as it would be easier to read
>>>>> them in one place, I think.
>>>>> So first of all I agree with you on 2 things:
>>>>> 1) all uses of statistics must be atomic, and the lack of that caused
>>>>> most of the problems
>>>>> 2) it would be better to replace the barrier and memset in
>>>>> clear_packet_count() with atomic stores as you suggested
>>>>>
>>>>> So I will apply both of the above.
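
For reference, a minimal sketch of what point 2) could look like, assuming
the worker_stats[] array and RTE_MAX_LCORE bound from
app/test/test_distributor.c; an illustration, not the exact patch content:

static inline void
clear_packet_count(void)
{
	unsigned int i;

	/* zero each counter with an atomic store instead of memset()
	 * plus a release fence; RELAXED is enough because only the
	 * counter values themselves are exchanged between threads
	 */
	for (i = 0; i < RTE_MAX_LCORE; i++)
		__atomic_store_n(&worker_stats[i].handled_packets, 0,
				__ATOMIC_RELAXED);
}
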
>>>>>
>>>>> However I wasn't fully convinced about changing acquire/release to
>>>>> relaxed.
>>>>> It would be perfectly ok if it looked like this Herb Sutter example:
>>>>> https://youtu.be/KeLBd2EJLOU?t=4170
>>>>> But in his case the counters are cleared before the worker threads start
>>>>> and are printed out after they have completed.
>>>>>
>>>>> In the case of the dpdk distributor tests, both worker and main cores are
>>>>> running at the same time. In the sanity_test, the statistics are
>>>>> cleared and verified a few times for different hashes of packets. The
>>>>> worker cores are not stopped at this time and they continue their loops
>>>>> in the handle procedure.
>>>>> The verification made in the main core is an exchange of data, as the
>>>>> current statistics indicate how the test will result.
>>>> Agree. The key point we have to note is that the data that is exchanged
>>>> between the two threads is already atomic (handled_packets is atomic).
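
As an aside, a minimal sketch of the counter pattern discussed here,
mirroring (but not copying) the declarations in test_distributor.c;
count_packets() and read_packets() are hypothetical helpers used only for
illustration:

struct worker_stats {
	volatile unsigned int handled_packets;
} __rte_cache_aligned;
static struct worker_stats worker_stats[RTE_MAX_LCORE];

/* worker lcore: the counter itself is the shared data, so an atomic
 * read-modify-write with RELAXED ordering already keeps it consistent
 */
static void
count_packets(unsigned int id, unsigned int num)
{
	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
			__ATOMIC_RELAXED);
}

/* main lcore: a RELAXED atomic load returns a valid counter value;
 * no other, non-atomic data is published through it
 */
static unsigned int
read_packets(unsigned int id)
{
	return __atomic_load_n(&worker_stats[id].handled_packets,
			__ATOMIC_RELAXED);
}
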
>>>>> So as I wasn't convinced, I ran some tests with both the relaxed
>>>>> and acquire/release modes, and they both failed :( The ratio of failures
>>>>> caused by statistics errors to the number of tests, over
>>>>> 200000 tests, was:
>>>>> for relaxed: 0.000790562
>>>>> for acq/rel: 0.000091321
>>>>>
>>>>>
>>>>> That's why I'm going to modify the tests in such a way that they would:
>>>>> 1) clear statistics
>>>>> 2) launch worker threads
>>>>> 3) run test
>>>>> 4) wait for the worker procedures to complete
>>>>> 5) check stats, verify results and print them out
>>>>>
>>>>> This way the main core will use (clear or verify) stats only when
>>>>> there are no worker threads running. This would make things simpler and
>>>>> allow focusing on testing the distributor, not the tests. And of
>>>>> course relaxed mode would be enough!
>>>> Agree, this would be the only way to ensure that the main thread sees
>>>> the correct statistics (just like in the video)
>>>>
>>>>> Best regards
>>>>> Lukasz
>>>>>
>>>>>
>>>>> On 29.09.2020 at 07:49, Honnappa Nagarahalli wrote:
>>>>>> <snip>
>>>>>>
>>>>>>> Statistics of handled packets are cleared and read on the main lcore,
>>>>>>> while they are increased in the worker handlers on different lcores.
>>>>>>>
>>>>>>> Without synchronization, they occasionally showed invalid values.
>>>>>>> This patch uses atomic acquire/release mechanisms to synchronize.
>>>>>> In general, load-acquire and store-release memory orderings are required
>>>>>> while synchronizing data (that cannot be updated atomically) between
>>>>>> threads. In this situation, making the counters atomic is enough.
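
To illustrate the distinction (a generic, hedged sketch, not code from this
patch): release/acquire pairing is needed when non-atomic data is published
through an atomic flag, whereas a counter that is itself the only shared data
just needs to be updated atomically:

#include <stdbool.h>

/* case 1: publishing non-atomically written data through a flag;
 * the writer needs store-release and the reader load-acquire so that
 * 'payload' is guaranteed visible once 'ready' is observed as true
 */
static int payload;
static bool ready;

static void
producer(void)
{
	payload = 42;	/* plain, non-atomic write */
	__atomic_store_n(&ready, true, __ATOMIC_RELEASE);
}

static int
consumer(void)
{
	while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
		;	/* spin until the data is published */
	return payload;	/* guaranteed to read 42 */
}

/* case 2: the counter itself is the only shared data, as with
 * handled_packets in these tests; RELAXED atomicity is sufficient
 */
static unsigned int packets_handled;

static void
count_burst(unsigned int num)
{
	__atomic_fetch_add(&packets_handled, num, __ATOMIC_RELAXED);
}
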
>>>>>>> Fixes: c3eabff124e6 ("distributor: add unit tests")
>>>>>>> Cc: bruce.richardson at intel.com
>>>>>>> Cc: stable at dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Lukasz Wojciechowski
>>>>>>> <l.wojciechow at partner.samsung.com>
>>>>>>> Acked-by: David Hunt <david.hunt at intel.com>
>>>>>>> ---
>>>>>>>      app/test/test_distributor.c | 39 ++++++++++++++++++++++++++-------------
>>>>>>>      1 file changed, 26 insertions(+), 13 deletions(-)
>>>>>>>
>>>>>>> diff --git a/app/test/test_distributor.c
>>>>>>> b/app/test/test_distributor.c index
>>>>>>> 35b25463a..0e49e3714 100644
>>>>>>> --- a/app/test/test_distributor.c
>>>>>>> +++ b/app/test/test_distributor.c
>>>>>>> @@ -43,7 +43,8 @@ total_packet_count(void)
>>>>>>>      {
>>>>>>>      	unsigned i, count = 0;
>>>>>>>      	for (i = 0; i < worker_idx; i++)
>>>>>>> -		count += worker_stats[i].handled_packets;
>>>>>>> +		count += __atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +				__ATOMIC_ACQUIRE);
>>>>>> RELAXED memory order is sufficient. For ex: the worker threads are not
>>>>>> 'releasing' any data that is not atomically updated to the main thread.
>>>>>>>      	return count;
>>>>>>>      }
>>>>>>>
>>>>>>> @@ -52,6 +53,7 @@ static inline void
>>>>>>>      clear_packet_count(void)
>>>>>>>      {
>>>>>>>      	memset(&worker_stats, 0, sizeof(worker_stats));
>>>>>>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>>>>>> Ideally, the counters should be set to 0 atomically rather than using a
>>>>>> memset.
>>>>>>>      }
>>>>>>>
>>>>>>>      /* this is the basic worker function for sanity test
>>>>>>> @@ -72,13 +74,13 @@ handle_work(void *arg)
>>>>>>>      	num = rte_distributor_get_pkt(db, id, buf, buf, num);
>>>>>>>      	while (!quit) {
>>>>>>>
>>>>>>>      		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> -				__ATOMIC_RELAXED);
>>>>>>> +				__ATOMIC_ACQ_REL);
>>>>>> Using the __ATOMIC_ACQ_REL order does not mean anything to the main
>>>>>> thread. The main thread might still see the updates from different
>>>>>> threads in a different order.
>>>>>>>      		count += num;
>>>>>>>      		num = rte_distributor_get_pkt(db, id,
>>>>>>>      				buf, buf, num);
>>>>>>>      	}
>>>>>>>      	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> -			__ATOMIC_RELAXED);
>>>>>>> +			__ATOMIC_ACQ_REL);
>>>>>> Same here, do not see why this change is required.
>>>>>>
>>>>>>>      	count += num;
>>>>>>>      	rte_distributor_return_pkt(db, id, buf, num);
>>>>>>>      	return 0;
>>>>>>> @@ -134,7 +136,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>>>
>>>>>>>      	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>      		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +					__ATOMIC_ACQUIRE));
>>>>>> __ATOMIC_RELAXED is enough.
>>>>>>
>>>>>>>      	printf("Sanity test with all zero hashes done.\n");
>>>>>>>
>>>>>>>      	/* pick two flows and check they go correctly */
>>>>>>> @@ -159,7 +162,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>>>
>>>>>>>      		for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>      			printf("Worker %u handled %u packets\n", i,
>>>>>>> -					worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(
>>>>>>> +					&worker_stats[i].handled_packets,
>>>>>>> +					__ATOMIC_ACQUIRE));
>>>>>> __ATOMIC_RELAXED is enough
>>>>>>
>>>>>>>      		printf("Sanity test with two hash values done\n");
>>>>>>>      	}
>>>>>>>
>>>>>>> @@ -185,7 +190,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
>>>>>>>
>>>>>>>      	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>      		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +					__ATOMIC_ACQUIRE));
>>>>>> __ATOMIC_RELAXED is enough
>>>>>>
>>>>>>>      	printf("Sanity test with non-zero hashes done\n");
>>>>>>>
>>>>>>>      	rte_mempool_put_bulk(p, (void *)bufs, BURST);
>>>>>>> @@ -280,15 +286,17 @@ handle_work_with_free_mbufs(void *arg)
>>>>>>>      		buf[i] = NULL;
>>>>>>>      	num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>>>>>>      	while (!quit) {
>>>>>>> -		worker_stats[id].handled_packets += num;
>>>>>>>      		count += num;
>>>>>>> +		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +				__ATOMIC_ACQ_REL);
>>>>>> IMO, the problem would be the non-atomic update of the statistics.
>>>>>> So, __ATOMIC_RELAXED is enough
>>>>>>
>>>>>>>      		for (i = 0; i < num; i++)
>>>>>>>      			rte_pktmbuf_free(buf[i]);
>>>>>>>      		num = rte_distributor_get_pkt(d,
>>>>>>>      				id, buf, buf, num);
>>>>>>>      	}
>>>>>>> -	worker_stats[id].handled_packets += num;
>>>>>>>      	count += num;
>>>>>>> +	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +			__ATOMIC_ACQ_REL);
>>>>>> Same here, the problem is the non-atomic update of the statistics,
>>>>>> __ATOMIC_RELAXED is enough.
>>>>>> Similarly, for changes below, __ATOMIC_RELAXED is enough.
>>>>>>
>>>>>>>      	rte_distributor_return_pkt(d, id, buf, num);
>>>>>>>      	return 0;
>>>>>>>      }
>>>>>>> @@ -363,8 +371,9 @@ handle_work_for_shutdown_test(void *arg)
>>>>>>>      	/* wait for quit single globally, or for worker zero, wait
>>>>>>>      	 * for zero_quit */
>>>>>>>      	while (!quit && !(id == zero_id && zero_quit)) {
>>>>>>> -		worker_stats[id].handled_packets += num;
>>>>>>>      		count += num;
>>>>>>> +		__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +				__ATOMIC_ACQ_REL);
>>>>>>>      		for (i = 0; i < num; i++)
>>>>>>>      			rte_pktmbuf_free(buf[i]);
>>>>>>>      		num = rte_distributor_get_pkt(d,
>>>>>>> @@ -379,10 +388,11 @@ handle_work_for_shutdown_test(void *arg)
>>>>>>>
>>>>>>>      		total += num;
>>>>>>>      	}
>>>>>>> -	worker_stats[id].handled_packets += num;
>>>>>>>      	count += num;
>>>>>>>      	returned = rte_distributor_return_pkt(d, id, buf, num);
>>>>>>>
>>>>>>> +	__atomic_fetch_add(&worker_stats[id].handled_packets, num,
>>>>>>> +			__ATOMIC_ACQ_REL);
>>>>>>>      	if (id == zero_id) {
>>>>>>>      		/* for worker zero, allow it to restart to pick up last packet
>>>>>>>      		 * when all workers are shutting down.
>>>>>>> @@ -394,10 +404,11 @@ handle_work_for_shutdown_test(void *arg)
>>>>>>>      				id, buf, buf, num);
>>>>>>>
>>>>>>>      		while (!quit) {
>>>>>>> -			worker_stats[id].handled_packets += num;
>>>>>>>      			count += num;
>>>>>>>      			rte_pktmbuf_free(pkt);
>>>>>>>      			num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>>>>>> +			__atomic_fetch_add(&worker_stats[id].handled_packets,
>>>>>>> +					num, __ATOMIC_ACQ_REL);
>>>>>>>      		}
>>>>>>>      		returned = rte_distributor_return_pkt(d,
>>>>>>>      				id, buf, num);
>>>>>>> @@ -461,7 +472,8 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
>>>>>>>
>>>>>>>      	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>      		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +					__ATOMIC_ACQUIRE));
>>>>>>>
>>>>>>>      	if (total_packet_count() != BURST * 2) {
>>>>>>>      		printf("Line %d: Error, not all packets flushed. "
>>>>>>> @@ -514,7 +526,8 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
>>>>>>>      	zero_quit = 0;
>>>>>>>      	for (i = 0; i < rte_lcore_count() - 1; i++)
>>>>>>>      		printf("Worker %u handled %u packets\n", i,
>>>>>>> -				worker_stats[i].handled_packets);
>>>>>>> +				__atomic_load_n(&worker_stats[i].handled_packets,
>>>>>>> +					__ATOMIC_ACQUIRE));
>>>>>>>
>>>>>>>      	if (total_packet_count() != BURST) {
>>>>>>>      		printf("Line %d: Error, not all packets flushed. "
>>>>>>> --
>>>>>>> 2.17.1
>>>>> --
>>>>> Lukasz Wojciechowski
>>>>> Principal Software Engineer
>>>>>
>>>>> Samsung R&D Institute Poland
>>>>> Samsung Electronics
>>>>> Office +48 22 377 88 25
>>>>> l.wojciechow at partner.samsung.com
>> --
>> Lukasz Wojciechowski
>> Principal Software Engineer
>>
>> Samsung R&D Institute Poland
>> Samsung Electronics
>> Office +48 22 377 88 25
>> l.wojciechow at partner.samsung.com

-- 
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow at partner.samsung.com


