[dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test

Pavan Nikhilesh Bhagavatula pbhagavatula at marvell.com
Tue Jan 5 10:29:25 CET 2021


Hi Feifei,

>Hi, Pavan
>
>Sorry for my late reply and thanks very much for your review.
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavatula at marvell.com>
>> Sent: December 22, 2020 18:33
>> To: Feifei Wang <Feifei.Wang2 at arm.com>; jerinj at marvell.com;
>Harry van
>> Haaren <harry.van.haaren at intel.com>; Pavan Nikhilesh
>> <pbhagavatula at caviumnetworks.com>
>> Cc: dev at dpdk.org; nd <nd at arm.com>; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli at arm.com>; stable at dpdk.org; Phil Yang
>> <Phil.Yang at arm.com>
>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>for
>> pipeline test
>>
>>
>> >Add release barriers before updating the processed packets for
>worker
>> >lcores to ensure the worker lcore has really finished data processing
>> >and then it can update the processed packets number.
>> >
>>
>> I believe we can live with minor inaccuracies in stats being presented
>as
>> atomics are pretty heavy when scheduler is limited to burst size as 1.
>>
>> One option is to move it before a pipeline operation
>(pipeline_event_tx,
>> pipeline_fwd_event etc.) as they imply implicit release barrier (as all
>the
>> changes done to the event should be visible to the next core).
>
>If I understand correctly, you mean that we should move the release barriers
>before pipeline_event_tx or pipeline_fwd_event. This ensures the event has
>been processed before the next core begins to tx/fwd. For example:

What I meant was that event APIs such as `rte_event_enqueue_burst` and `rte_event_eth_tx_adapter_enqueue`
act as implicit release barriers, and that `rte_event_dequeue_burst` acts as an implicit acquire barrier.

Since each pipeline_* test starts with a dequeue() and ends with an enqueue(), I don't believe we need
barriers in between.
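
To make the pairing concrete, here is a minimal sketch of a toy single-producer/single-consumer
ring, not DPDK code. The names toy_event, toy_ring, toy_enqueue and toy_dequeue are hypothetical;
the sketch only assumes that an enqueue publishes with release semantics and a dequeue observes
with acquire semantics, which is the behaviour described above for the event APIs.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical size, power of two */

struct toy_event {
	uint64_t payload;
	uint8_t queue_id;
};

struct toy_ring {
	struct toy_event slots[TOY_RING_SIZE];
	uint32_t head; /* written by the consumer only */
	uint32_t tail; /* written by the producer only */
};

/* Returns 1 on success, 0 if the ring is full. */
static int
toy_enqueue(struct toy_ring *r, const struct toy_event *ev)
{
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);

	if (tail - head == TOY_RING_SIZE)
		return 0;

	r->slots[tail % TOY_RING_SIZE] = *ev;
	/* Release: all stores to the event above are ordered before
	 * the tail update that makes the slot visible.
	 */
	__atomic_store_n(&r->tail, tail + 1, __ATOMIC_RELEASE);
	return 1;
}

/* Returns 1 on success, 0 if the ring is empty. */
static int
toy_dequeue(struct toy_ring *r, struct toy_event *ev)
{
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_RELAXED);
	/* Acquire: pairs with the release store in toy_enqueue(). */
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

	if (head == tail)
		return 0;

	*ev = r->slots[head % TOY_RING_SIZE];
	/* Release: the slot read completes before the producer may reuse it. */
	__atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);
	return 1;
}
```

In this model, a worker that modifies an event between toy_dequeue() and toy_enqueue() needs no
extra fence for the next core to see those modifications. Whether a given event device driver
gives the same guarantee is a property of that driver.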

>
>if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>		+	__atomic_thread_fence(__ATOMIC_RELEASE);
>			pipeline_event_tx(dev, port, &ev);
>			w->processed_pkts++;
>		} else {
>			ev.queue_id++;
>		+	__atomic_thread_fence(__ATOMIC_RELEASE);
>			pipeline_fwd_event(&ev,
>RTE_SCHED_TYPE_ATOMIC);
>			pipeline_event_enqueue(dev, port, &ev);
>
>However, there are two reasons not to do this:
>
>First, compare with the other tests in app/eventdev. In the eventdev perf
>test, for example, the wmb comes after the event operation, to ensure the
>operation has finished before w->processed_pkts++.

The perf_* tests start with a dequeue() and end with a mempool_put(). Shouldn't those
also act as an implicit acquire/release pair, making the stats consistent?

>So, if we move the release barriers before tx/fwd, the tests in
>app/eventdev may become inconsistent with each other. This may reduce
>the maintainability of the code and make it difficult to understand.
>
>Second, it is a test case, though heavy thread may cause performance
>degradation, it can ensure that
>the operation process and the test result are correct. And maybe for a
>test case, correctness is more important
>than performance.
>

Most of our internal perf tests run on 24- or 48-core combinations, and since the
Octeontx2 event device driver supports only a burst size of 1, this will show up as
a huge performance degradation.
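
To illustrate the cost concern, here is a minimal sketch of a stats update that avoids an atomic
read-modify-write on every event. It assumes each worker owns its own counter and the main core
only reads it for reporting. The names toy_worker, toy_worker_count and toy_total are hypothetical
and are not part of app/test-eventdev.

```c
#include <assert.h>
#include <stdint.h>

struct toy_worker {
	/* Assumption: written by exactly one worker, read by the main core. */
	uint64_t processed_pkts;
};

/* Single-writer update: a relaxed load and a relaxed store instead of
 * an atomic fetch-add, so no ordering is imposed on the event path.
 */
static inline void
toy_worker_count(struct toy_worker *w, uint64_t n)
{
	uint64_t cur = __atomic_load_n(&w->processed_pkts, __ATOMIC_RELAXED);

	__atomic_store_n(&w->processed_pkts, cur + n, __ATOMIC_RELAXED);
}

/* Main-core view: the sum may lag behind the workers slightly. */
static inline uint64_t
toy_total(const struct toy_worker *w, unsigned int nb_workers)
{
	uint64_t total = 0;
	unsigned int i;

	for (i = 0; i < nb_workers; i++)
		total += __atomic_load_n(&w[i].processed_pkts,
				__ATOMIC_RELAXED);
	return total;
}
```

The total read by the main core may be momentarily stale. That is the "minor inaccuracies"
trade-off mentioned earlier in this thread, and it is acceptable only if nothing else relies on
the counter for ordering.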

>So, for the two reasons above, I'm ambivalent about what we should do
>next.
>
>Best Regards
>Feifei

Regards,
Pavan.

>
>> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>> >functions")
>> >Cc: pbhagavatula at marvell.com
>> >Cc: stable at dpdk.org
>> >
>> >Signed-off-by: Phil Yang <phil.yang at arm.com>
>> >Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
>> >Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
>> >---
>> > app/test-eventdev/test_pipeline_queue.c | 64
>> >+++++++++++++++++++++----
>> > 1 file changed, 56 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
>> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
>100644
>> >--- a/app/test-eventdev/test_pipeline_queue.c
>> >+++ b/app/test-eventdev/test_pipeline_queue.c
>> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
>> >*arg)
>> >
>> > 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> > 			pipeline_event_tx(dev, port, &ev);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> > 		} else {
>> > 			ev.queue_id++;
>> > 			pipeline_fwd_event(&ev,
>> >RTE_SCHED_TYPE_ATOMIC);
>> >@@ -59,7 +65,13 @@
>pipeline_queue_worker_single_stage_fwd(void
>> >*arg)
>> > 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> > 		pipeline_event_enqueue(dev, port, &ev);
>> >-		w->processed_pkts++;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+				__ATOMIC_RELEASE);
>> > 	}
>> >
>> > 	return 0;
>> >@@ -84,7 +96,13 @@
>> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
>> > 			if (ev[i].sched_type ==
>> >RTE_SCHED_TYPE_ATOMIC) {
>> > 				pipeline_event_tx(dev, port, &ev[i]);
>> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored
>> >operation
>> >+				 * of the event completes before the
>> >number of
>> >+				 * processed pkts is visible to the main
>> >core
>> >+				 */
>> >+				__atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 			} else {
>> > 				ev[i].queue_id++;
>> > 				pipeline_fwd_event(&ev[i],
>> >@@ -121,7 +139,13 @@
>> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>> > 		}
>> >
>> > 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>> >-		w->processed_pkts += nb_rx;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>> >+				__ATOMIC_RELEASE);
>> > 	}
>> >
>> > 	return 0;
>> >@@ -146,7 +170,13 @@
>pipeline_queue_worker_multi_stage_tx(void
>> >*arg)
>> >
>> > 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>> > 			pipeline_event_tx(dev, port, &ev);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> > 			continue;
>> > 		}
>> >
>> >@@ -180,7 +210,13 @@
>> >pipeline_queue_worker_multi_stage_fwd(void *arg)
>> > 			ev.queue_id = tx_queue[ev.mbuf->port];
>> > 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > 			pipeline_fwd_event(&ev,
>> >RTE_SCHED_TYPE_ATOMIC);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> > 		} else {
>> > 			ev.queue_id++;
>> > 			pipeline_fwd_event(&ev,
>> >sched_type_list[cq_id]);
>> >@@ -214,7 +250,13 @@
>> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>> > 			if (ev[i].queue_id == tx_queue[ev[i].mbuf-
>> >>port]) {
>> > 				pipeline_event_tx(dev, port, &ev[i]);
>> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored
>> >operation
>> >+				 * of the event completes before the
>> >number of
>> >+				 * processed pkts is visible to the main
>> >core
>> >+				 */
>> >+				__atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 				continue;
>> > 			}
>> >
>> >@@ -254,7 +296,13 @@
>> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>> >
>> >	rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>> > 				pipeline_fwd_event(&ev[i],
>> >
>> >	RTE_SCHED_TYPE_ATOMIC);
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored
>> >operation
>> >+				 * of the event completes before the
>> >number of
>> >+				 * processed pkts is visible to the main
>> >core
>> >+				 */
>> >+				__atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 			} else {
>> > 				ev[i].queue_id++;
>> > 				pipeline_fwd_event(&ev[i],
>> >--
>> >2.17.1


