[dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
Pavan Nikhilesh Bhagavatula
pbhagavatula at marvell.com
Tue Jan 5 10:29:25 CET 2021
Hi Feifei,
>Hi, Pavan
>
>Sorry for my late reply and thanks very much for your review.
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavatula at marvell.com>
>> Sent: 2020年12月22日 18:33
>> To: Feifei Wang <Feifei.Wang2 at arm.com>; jerinj at marvell.com;
>Harry van
>> Haaren <harry.van.haaren at intel.com>; Pavan Nikhilesh
>> <pbhagavatula at caviumnetworks.com>
>> Cc: dev at dpdk.org; nd <nd at arm.com>; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli at arm.com>; stable at dpdk.org; Phil Yang
>> <Phil.Yang at arm.com>
>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>for
>> pipeline test
>>
>>
>> >Add release barriers before updating the processed packets for
>worker
>> >lcores to ensure the worker lcore has really finished data processing
>> >and then it can update the processed packets number.
>> >
>>
>> I believe we can live with minor inaccuracies in stats being presented
>as
>> atomics are pretty heavy when scheduler is limited to burst size as 1.
>>
>> One option is to move it before a pipeline operation
>(pipeline_event_tx,
>> pipeline_fwd_event etc.) as they imply implicit release barrier (as all
>the
>> changes done to the event should be visible to the next core).
>
>If I understand correctly, your meaning is that move release barriers
>before
>pipeline_event_tx or pipeline_fwd_event. This can ensure the event has
>been
>processed before the next core begins to tx/fwd. For example:
What I meant was event APIs such as `rte_event_enqueue_burst`, `rte_event_eth_tx_adapter_enqueue`
act as an implicit release barrier and the API `rte_event_dequeue_burst` act as an implicit acquire barrier.
Since, pipeline_* test starts with a dequeue() and ends with an enqueue() I don’t believe we need barriers in
Between.
>
>if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> + __atomic_thread_fence(__ATOMIC_RELEASE);
> pipeline_event_tx(dev, port, &ev);
> w->processed_pkts++;
> } else {
> ev.queue_id++;
> + __atomic_thread_fence(__ATOMIC_RELEASE);
> pipeline_fwd_event(&ev,
>RTE_SCHED_TYPE_ATOMIC);
> pipeline_event_enqueue(dev, port, &ev);
>
>However, there are two reasons to prevent this:
>
>First, compare with other tests in app/eventdev, for example, the
>eventdev perf test,
>the wmb is after event operation to ensure operation has been finished
>and then w->processed_pkts++.
In case of perf_* tests start with a dequeue() and finally ends with a mempool_put()
should also act as implicit acquire release pairs making stats consistent?
>So, if we move release barriers before tx/fwd, it may cause that the
>tests of app/eventdev
>become inconsistent.This may reduce the maintainability of the code
>and make it difficult to understand.
>
>Second, it is a test case, though heavy thread may cause performance
>degradation, it can ensure that
>the operation process and the test result are correct. And maybe for a
>test case, correctness is more important
>than performance.
>
Most of our internal perf test run on 24/48 core combinations and since
Octeontx2 event device driver supports a burst size of 1, it will show up as
Huge performance degradation.
>So, due to two reasons above, I'm ambivalent about how we should do
>in the next step.
>
>Best Regards
>Feifei
Regards,
Pavan.
>
>> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>> >functions")
>> >Cc: pbhagavatula at marvell.com
>> >Cc: stable at dpdk.org
>> >
>> >Signed-off-by: Phil Yang <phil.yang at arm.com>
>> >Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
>> >Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
>> >---
>> > app/test-eventdev/test_pipeline_queue.c | 64
>> >+++++++++++++++++++++----
>> > 1 file changed, 56 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
>> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
>100644
>> >--- a/app/test-eventdev/test_pipeline_queue.c
>> >+++ b/app/test-eventdev/test_pipeline_queue.c
>> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
>> >*arg)
>> >
>> > if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> > pipeline_event_tx(dev, port, &ev);
>> >- w->processed_pkts++;
>> >+
>> >+ /* release barrier here ensures stored operation
>> >+ * of the event completes before the number of
>> >+ * processed pkts is visible to the main core
>> >+ */
>> >+ __atomic_fetch_add(&(w->processed_pkts), 1,
>> >+ __ATOMIC_RELEASE);
>> > } else {
>> > ev.queue_id++;
>> > pipeline_fwd_event(&ev,
>> >RTE_SCHED_TYPE_ATOMIC);
>> >@@ -59,7 +65,13 @@
>pipeline_queue_worker_single_stage_fwd(void
>> >*arg)
>> > rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> > pipeline_event_enqueue(dev, port, &ev);
>> >- w->processed_pkts++;
>> >+
>> >+ /* release barrier here ensures stored operation
>> >+ * of the event completes before the number of
>> >+ * processed pkts is visible to the main core
>> >+ */
>> >+ __atomic_fetch_add(&(w->processed_pkts), 1,
>> >+ __ATOMIC_RELEASE);
>> > }
>> >
>> > return 0;
>> >@@ -84,7 +96,13 @@
>> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
>> > if (ev[i].sched_type ==
>> >RTE_SCHED_TYPE_ATOMIC) {
>> > pipeline_event_tx(dev, port, &ev[i]);
>> > ev[i].op = RTE_EVENT_OP_RELEASE;
>> >- w->processed_pkts++;
>> >+
>> >+ /* release barrier here ensures stored
>> >operation
>> >+ * of the event completes before the
>> >number of
>> >+ * processed pkts is visible to the main
>> >core
>> >+ */
>> >+ __atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+ __ATOMIC_RELEASE);
>> > } else {
>> > ev[i].queue_id++;
>> > pipeline_fwd_event(&ev[i],
>> >@@ -121,7 +139,13 @@
>> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>> > }
>> >
>> > pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>> >- w->processed_pkts += nb_rx;
>> >+
>> >+ /* release barrier here ensures stored operation
>> >+ * of the event completes before the number of
>> >+ * processed pkts is visible to the main core
>> >+ */
>> >+ __atomic_fetch_add(&(w->processed_pkts), nb_rx,
>> >+ __ATOMIC_RELEASE);
>> > }
>> >
>> > return 0;
>> >@@ -146,7 +170,13 @@
>pipeline_queue_worker_multi_stage_tx(void
>> >*arg)
>> >
>> > if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>> > pipeline_event_tx(dev, port, &ev);
>> >- w->processed_pkts++;
>> >+
>> >+ /* release barrier here ensures stored operation
>> >+ * of the event completes before the number of
>> >+ * processed pkts is visible to the main core
>> >+ */
>> >+ __atomic_fetch_add(&(w->processed_pkts), 1,
>> >+ __ATOMIC_RELEASE);
>> > continue;
>> > }
>> >
>> >@@ -180,7 +210,13 @@
>> >pipeline_queue_worker_multi_stage_fwd(void *arg)
>> > ev.queue_id = tx_queue[ev.mbuf->port];
>> > rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > pipeline_fwd_event(&ev,
>> >RTE_SCHED_TYPE_ATOMIC);
>> >- w->processed_pkts++;
>> >+
>> >+ /* release barrier here ensures stored operation
>> >+ * of the event completes before the number of
>> >+ * processed pkts is visible to the main core
>> >+ */
>> >+ __atomic_fetch_add(&(w->processed_pkts), 1,
>> >+ __ATOMIC_RELEASE);
>> > } else {
>> > ev.queue_id++;
>> > pipeline_fwd_event(&ev,
>> >sched_type_list[cq_id]);
>> >@@ -214,7 +250,13 @@
>> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>> > if (ev[i].queue_id == tx_queue[ev[i].mbuf-
>> >>port]) {
>> > pipeline_event_tx(dev, port, &ev[i]);
>> > ev[i].op = RTE_EVENT_OP_RELEASE;
>> >- w->processed_pkts++;
>> >+
>> >+ /* release barrier here ensures stored
>> >operation
>> >+ * of the event completes before the
>> >number of
>> >+ * processed pkts is visible to the main
>> >core
>> >+ */
>> >+ __atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+ __ATOMIC_RELEASE);
>> > continue;
>> > }
>> >
>> >@@ -254,7 +296,13 @@
>> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>> >
>> > rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>> > pipeline_fwd_event(&ev[i],
>> >
>> > RTE_SCHED_TYPE_ATOMIC);
>> >- w->processed_pkts++;
>> >+
>> >+ /* release barrier here ensures stored
>> >operation
>> >+ * of the event completes before the
>> >number of
>> >+ * processed pkts is visible to the main
>> >core
>> >+ */
>> >+ __atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+ __ATOMIC_RELEASE);
>> > } else {
>> > ev[i].queue_id++;
>> > pipeline_fwd_event(&ev[i],
>> >--
>> >2.17.1
More information about the dev
mailing list