[dpdk-dev] [PATCH v1] event/sw: performance improvements

Nicolau, Radu radu.nicolau at intel.com
Thu Sep 24 17:27:39 CEST 2020


On 9/24/2020 12:10 AM, Honnappa Nagarahalli wrote:

<snip>

Add minimum burst throughout the scheduler pipeline and a flush counter.
Replace ring API calls with a local single-threaded implementation where
possible.



Signed-off-by: Radu Nicolau <radu.nicolau at intel.com>



Thanks for the patch, a few comments inline.



---
 drivers/event/sw/sw_evdev.h           | 11 +++-
 drivers/event/sw/sw_evdev_scheduler.c | 83 +++++++++++++++++++++++----
 2 files changed, 81 insertions(+), 13 deletions(-)



diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
index 7c77b2495..95e51065f 100644
--- a/drivers/event/sw/sw_evdev.h
+++ b/drivers/event/sw/sw_evdev.h
@@ -29,7 +29,13 @@
 /* report dequeue burst sizes in buckets */
 #define SW_DEQ_STAT_BUCKET_SHIFT 2
 /* how many packets pulled from port by sched */
-#define SCHED_DEQUEUE_BURST_SIZE 32
+#define SCHED_DEQUEUE_BURST_SIZE 64
+
+#define SCHED_MIN_BURST_SIZE 8
+#define SCHED_NO_ENQ_CYCLE_FLUSH 256
+/* set SCHED_DEQUEUE_BURST_SIZE to 64 or 128 when setting this to 1*/
+#define SCHED_REFILL_ONCE_PER_CALL 1



Is it possible to make the above #define a runtime option?
E.g., --vdev event_sw,refill_iter=1

That would allow packaged versions of DPDK to be usable in both modes.
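
Agreed, that would help packaged builds. A sketch of how it could hook
into the existing kvargs parsing in sw_probe() follows; the "refill_iter"
key and the refill_once variable are illustrative only, not part of this
patch:

	#include <errno.h>
	#include <stdlib.h>
	#include <rte_common.h>
	#include <rte_kvargs.h>

	#define REFILL_ITER_ARG "refill_iter"

	/* same shape as the existing devarg handlers in sw_evdev.c */
	static int
	set_refill_iter(const char *key __rte_unused, const char *value,
			void *opaque)
	{
		int *refill_once = opaque;

		if (value == NULL || opaque == NULL)
			return -EINVAL;
		*refill_once = atoi(value) != 0;
		return 0;
	}

	/* then in sw_probe(), add the key to the valid args list and
	 * process it; params comes from rte_vdev_device_args(vdev)
	 */
	static const char *const args[] = {
		NUMA_NODE_ARG, SCHED_QUANTA_ARG,
		CREDIT_QUANTA_ARG, REFILL_ITER_ARG, NULL };
	int refill_once = 0;
	struct rte_kvargs *kvlist = rte_kvargs_parse(params, args);

	if (kvlist != NULL) {
		rte_kvargs_process(kvlist, REFILL_ITER_ARG,
				set_refill_iter, &refill_once);
		rte_kvargs_free(kvlist);
	}

The scheduler would then test a runtime flag derived from refill_once
instead of the SCHED_REFILL_ONCE_PER_CALL compile-time switch.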



+

 #define SW_PORT_HIST_LIST (MAX_SW_PROD_Q_DEPTH) /* size of our history list */
 #define NUM_SAMPLES 64 /* how many data points use for average stats */
@@ -214,6 +220,9 @@ struct sw_evdev {
 	uint32_t xstats_count_mode_port;
 	uint32_t xstats_count_mode_queue;

+	uint16_t sched_flush_count;
+	uint16_t sched_min_burst;
+
 	/* Contains all ports - load balanced and directed */
 	struct sw_port ports[SW_PORTS_MAX] __rte_cache_aligned;
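
To make the intent of the two new fields concrete, here is a simplified
sketch of how they could gate the CQ flush in sw_event_schedule(). The
patch body is snipped here, so everything except the two fields and the
#defines above is paraphrased, not quoted:

	/* sketch; would live in sw_evdev_scheduler.c, which already
	 * includes rte_event_ring.h and sw_evdev.h
	 */
	static inline void
	sched_flush_cq_bufs(struct sw_evdev *sw)
	{
		uint32_t i;
		uint16_t space;
		int no_enq = 1;

		for (i = 0; i < sw->port_count; i++) {
			struct sw_port *port = &sw->ports[i];

			/* skip the ring transfer until a worthwhile
			 * batch has built up
			 */
			if (port->cq_buf_count < sw->sched_min_burst)
				continue;

			rte_event_ring_enqueue_burst(port->cq_worker_ring,
					port->cq_buf, port->cq_buf_count,
					&space);
			port->cq_buf_count = 0;
			no_enq = 0;
		}

		if (no_enq) {
			/* nothing moved this call; after enough idle
			 * calls drop the threshold so buffered events
			 * cannot be stranded indefinitely
			 */
			if (++sw->sched_flush_count >
					SCHED_NO_ENQ_CYCLE_FLUSH)
				sw->sched_min_burst = 1;
		} else {
			sw->sched_flush_count = 0;
			sw->sched_min_burst = SCHED_MIN_BURST_SIZE;
		}
	}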



diff --git a/drivers/event/sw/sw_evdev_scheduler.c b/drivers/event/sw/sw_evdev_scheduler.c
index cff747da8..ca6d1caff 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -26,6 +26,29 @@
 /* use cheap bit mixing, we only need to lose a few bits */
 #define SW_HASH_FLOWID(f) (((f) ^ (f >> 10)) & FLOWID_MASK)



+
+/* single object enq and deq for non MT ring */
+static __rte_always_inline void
+sw_nonmt_ring_dequeue(struct rte_ring *r, void **obj)
+{
+	if ((r->prod.tail - r->cons.tail) < 1)
+		return;
+	void **ring = (void **)&r[1];
+	*obj = ring[r->cons.tail & r->mask];
+	r->cons.tail++;
+}
+
+static __rte_always_inline int
+sw_nonmt_ring_enqueue(struct rte_ring *r, void *obj)
+{
+	if ((r->capacity + r->cons.tail - r->prod.tail) < 1)
+		return 0;
+	void **ring = (void **)&r[1];
+	ring[r->prod.tail & r->mask] = obj;
+	r->prod.tail++;
+	return 1;
+}

Why not make these APIs part of the rte_ring library? You could further optimize them by keeping the indices on the same cacheline.

I'm not sure there is any need for non thread-safe rings outside this particular case.
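
On the cacheline point: rte_ring deliberately places the prod and cons
headtail structs on separate cachelines to avoid false sharing between
threads, so a single-threaded user touches two lines per operation. A
dedicated structure could pack everything together, along these lines
(illustrative sketch only, not an existing or proposed rte_ring API):

	#include <stdint.h>
	#include <rte_common.h>

	/* both indices and the mask share one cacheline; safe only when
	 * a single thread does all enqueues and dequeues
	 */
	struct st_ring {
		uint32_t prod;
		uint32_t cons;
		uint32_t mask;		/* size - 1, size a power of two */
		uint32_t capacity;
		void *ring[];		/* storage follows the header */
	} __rte_cache_aligned;

	static inline int
	st_ring_enqueue(struct st_ring *r, void *obj)
	{
		if (r->capacity + r->cons - r->prod < 1)
			return 0;	/* full */
		r->ring[r->prod++ & r->mask] = obj;
		return 1;
	}

	static inline int
	st_ring_dequeue(struct st_ring *r, void **obj)
	{
		if (r->prod - r->cons < 1)
			return 0;	/* empty */
		*obj = r->ring[r->cons++ & r->mask];
		return 1;
	}

The unsigned wraparound arithmetic is the same trick rte_ring uses, so
no explicit index reset is needed. Whether that belongs in the rte_ring
library is a fair question if this driver is the only likely user.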