[dpdk-dev] [PATCH v1] event/sw: performance improvements

Bruce Richardson bruce.richardson at intel.com
Fri Sep 25 12:28:05 CEST 2020


On Thu, Sep 24, 2020 at 06:02:48PM +0000, Ananyev, Konstantin wrote:
> 
> 
> 
> <snip>
> 
> Add minimum burst throughout the scheduler pipeline and a flush counter.
> Replace ring API calls with local single threaded implementation where
> possible.
> 
> Signed-off-by: Radu Nicolau mailto:radu.nicolau at intel.com
> 
> Thanks for the patch, a few comments inline.
> 
> ---
>  drivers/event/sw/sw_evdev.h           | 11 +++-
>  drivers/event/sw/sw_evdev_scheduler.c | 83
> +++++++++++++++++++++++----
>  2 files changed, 81 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
> index 7c77b2495..95e51065f 100644
> --- a/drivers/event/sw/sw_evdev.h
> +++ b/drivers/event/sw/sw_evdev.h
> @@ -29,7 +29,13 @@
>  /* report dequeue burst sizes in buckets */  #define
> SW_DEQ_STAT_BUCKET_SHIFT 2
>  /* how many packets pulled from port by sched */ -#define
> SCHED_DEQUEUE_BURST_SIZE 32
> +#define SCHED_DEQUEUE_BURST_SIZE 64
> +
> +#define SCHED_MIN_BURST_SIZE 8
> +#define SCHED_NO_ENQ_CYCLE_FLUSH 256
> +/* set SCHED_DEQUEUE_BURST_SIZE to 64 or 128 when setting this to 1*/
> +#define SCHED_REFILL_ONCE_PER_CALL 1
> 
> Is it possible to make the above #define a runtime option?
> Eg, --vdev event_sw,refill_iter=1
> 
> That would allow packaged versions of DPDK to be usable in both modes.
> 
> +
> 
>  #define SW_PORT_HIST_LIST (MAX_SW_PROD_Q_DEPTH) /* size of our
> history list */  #define NUM_SAMPLES 64 /* how many data points use
> for average stats */ @@ -214,6 +220,9 @@ struct sw_evdev {
>      uint32_t xstats_count_mode_port;
>      uint32_t xstats_count_mode_queue;
> 
> +    uint16_t sched_flush_count;
> +    uint16_t sched_min_burst;
> +
>      /* Contains all ports - load balanced and directed */
>      struct sw_port ports[SW_PORTS_MAX] __rte_cache_aligned;
> 
> diff --git a/drivers/event/sw/sw_evdev_scheduler.c
> b/drivers/event/sw/sw_evdev_scheduler.c
> index cff747da8..ca6d1caff 100644
> --- a/drivers/event/sw/sw_evdev_scheduler.c
> +++ b/drivers/event/sw/sw_evdev_scheduler.c
> @@ -26,6 +26,29 @@
>  /* use cheap bit mixing, we only need to lose a few bits */  #define
> SW_HASH_FLOWID(f) (((f) ^ (f >> 10)) & FLOWID_MASK)
> 
> +
> +/* single object enq and deq for non MT ring */ static
> +__rte_always_inline void sw_nonmt_ring_dequeue(struct rte_ring *r,
> +void **obj) {
> +    if ((r->prod.tail - r->cons.tail) < 1)
> +            return;
> +    void **ring = (void **)&r[1];
> +    *obj = ring[r->cons.tail & r->mask];
> +    r->cons.tail++;
> +}
> +static __rte_always_inline int
> +sw_nonmt_ring_enqueue(struct rte_ring *r, void *obj) {
> +    if ((r->capacity + r->cons.tail - r->prod.tail) < 1)
> +            return 0;
> +    void **ring = (void **)&r[1];
> +    ring[r->prod.tail & r->mask] = obj;
> +    r->prod.tail++;
> +    return 1;
> +
> Why not make these APIs part of the rte_ring library? You could further optimize them by keeping the indices on the same cacheline.
> I'm not sure there is any need for non thread-safe rings outside this particular case.
> [Honnappa] I think if we add the APIs, we will find the use cases.
> But, more than that, I understand that rte_ring structure is exposed to the application. The reason for doing that is the inline functions that rte_ring provides. IMO, we should still maintain modularity and should not use the internals of the rte_ring structure outside of the library.
> 
> +1 to that.
> 
> BTW, is there any real perf benefit from such micor-optimisation?

I'd tend to view these as use-case specific, and I'm not sure we should
clutter up the ring library with yet more functions, especially since they
can't be mixed with the existing enqueue/dequeue functions, since they
don't use the head pointers.



More information about the dev mailing list