[dpdk-dev] Application used for DSW event_dev performance testing

Mattias Rönnblom mattias.ronnblom at ericsson.com
Wed Nov 28 17:55:41 CET 2018


On 2018-11-27 23:33, Venky Venkatesh wrote:
> 
> As you can see the DSW overhead dominates the scene and very little real work is getting done. Is there some configuration or tuning to be done to get the sort of performance you are seeing with multiple cores?
>
I can't explain the behavior you are seeing based on the information you 
have supplied.

Attached is a small DSW throughput test program, which I thought might 
help you find the issue. It works much like the pipeline simulator I 
used when developing the scheduler, but is a lot simpler. Remember to 
supply "--vdev=event_dsw0".
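For reference, an invocation might look like the following. This is only a sketch: the binary name and core mask are placeholders, and only the --vdev EAL option is taken from the text above.

```shell
# Hypothetical command line; "dsw-throughput-test" is a placeholder
# binary name, and -c 0xfff is an example core mask for 12 cores.
# The essential part is loading the DSW software event device:
./dsw-throughput-test -c 0xfff --vdev=event_dsw0
```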

I ran it on my 12-core Skylake desktop (@2.9 GHz, turbo disabled). With 
zero work and one stage, I get ~640 Mevent/s. For the first few stages 
you add, you'll see a drop in performance; with 3 stages, for example, 
you're at ~310 Mevent/s.

If you increase DSW_MAX_PORT_OUT_BUFFER and DSW_MAX_PORT_OPS_PER_BG_TASK, 
you'll see efficiency improvements on high-core-count machines. On my 
system, the above goes to 675 M/s for a 1-stage pipeline and 460 M/s for 
a 3-stage pipeline, if I apply the following changes to dsw_evdev.h:
-#define DSW_MAX_PORT_OUT_BUFFER (32)
+#define DSW_MAX_PORT_OUT_BUFFER (64)

-#define DSW_MAX_PORT_OPS_PER_BG_TASK (128)
+#define DSW_MAX_PORT_OPS_PER_BG_TASK (512)

With 500 clock cycles of dummy work per stage, the per-event overhead is 
~16 TSC cycles per stage and event (i.e., per scheduled event; enqueue + 
dequeue), if my quick-and-dirty benchmark program does the math 
correctly. This figure also includes the overhead of the benchmark 
program itself.

Overhead with a real application will be higher.

