[dpdk-dev] Application used for DSW event_dev performance testing

Venky Venkatesh vvenkatesh at paloaltonetworks.com
Wed Nov 14 22:56:56 CET 2018


Mattias,
Thanks for the prompt response. I understand that you are not able to share the proprietary code. My answers are inline, marked [VV]:
--Venky

On 11/14/18, 11:41 AM, "Mattias Rönnblom" <hofors at lysator.liu.se> wrote:

    On 2018-11-14 20:16, Venky Venkatesh wrote:
    > Hi,
    > 
    > https://mails.dpdk.org/archives/dev/2018-September/111344.html mentions that there is a sample application where “worker cores can sustain 300-400 million event/s. With a pipeline
    > with 1000 clock cycles of work per stage, the average event device
    > overhead is somewhere 50-150 clock cycles/event”. Is this sample application code available?
    > 
    It's proprietary code, although it's also been tested by some of our 
    partners.
    
    The primary reason for it not being contributed to DPDK is because it's 
    a fair amount of work to do so. I would refer to it as an eventdev 
    pipeline simulator, rather than a sample app.
    
    > We have written a similar simple sample application where 1 core keeps enqueuing (as NEW/ATOMIC) and n cores dequeue (and RELEASE) and do no other work. But we are not seeing anything close in terms of performance. We are also seeing some counterintuitive behavior, such as a burst of 32 performing worse than a burst of 1. We surely have something wrong and would therefore like to compare against a good application that you have written. Could you please share it?
    > 
    
    Is this enqueue or dequeue burst? How large is n? Is this explicit release?
[VV]: Yes, both enqueue and dequeue use a burst of 32. I tried n = 4 to 7. It is an explicit RELEASE.
   
    What do you set nb_events_limit to? Good DSW performance depends heavily on
    the average burst size on the event rings, which in turn is dependent on 
    the number of in-flight events. On really high core-count systems you 
    might also want to increase DSW_MAX_PORT_OPS_PER_BG_TASK, since it 
    effectively puts a limit on the maximum number of events buffered on the 
    output buffers.
[VV]:
        struct rte_event_dev_config config = {
                .nb_event_queues = 2,
                .nb_event_ports = 5,
                .nb_events_limit = 4096,
                .nb_event_queue_flows = 1024,
                .nb_event_port_dequeue_depth = 128,
                .nb_event_port_enqueue_depth = 128,
        };
        struct rte_event_port_conf p_conf = {
                .dequeue_depth = 64,
                .enqueue_depth = 64,
                .new_event_threshold = 1024,
                .disable_implicit_release = 0,
        };
        struct rte_event_queue_conf q_conf = {
                .schedule_type = RTE_SCHED_TYPE_ATOMIC,
                .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
                .nb_atomic_flows = 1024,
                .nb_atomic_order_sequences = 1024,
        };
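
As a cross-check of the values above: the eventdev API documentation states that a port's dequeue_depth, enqueue_depth and new_event_threshold may not exceed the device-level nb_event_port_dequeue_depth, nb_event_port_enqueue_depth and nb_events_limit, and that nb_atomic_flows must lie in [1, nb_event_queue_flows]. The following is only a minimal sketch of that check. The structs and function names are hypothetical stand-ins, not the DPDK types, so it compiles without DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirrors of the posted values; NOT the DPDK structs. */
struct dev_limits {
    int32_t  nb_events_limit;
    uint32_t nb_event_queue_flows;
    uint32_t nb_event_port_dequeue_depth;
    uint32_t nb_event_port_enqueue_depth;
};

struct port_limits {
    int32_t  new_event_threshold;
    uint32_t dequeue_depth;
    uint32_t enqueue_depth;
};

struct queue_limits {
    uint32_t nb_atomic_flows;
};

/* True if the per-port values stay within the device-level limits. */
static bool port_conf_fits(const struct dev_limits *d,
                           const struct port_limits *p)
{
    assert(d && p);
    return p->new_event_threshold > 0 &&
           p->new_event_threshold <= d->nb_events_limit &&
           p->dequeue_depth <= d->nb_event_port_dequeue_depth &&
           p->enqueue_depth <= d->nb_event_port_enqueue_depth;
}

/* True if the atomic flow count lies in [1, nb_event_queue_flows]. */
static bool queue_conf_fits(const struct dev_limits *d,
                            const struct queue_limits *q)
{
    assert(d && q);
    return q->nb_atomic_flows >= 1 &&
           q->nb_atomic_flows <= d->nb_event_queue_flows;
}
```

With the values posted above (4096/1024/128/128 for the device, 1024/64/64 for the port, 1024 atomic flows), both checks pass, so these particular limits do not appear to be the source of the problem.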

    
    In the pipeline simulator all cores produce events initially, and then
    recycle events once the number of in-flight events reaches a certain
    threshold (50% of nb_events_limit). A single lcore won't be able to fill
    the pipeline if you have zero-work stages.
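
The produce-then-recycle policy described above can be modelled roughly as follows. This is a toy sketch based only on the description in this thread, not the simulator's code: the names next_op, choose_op and apply_op are hypothetical, and "recycle" is assumed to mean re-enqueuing an existing event instead of creating a new one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; a toy model of the policy, not the simulator. */
enum next_op {
    OP_INJECT_NEW, /* create an additional event */
    OP_RECYCLE     /* re-enqueue an event that finished the pipeline */
};

/* Inject new events until in-flight reaches 50% of nb_events_limit. */
static enum next_op choose_op(uint32_t in_flight, uint32_t nb_events_limit)
{
    assert(nb_events_limit > 0);
    uint32_t threshold = nb_events_limit / 2;
    return in_flight < threshold ? OP_INJECT_NEW : OP_RECYCLE;
}

/* Injecting raises the in-flight count; recycling leaves it unchanged. */
static uint32_t apply_op(uint32_t in_flight, enum next_op op)
{
    return op == OP_INJECT_NEW ? in_flight + 1 : in_flight;
}
```

With nb_events_limit = 4096 the model keeps injecting until 2048 events are in flight and only recycles after that, so the number of in-flight events stays at the threshold no matter how fast the stages run.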
[VV]: This is the simple case: I have a single thread (0) that enqueues NEW events and a group of “dequeue and RELEASE” threads (1-4). I also have a stats-printing thread (5). If the one enqueue thread is unable to fill the pipeline, which counter would indicate that? I see the opposite effect: I am tracking the number of times enqueue fails, and that number is large.
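
One possible way to tell an underfilled pipeline apart from a back-pressured producer is to keep per-port counters fed by the return values of rte_event_enqueue_burst() and rte_event_dequeue_burst(), which report how many events were actually enqueued or dequeued. The sketch below is only an assumption about how such counters could look. The struct and helper names are hypothetical and not part of DPDK, and the code has no DPDK dependency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-port counters; none of these names come from DPDK. */
struct port_stats {
    uint64_t enq_requested; /* events offered to the device */
    uint64_t enq_accepted;  /* events the device accepted */
    uint64_t deq_calls;     /* dequeue polls */
    uint64_t deq_empty;     /* polls that returned 0 events */
    uint64_t deq_events;    /* events received in total */
};

/* Call with the burst size offered and the enqueue call's return value. */
static void record_enqueue(struct port_stats *s, uint16_t requested,
                           uint16_t accepted)
{
    assert(accepted <= requested);
    s->enq_requested += requested;
    s->enq_accepted += accepted;
}

/* Call with the dequeue call's return value. */
static void record_dequeue(struct port_stats *s, uint16_t returned)
{
    s->deq_calls++;
    s->deq_events += returned;
    if (returned == 0)
        s->deq_empty++;
}

/* Share of offered events that were refused (producer back-pressure). */
static double enqueue_reject_ratio(const struct port_stats *s)
{
    if (s->enq_requested == 0)
        return 0.0;
    return 1.0 - (double)s->enq_accepted / (double)s->enq_requested;
}

/* Share of polls that found nothing (worker starvation). */
static double empty_poll_ratio(const struct port_stats *s)
{
    return s->deq_calls ? (double)s->deq_empty / (double)s->deq_calls : 0.0;
}

/* Average number of events per non-empty dequeue. */
static double avg_dequeue_burst(const struct port_stats *s)
{
    uint64_t busy = s->deq_calls - s->deq_empty;
    return busy ? (double)s->deq_events / (double)busy : 0.0;
}
```

Under this scheme, a high empty-poll ratio and a small average dequeue burst on the workers would suggest the pipeline is not being filled, while a high reject ratio on the producer port would suggest back-pressure on new events.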

    
    Even though I can't send you the simulator code at this point, I'm happy 
    to assist you in any DSW-related endeavors.
[VV]: My program is simple (nothing proprietary), so I can share it. Can I send it to you directly for a quick recommendation?


