[dpdk-dev] [ovs-dev] Status of Open vSwitch with DPDK

Mark D. Gray mark.d.gray at intel.com
Mon Aug 17 16:53:01 CEST 2015


On 08/15/15 08:16, Flavio Leitner wrote:
> On Fri, Aug 14, 2015 at 04:04:40PM +0000, Gray, Mark D wrote:
>> Hi Daniele,
>>
>> Thanks for starting this conversation. It is a good list :) I have crossed-posted this
>> to dpdk.org as I feel that some of the points could be interesting to that community
>> as they are related to how DPDK is used.
>>
>> How do "users" of OVS with DPDK feel about this list? Does anyone disagree or
>> does anyone have any additions? What are your experiences?
>>
>>>
>>> There has been some discussion lately about the status of the Open vSwitch
>>> port to DPDK.  While part of the code has been tested for quite some time,
>>> I think we can agree that there are a few rough spots that prevent it from
>>> being easily deployed and used.
>>>
>>> I was hoping to get some feedback from the community about those rough
>>> spots,
>>> i.e. areas where OVS+DPDK can/needs to improve to become more
>>> "production
>>> ready" and user-friendly.
>>>
>>> - PMD threads and queues management: the code has shown several bugs
>>>    and the netdev interfaces don't seem up to the job anymore.
>>
>> You had a few ideas about how to refactor this before but I was concerned
>> about the effect it would have on throughput. I can't find the thread.
>>
>> Do you have some further ideas about how to achieve this?
>
> I miss the fact that we can't tell which queue can go to each PMD and
> also that all devices must have the same number of rx queues. I agree
> that there are other issues, but it seems the kind of configuration
> knobs I am looking for might not be the end goal since what has been
> said is to look for a more automated way.  Having said so, I also
> would like to hear if you have further ideas about how to achieve that.
>
>
>>>    There's a lot of margin for improvement: we could factor out the code
>>>    from dpif-netdev, add configuration parameters for advanced users, and
>>>    figure out a way to add unit tests.
>>>
>>
>> I think this is a general issue with both the kernel datapath (and netdevs)
>> and the userspace datapath. There isn't much unit testing (or testing) outside
>> of the slow path.
>
> Maybe we could exercise the interfaces using pcap pmd.
>
>

We had a similar idea. Using this, it would be possible to test the
entire datapath or netdev for functionality! I don't think there is an
equivalent for the kernel datapath, though?
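As a sketch of how the pcap PMD could drive such tests (the vdev name and
options below follow the DPDK pcap poll-mode driver documentation, but the
exact syntax depends on the DPDK version, so treat this as an illustration
rather than a tested recipe):

```shell
# Feed a canned capture into a DPDK port and record what comes out,
# using DPDK's pcap poll-mode driver -- no physical NIC required.
testpmd -c 0x3 -n 4 \
    --vdev 'eth_pcap0,rx_pcap=input.pcap,tx_pcap=output.pcap' \
    -- --port-topology=chained

# The same idea could let OVS unit tests exercise netdev-dpdk:
# replay input.pcap through the datapath and diff output.pcap
# against an expected capture.
```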

>>>    Related to this, the system should be as fast as possible out-of-the-box,
>>>    without requiring too much tuning.
>>
>> This is a good point. I think the kernel datapath has a similar issue. You can
>> get a certain level of performance without compiling with -Ofast or
>> pinning threads but you will (even with the kernel datapath) get better
>> performance if you pin threads (and possibly compile differently). I guess
>> it is more visible with the dpdk datapath as performance is one of the key
>> values. It is also more detrimental to the performance if you don't set it
>> up correctly.
>
> Not only that, you need to consider how the resources will be
> distributed upfront so that you don't run out of hugepages, perhaps
> isolate PMD CPUs from the Linux scheduler, etc.  So, I think a more
> realistic goal would be: the system should require minimal/none tuning
> to run with acceptable performance.
>

How do you define "acceptable" performance :)?

>
>> Perhaps we could provide scripts to help do this?
>
> Or profiles (if that isn't included in your scripts definition)
>

Maybe we should define profiles like "performance", "minimum cores", etc
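A "performance" profile could be little more than a script. The values and
knob names below (hugepage counts, other_config:pmd-cpu-mask) are
illustrative examples of what such a profile might set, assuming a recent
OVS built with DPDK support:

```shell
#!/bin/sh
# Hypothetical "performance" profile: reserve hugepages and pin PMD
# threads to dedicated cores.  All values here are examples only.

# Reserve 1024 x 2MB hugepages and mount hugetlbfs for DPDK.
echo 1024 > /proc/sys/vm/nr_hugepages
mkdir -p /dev/hugepages
mount -t hugetlbfs nodev /dev/hugepages

# Run PMD threads on cores 1 and 2 (mask 0x6), leaving core 0 for
# the OS and the OVS main thread.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
```

A "minimum cores" profile would simply choose a smaller mask and fewer
hugepages.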

>
>> I think this is also interesting to the DPDK community. There is
>> knowledge required when running DPDK enabled apps to
>> get good performance: core pinning is one thing that comes to mind.
>>
>>>
>>> - Userspace tunneling: while the code has been there for quite some time it
>>>    hasn't received the level of testing that the Linux kernel datapath
>>> tunneling
>>>    has.
>>>
>>
>> Again, there is a lack of test infrastructure in general for OVS. vsperf is a good
>> start, and it would be great to see more people use and contribute to it!
>
> Yes.
>
>
>>> - Documentation: other than a step-by-step tutorial, it cannot be said
>>>    that DPDK is a first class citizen in the OVS documentation.  Manpages
>>>    could be improved.
>>
>> Easily done. The INSTALL guide is pretty good but the structure could be better.
>> There is also a lack of manpages. Good point.
>
> Yup.
>
>
>>> - Vhost: the code has not received the level of testing of the kernel
>>> vhost.
>>>    Another doubt shared by some developers is whether we should keep
>>>    vhost-cuse, given its relatively low ease of use and the overlapping with
>>>    the far more standard vhost-user.
>>
>> vhost-cuse is required for older versions of qemu. I'm aware of some companies
>> using it as they are restricted to an older version of qemu. I think it is deprecated
>> at the moment? Is there a notice to that effect? We just need a plan for when to
>> remove it and make sure that plan is clear?
>
> Apparently having two solutions to address the same issue causes more
> harm than good, so removing vhost-cuse would be helpful.  I agree that
> we need a clear plan with a soak time so users can either upgrade to
> vhost-user or tell why they can't.
>
>
>>> - Interface management and naming: interfaces must be manually removed
>>>    from the kernel drivers.
>>>
>>>    We still don't have an easy way to identify them. Ideas are welcome:
>>>    how can we make this user friendly?  Is there a better solution on
>>>    the DPDK side?
>>
>> This is a tough one and is interesting to the DPDK community.  The basic issue
>> here is that users are more familiar with linux interfaces and linux naming
>> conventions.
>>
>> "ovs-vsctl add-port br0 eth0" makes a lot more sense than
>>
>> "dpdk_nic_bind -b igb_uio <pci_id>", then checking the order in which
>> the ports are enumerated, and then running "ovs-vsctl add-port br0 dpdkN".
>>
>> I can think of ways to do this with physical NICs. For example,
>> you could reference the port by the linux name and when you try to add it, OVS
>> could unbind from the kernel module and bind it to igb_uio?
>>
>> However, I am not sure how you would do it with virtual nics as there is not
>> even a real device.
>>
>> I think a general solution from the dpdk community would be really helpful here.
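For reference, the manual workflow being discussed looks roughly like this
(tool paths and the example PCI address vary by DPDK version and system):

```shell
# Today's manual steps to hand a NIC to OVS+DPDK.
modprobe uio
insmod "$RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko"

# Find the PCI address of the interface, then unbind it from the
# kernel driver and bind it to igb_uio.
dpdk_nic_bind.py --status
dpdk_nic_bind.py -b igb_uio 0000:01:00.0

# Ports then appear to OVS only as dpdk0, dpdk1, ... in probe order.
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
```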
>
>
> It doesn't look like openvswitch is the right place to fix this.  The
> openvswitch should deal with the port and the system should provide
> the port somehow.  That's what happens with the kernel datapath, for
> instance, openvswitch doesn't load any NIC driver.
>
> So, it seems to be more related to udev/systemd configuration in which
> the sys admin would tell the interfaces and the appropriate driver
> (UIO/VFIO/Bifurcated...).
>
> Even if the system delivers the DPDK port ready, it would be great to
> have some friendly mapping so that users can refer to ports with known
> names.
>

Agreed

>
>>>    How are DPDK interfaces handled by linux distributions? I've heard about
>>>    ongoing work for RHEL and Ubuntu, it would be interesting to coordinate.
>
> We have implemented dpdk/vhost support in initscripts so you can
> configure the ports in the same way as the kernel devices, but it is
> still unclear how to properly bind to the driver.
>
>
>>> - Insight into the system and debuggability: nothing beats tcpdump for the
>>>    kernel datapath.  Can something similar be done for the userspace
>>> datapath?
>>
>> Yeah, this would be useful. I have my own way of dealing with this. For example,
>> you could dump from the LOCAL port on a NORMAL bridge or add a rule to
>> mirror a flow to another port but I feel there could be a better way to do this in
>> DPDK. I have recently heard that the DPDK team do something with a pcap pmd
>> to help with debugging. A more general approach from dpdk would help a lot.
>
> One idea maybe is that openvswitch could provide a mode to clone TX/RX
> packets to a pcap pmd. Or write the packets using pcap format directly
> to a file (avoid another pmd which might not be available). Or even
> push them using a tap device. Either way tcpdump or wireshark would work.
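The tap-device idea can already be approximated with a standard OVS port
mirror; a sketch using the documented ovs-vsctl mirror syntax (the bridge
and port names are placeholders):

```shell
# Mirror all traffic on br0 to a tap port so tcpdump/wireshark can
# observe userspace-datapath packets.  "tap0" must already be a
# port on the bridge.
ovs-vsctl -- --id=@p get Port tap0 \
          -- --id=@m create Mirror name=dbg select-all=true output-port=@p \
          -- set Bridge br0 mirrors=@m

tcpdump -i tap0

# Remove the mirror when done.
ovs-vsctl clear Bridge br0 mirrors
```

The cost is an extra copy per mirrored packet, which is why a native
pcap-clone mode might still be worth having.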
>
>
>>> - Consistency of the tools: some commands are slightly different for the
>>>    userspace/kernel datapath.  Ideally there shouldn't be any difference.
>
> Could you give some examples?
>
>
>> Yeah, there are some things that could be changed. DPDK just works differently but
>> the benefits are significant :)
>>
>> We need to mount hugepages, bind nics to igb_uio, etc
>>
>> With a lot of this stuff, maybe the DPDK community's tools don't need to emulate
>> the linux networking tools exactly. Maybe over time as the DPDK community
>> and user-base expands, people will become more familiar with the tools, processes, etc
>> and this will be less of an issue?
>>
>>
>>>
>>> - Packaging: how should the distributions package DPDK and OVS? Should
>>>    there only be a single build to handle both the kernel and the
>>>    userspace datapath, eventually dynamically linked to DPDK?
>>
>> Yeah. Do we need to start with DPDK initialized if we have merely compiled with DPDK support?
>
> Well, certainly not everybody wants to have DPDK as a dependency,
> whether shared or static.  Maybe the path is a plug-in architecture?
>
>
>>> - Benchmarks: we often rely on extremely simple flow tables with
>>>    single-flow traffic to evaluate the effect of a change.  That may be
>>>    ok during development, but OVS with the kernel datapath has been
>>>    tested in different scenarios with more complicated flow tables and
>>>    even with hostile traffic patterns.
>>>
>>>    Efforts in this sense are being made, like the vsperf project, or
>>>    even the simple ovs-pipeline.py
>>
>> vsperf will really help this.
>
> Indeed, but how is OVS kernel datapath being tested? Is there a
> script?  Maybe we can use the same tests for DPDK.
>
>
>>> I would appreciate feedback on the above points, not (only) in terms of
>>> solutions, but in terms of requirements that you feel are important for our
>>> system to be considered ready.
>
> The list covers technical issues, documentation issues and usability
> issues which are great, thanks for doing it.  However, as said one
> important use-case is extreme performance and that requires configuration
> or tuning flexibility which adds usability/supportability issues.  Will
> those knobs be a valid option provided that the defaults works well enough?
>


I feel that we need to expose knobs up through Open vSwitch in order to 
tune for extreme performance; otherwise, how do we highlight the value in 
what we are doing? I think we need some way to allow users to do this 
type of configuration when they know what they are doing (without having 
to recompile the code).

> Thanks,
> fbl
>


