[dpdk-dev] [ovs-dev] Status of Open vSwitch with DPDK
Mark D. Gray
mark.d.gray at intel.com
Mon Aug 17 16:53:01 CEST 2015
On 08/15/15 08:16, Flavio Leitner wrote:
> On Fri, Aug 14, 2015 at 04:04:40PM +0000, Gray, Mark D wrote:
>> Hi Daniele,
>> Thanks for starting this conversation. It is a good list :) I have cross-posted this
>> to dpdk.org as I feel that some of the points could be interesting to that community
>> as they are related to how DPDK is used.
>> How do "users" of OVS with DPDK feel about this list? Does anyone disagree or
>> does anyone have any additions? What are your experiences?
>>> There has been some discussion lately about the status of the Open vSwitch
>>> port to DPDK. While part of the code has been tested for quite some time,
>>> I think we can agree that there are a few rough spots that prevent it from
>>> being easily deployed and used.
>>> I was hoping to get some feedback from the community about those rough
>>> spots, i.e. areas where OVS+DPDK can/needs to improve to become more
>>> "ready" and user-friendly.
>>> - PMD threads and queues management: the code has shown several bugs, and the
>>> netdev interfaces don't seem up to the job anymore.
>> You had a few ideas about how to refactor this before but I was concerned
>> about the effect it would have on throughput. I can't find the thread.
>> Do you have some further ideas about how to achieve this?
> I miss being able to tell which queue goes to which PMD, and I dislike
> that all devices must have the same number of rx queues. I agree
> that there are other issues, but it seems the kind of configuration
> knobs I am looking for might not be the end goal since what has been
> said is to look for a more automated way. Having said so, I also
> would like to hear if you have further ideas about how to achieve that.
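To make the kind of knob described above concrete, here is a minimal sketch
of a queue-to-PMD assignment with optional explicit pinning. Nothing in it is
an existing OVS interface: assign_rxqs, the ports/pinned dictionaries and the
core numbers are hypothetical, and it assumes the "same number of rx queues
on every device" restriction has been lifted.

```python
from collections import defaultdict


def assign_rxqs(ports, pmd_cores, pinned=None):
    """Return {core: [(port, queue), ...]}.

    ports     -- hypothetical {port_name: number_of_rx_queues}
    pmd_cores -- list of core ids running PMD threads
    pinned    -- optional {(port, queue): core} set by an advanced user
    """
    if not pmd_cores:
        raise ValueError("at least one PMD core is required")
    pinned = pinned or {}
    plan = defaultdict(list)
    next_core = 0
    for port in sorted(ports):
        for queue in range(ports[port]):
            core = pinned.get((port, queue))
            if core is None:
                # Automated default: spread unpinned queues round-robin.
                core = pmd_cores[next_core % len(pmd_cores)]
                next_core += 1
            elif core not in pmd_cores:
                raise ValueError("core %d is not a PMD core" % core)
            plan[core].append((port, queue))
    return dict(plan)


if __name__ == "__main__":
    # Ports with different queue counts, one queue pinned by hand.
    print(assign_rxqs({"dpdk0": 2, "dpdk1": 1}, [2, 3],
                      pinned={("dpdk1", 0): 3}))
```

The point of the sketch is that the automated default and the manual knob
can coexist: unpinned queues are still distributed without user input.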
>>> There's a lot of margin of improvement: we could factor out the code from
>>> dpif-netdev, add configuration parameters for advanced users, and figure
>>> a way to add unit tests.
>> I think this is a general issue with both the kernel datapath (and netdevs)
>> and the userspace datapath. There isn't much unit testing (or testing) outside
>> of the slow path.
> Maybe we could exercise the interfaces using pcap pmd.
We had a similar idea. Using this, it would be possible to test the
entire datapath or netdev for functionality! I don’t think there is an
equivalent for the kernel datapath?
>>> Related to this, the system should be as fast as possible out-of-the-box,
>>> without requiring too much tuning.
>> This is a good point. I think the kernel datapath has a similar issue. You can
>> get a certain level of performance without compiling with -Ofast or
>> pinning threads but you will (even with the kernel datapath) get better
>> performance if you pin threads (and possibly compile differently). I guess
>> it is more visible with the dpdk datapath as performance is one of the key
>> values. It is also more detrimental to the performance if you don't set it
>> up correctly.
> Not only that, you need to consider how the resources will be
> distributed upfront so that you don't run out of hugepages, perhaps
> isolate PMD CPUs from the Linux scheduler, etc. So, I think a more
> realistic goal would be: the system should require minimal or no tuning
> to run with acceptable performance.
How do you define "acceptable" performance :)?
>> Perhaps we could provide scripts to help do this?
> Or profiles (if that isn't included in your scripts definition)
Maybe we should define profiles like "performance", "minimum cores", etc
>> I think this is also interesting to the DPDK community. There is
>> knowledge required when running DPDK enabled apps to
>> get good performance: core pinning is one thing that comes to mind.
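As a rough idea of what such a helper script could check, here is a small
pre-flight sketch. It relies only on the HugePages_Free and Hugepagesize
fields of /proc/meminfo and on the isolcpus= kernel parameter; the function
names, the needed_bytes figure and the core list are made up for
illustration, and isolcpus is only one of several ways to isolate cores.

```python
import re


def free_hugepage_bytes(meminfo_text):
    """Free hugepage memory in bytes, from the text of /proc/meminfo."""
    fields = dict(re.findall(r"^(\w+):\s+(\d+)", meminfo_text, re.M))
    free_pages = int(fields.get("HugePages_Free", 0))
    page_kb = int(fields.get("Hugepagesize", 0))
    return free_pages * page_kb * 1024


def isolated_cpus(cmdline_text):
    """CPU ids listed in isolcpus=, from the text of /proc/cmdline."""
    cpus = set()
    match = re.search(r"\bisolcpus=(\S+)", cmdline_text)
    if not match:
        return cpus
    for part in match.group(1).split(","):
        first, _, last = part.partition("-")
        # Skip non-numeric tokens such as isolcpus flags.
        if first.isdigit() and (not last or last.isdigit()):
            cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def preflight(meminfo_text, cmdline_text, needed_bytes, pmd_cores):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if free_hugepage_bytes(meminfo_text) < needed_bytes:
        problems.append("not enough free hugepages")
    missing = set(pmd_cores) - isolated_cpus(cmdline_text)
    if missing:
        problems.append("PMD cores not isolated: %s" % sorted(missing))
    return problems
```

On a real host the two text arguments would come from
open("/proc/meminfo").read() and open("/proc/cmdline").read().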
>>> - Userspace tunneling: while the code has been there for quite some time it
>>> hasn't received the level of testing that the Linux kernel datapath has.
>> Again, there is a lack of test infrastructure in general for OVS. vsperf is a good
>> start, and it would be great to see more people use and contribute to it!
>>> - Documentation: other than a step by step tutorial, it cannot be said
>>> DPDK is a first class citizen in the OVS documentation. Manpages could
>> Easily done. The INSTALL guide is pretty good but the structure could be better.
>> There is also a lack of manpages. Good point.
>>> - Vhost: the code has not received the level of testing of the kernel
>>> Another doubt shared by some developers is whether we should keep
>>> vhost-cuse, given its relatively low ease of use and the overlapping with
>>> the far more standard vhost-user.
>> vhost-cuse is required for older versions of qemu. I'm aware of some companies
>> using it as they are restricted to an older version of qemu. I think it is deprecated
>> at the moment? Is there a notice to that effect? We just need a plan for when to
>> remove it and make sure that plan is clear?
> Apparently having two solutions to address the same issue causes more
> harm than good, so removing vhost-cuse would be helpful. I agree that
> we need a clear plan with a soak time so users can either upgrade to
> vhost-user or tell why they can't.
>>> - Interface management and naming: interfaces must be manually removed
>>> from the kernel drivers.
>>> We still don't have an easy way to identify them. Ideas are welcome: how
>>> can we make this user friendly? Is there a better solution on the DPDK side?
>> This is a tough one and is interesting to the DPDK community. The basic issue
>> here is that users are more familiar with linux interfaces and linux naming
>> "ovs-vsctl add-port br0 eth0" makes a lot more sense than
>> "dpdk_nic_bind -b igb_uio <pci_id>", then check the order that the ports
>> are enumerated and then run "ovs-vsctl add-port br0 dpdkN".
>> I can think of ways to do this with physical NICs. For example,
>> you could reference the port by the linux name and when you try to add it, OVS
>> could unbind from the kernel module and bind it to igb_uio?
>> However, I am not sure how you would do it with virtual nics as there is not
>> even a real device.
>> I think a general solution from the dpdk community would be really helpful here.
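For physical NICs, the first half of that idea (going from the linux name to
the PCI id that the bind step needs) can be sketched with sysfs alone. This
assumes the usual layout where /sys/class/net/<ifname>/device is a symlink to
the PCI device directory; pci_address_of is a made-up helper, not part of OVS
or DPDK, and it only works while the kernel driver still owns the interface.

```python
import os
import re

# Domain:bus:device.function, e.g. 0000:01:00.0
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def pci_address_of(ifname, sysfs_root="/sys"):
    """Return the PCI address behind a linux interface name, or None.

    None is returned for interfaces with no PCI device behind them,
    which is exactly the virtual NIC case that has no obvious answer.
    """
    link = os.path.join(sysfs_root, "class", "net", ifname, "device")
    if not os.path.islink(link):
        return None
    name = os.path.basename(os.path.realpath(link))
    return name if PCI_ADDR.match(name) else None


if __name__ == "__main__":
    print(pci_address_of("eth0"))
```

The result is what would be passed as <pci_id> in the bind command quoted
above; the unbind/bind step itself is deliberately left out of the sketch.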
> It doesn't look like openvswitch is the right place to fix this. The
> openvswitch should deal with the port and the system should provide
> the port somehow. That's what happens with the kernel datapath, for
> instance, openvswitch doesn't load any NIC driver.
> So, it seems to be more related to udev/systemd configuration in which
> the sys admin would tell the interfaces and the appropriate driver
> Even if the system delivers the DPDK port ready, it would be great to
> have some friendly mapping so that users can refer to ports with known
>>> How are DPDK interfaces handled by linux distributions? I've heard about
>>> ongoing work for RHEL and Ubuntu, it would be interesting to coordinate.
> We have implemented dpdk/vhost support in initscripts so you could
> configure the ports in the same way as for the kernel devices, but
> how to properly bind to the driver is unclear yet.
>>> - Insight into the system and debuggability: nothing beats tcpdump for the
>>> kernel datapath. Can something similar be done for the userspace datapath?
>> Yeah, this would be useful. I have my own way of dealing with this. For example,
>> you could dump from the LOCAL port on a NORMAL bridge or add a rule to
>> mirror a flow to another port but I feel there could be a better way to do this in
>> DPDK. I have recently heard that the DPDK team do something with a pcap pmd
>> to help with debugging. A more general approach from dpdk would help a lot.
> One idea maybe is that openvswitch could provide a mode to clone TX/RX
> packets to a pcap pmd. Or write the packets using pcap format directly
> to a file (avoid another pmd which might not be available). Or even
> push them using a tap device. Either way tcpdump or wireshark would work.
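The "write the packets using pcap format directly to a file" option needs
very little code, since the classic pcap file is just a 24-byte global header
followed by a 16-byte header per packet. Below is a minimal sketch of that
format; the function names are made up, and how the datapath would hand
cloned packets to it is left open.

```python
import io
import struct
import time

PCAP_MAGIC = 0xA1B2C3D4      # microsecond-resolution pcap
LINKTYPE_ETHERNET = 1


def write_pcap_header(f, snaplen=65535):
    """Write the 24-byte global header (little-endian, version 2.4)."""
    f.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0,
                        snaplen, LINKTYPE_ETHERNET))


def write_packet(f, frame, ts=None):
    """Append one Ethernet frame with its 16-byte record header."""
    ts = time.time() if ts is None else ts
    sec = int(ts)
    usec = int((ts - sec) * 1000000)
    # Captured length and original length are equal: nothing is truncated.
    f.write(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
    f.write(frame)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_pcap_header(buf)
    write_packet(buf, b"\x00" * 60)   # dummy minimum-size frame
    with open("clone.pcap", "wb") as out:
        out.write(buf.getvalue())
```

The resulting file can be opened with tcpdump -r or wireshark, which is the
property that matters for debugging.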
>>> - Consistency of the tools: some commands are slightly different for the
>>> userspace/kernel datapath. Ideally there shouldn't be any difference.
> Could you give some examples?
>> Yeah, there are some things that could be changed. DPDK just works differently but
>> the benefits are significant :)
>> We need to mount hugepages, bind nics to igb_uio, etc
>> With a lot of this stuff, maybe the DPDK community's tools don't need to emulate
>> the linux networking tools exactly. Maybe over time as the DPDK community
>> and user-base expands, people will become more familiar with the tools, processes, etc
>> and this will be less of an issue?
>>> - Packaging: how should the distributions package DPDK and OVS? Should
>>> there only be a single build to handle both the kernel and the userspace
>>> datapath, eventually dynamically linked to DPDK?
>> Yeah. Do we need to start with dpdk if we have compiled with DPDK support???
> Well, certainly not everybody wants to have DPDK dependencies, either
> shared or static. Maybe the path is a plug-in architecture?
>>> - Benchmarks: we often rely on extremely simple flow tables with single
>>> traffic to evaluate the effect of a change. That may be ok during
>>> development, but OVS with the kernel datapath has been tested in
>>> scenarios with more complicated flow tables and even with hostile traffic
>>> Efforts in this sense are being made, like the vsperf project, or even
>>> simple ovs-pipeline.py
>> vsperf will really help this.
> Indeed, but how is OVS kernel datapath being tested? Is there a
> script? Maybe we can use the same tests for DPDK.
>>> I would appreciate feedback on the above points, not (only) in terms of
>>> solutions, but in terms of requirements that you feel are important for our
>>> system to be considered ready.
> The list covers technical issues, documentation issues and usability
> issues which are great, thanks for doing it. However, as said one
> important use-case is extreme performance and that requires configuration
> or tuning flexibility which adds usability/supportability issues. Will
> those knobs be a valid option provided that the defaults work well enough?
I feel that we need to expose knobs up through Open vSwitch in order to
tune for extreme performance; otherwise how do we highlight the value in
what we are doing? I think we need some way to allow a user to do this
type of configuration when they know what they are doing (without having
to recompile the code).