[dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver

Zhou, Danny danny.zhou at intel.com
Tue Nov 25 15:46:10 CET 2014


> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> Sent: Tuesday, November 25, 2014 10:40 PM
> To: Richardson, Bruce; Neil Horman
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> 
> 
> 
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Tuesday, November 25, 2014 10:30 PM
> > To: Neil Horman
> > Cc: Liang, Cunming; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> >
> > On Tue, Nov 25, 2014 at 09:23:16AM -0500, Neil Horman wrote:
> > > On Tue, Nov 25, 2014 at 10:11:16PM +0800, Cunming Liang wrote:
> > > >
> > > > This is an RFC patch set to support the "bifurcated driver" in DPDK.
> > > >
> > > >
> > > > What is "bifurcated driver"?
> > > > ===========================
> > > >
> > > > The "bifurcated driver" refers to a kernel NIC driver that supports:
> > > >
> > > > 1. on-demand split-off of rx/tx queue pairs and their assignment to user space
> > > >
> > > > 2. direct access from user space to NIC resources (e.g. rx/tx queue registers)
> > > >
> > > > 3. distributing packets to kernel or user space rx queues by
> > > >    NIC's flow director according to the filter rules
> > > >
> > > > The supporting kernel patch set is here:
> > > > http://comments.gmane.org/gmane.linux.network/333615
> > > >
> > > >
> > > > Usage scenario
> > > > =================
> > > >
> > > > It is widely accepted in the industry to use DPDK to process fast-path
> > > > packets in user space with high performance. Meanwhile, slow-path control
> > > > packets still need to be processed in kernel space, as they usually rely on
> > > > the in-kernel TCP/IP stack and/or the socket programming interface.
> > > >
> > > > The KNI (Kernel NIC Interface) mechanism in DPDK is designed to meet this
> > > > requirement, but it has the following limitations:
> > > >
> > > >   1) Software classifies packets and distributes them to the kernel via DPDK
> > > >      software rings, at the cost of significant CPU cycles and memory
> > > >      bandwidth.
> > > >
> > > >   2) Copying packets between kernel socket buffers and mbufs significantly
> > > >      degrades KNI performance.
> > > >
> > > > The bifurcated driver provides an alternative approach that not only offloads
> > > > flow classification and distribution to the NIC but also supports zero-copy
> > > > packet I/O.
> > > >
> > > > Users can run the standard ethtool utility to add filter rules to the NIC
> > > > that steer specific flows to queues accessed only by the kernel driver and
> > > > stack, and add other rules that steer packets to the queues assigned to
> > > > user space.
> > > >
> > > > For the rx/tx queue pairs that are accessed directly from user space,
> > > > DPDK takes over packet rx/tx as well as the corresponding DMA operations
> > > > for high-performance packet I/O.
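The steering model described above can be sketched in C. This is a minimal, hypothetical model, not code from the patch set: `struct fdir_rule`, `struct queue_split`, `classify()` and `queue_owner()` are illustrative names, rules match only on the IPv4 source address (like the `ethtool -N ... src-ip ... action N` rules later in this mail), and the assumption is that the split-off user-space queues form one contiguous range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one ntuple rule: match on the IPv4 source address
 * only, like the `ethtool -N ... src-ip X action Q` rules in this mail. */
struct fdir_rule {
    uint32_t src_ip;    /* host byte order, e.g. 0x01010101 for 1.1.1.1 */
    unsigned rx_queue;  /* the rule's "action" queue */
};

/* Assumption: queues [user_base, user_base + user_count) were split off to
 * user space; all other queues stay with the kernel driver. */
struct queue_split {
    unsigned user_base;
    unsigned user_count;
};

enum owner { OWNER_KERNEL, OWNER_USER };

/* First matching rule wins; unmatched packets go to default_queue
 * (a simplification: a real NIC would typically fall back to RSS). */
unsigned classify(const struct fdir_rule *rules, size_t n,
                  uint32_t src_ip, unsigned default_queue)
{
    for (size_t i = 0; i < n; i++)
        if (rules[i].src_ip == src_ip)
            return rules[i].rx_queue;
    return default_queue;
}

/* Who services the given rx queue: the kernel stack or the DPDK PMD. */
enum owner queue_owner(const struct queue_split *s, unsigned q)
{
    assert(s->user_count > 0);
    return (q >= s->user_base && q < s->user_base + s->user_count)
               ? OWNER_USER : OWNER_KERNEL;
}
```

The point of the split is that `classify()` runs in NIC hardware, so neither the kernel nor DPDK spends CPU cycles deciding where a packet goes; software only decides who owns each queue.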
> > > >
> > > >
> > > > What's the impact and change to DPDK
> > > > ======================================
> > > >
> > > > DPDK usually binds PCIe NIC devices through the kernel's user-space driver
> > > > mechanisms (UIO or VFIO), which map the NIC's entire PCIe I/O space to user
> > > > space. The bifurcated driver PMD instead talks to a NIC interface using raw
> > > > socket APIs and mmap()s only a limited part of the I/O space (e.g. certain
> > > > 4K pages) to access the involved rx/tx queue pairs. The main impacts and
> > > > changes are as follows:
> > > >
> > > > - netdev
> > > >     DPDK needs to create an af_packet socket and bind it to a bifurcated
> > > >     netdev. The socket fd is used to request queue pair info, to split off
> > > >     and return queue pairs, etc. The PCIe device ID, netdev MAC address and
> > > >     NUMA info also come from the netdev's response.
> > > >
> > > > - PCIe device scan and driver probe
> > > >     The netdev provides the PCIe device ID, which determines the driver to
> > > >     use. For such a netdev device, the PCIe device is no longer created by
> > > >     the bus scan but by on-demand assignment.
> > > >
> > > > - PCIe BAR mapping
> > > >     The "bifurcated driver" maps several pages for the queue pairs; the
> > > >     rest of the BAR register space maps to a fake page. The BAR mapping is
> > > >     done via mmap() on the socket fd, which differs slightly from what
> > > >     UIO/VFIO do.
> > > >
> > > > - PMD
> > > >     The PMD no longer actually initializes or configures the NIC.
> > > >     Instead, it only takes care of queue pair setup, rx_burst and tx_burst.
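The first step of the netdev item above (create an af_packet socket and bind it to the netdev) can be sketched with standard Linux socket calls. This is only the generic part: the queue-pair requests ("split/return", queue info) and the mmap() of queue pages use the kernel patch set's own interface, which is not shown here, and `fill_bind_addr()`/`open_bound_packet_socket()` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>         /* htons */
#include <net/if.h>            /* if_nametoindex */
#include <sys/socket.h>
#include <netpacket/packet.h>  /* struct sockaddr_ll */
#include <linux/if_ether.h>    /* ETH_P_ALL */

/* Fill a link-layer address for binding to interface ifname.
 * Returns 0 on success, -1 if the interface does not exist. */
int fill_bind_addr(const char *ifname, struct sockaddr_ll *sll)
{
    assert(ifname != NULL && sll != NULL);
    unsigned idx = if_nametoindex(ifname);
    if (idx == 0)
        return -1;
    memset(sll, 0, sizeof(*sll));
    sll->sll_family = AF_PACKET;
    sll->sll_protocol = htons(ETH_P_ALL);
    sll->sll_ifindex = (int)idx;
    return 0;
}

/* Open an AF_PACKET socket bound to ifname; returns the fd or -1.
 * Requires CAP_NET_RAW. Queue-pair requests would then be issued on
 * this fd through the bifurcated driver's interface (not shown). */
int open_bound_packet_socket(const char *ifname)
{
    struct sockaddr_ll sll;
    if (fill_bind_addr(ifname, &sll) < 0)
        return -1;
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Binding to a specific ifindex (rather than 0, which means "all interfaces") is what ties the fd to the one bifurcated netdev whose queues will be requested.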
> > > >
> > > > The patch uses the eal '--vdev' parameter to specify the netdev interface
> > > > name and the number of queue pairs. Here is an example of how to configure
> > > > the bifurcated driver and run DPDK testpmd with the bifurcated PMD.
> > > >
> > > >   1. Set promisc mode
> > > >   > ifconfig eth0 promisc
> > > >
> > > >   2. Turn on fdir
> > > >   > ethtool -K eth0 ntuple on
> > > >
> > > >   3. Set up a flow director rule to steer UDP packets with source IP
> > > >      0.0.0.0 to rxq 0
> > > >   > ethtool -N eth0 flow-type udp4 src-ip 0.0.0.0 action 0
> > > >
> > > >   4. Run testpmd on netdev 'eth0' with 1 queue pair.
> > > >   > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 \
> > > >   >  --vdev=rte_bifurc,iface=eth0,qpairs=1 -- \
> > > >   >  -i --rxfreet=32 --txfreet=32 --txrst=32
> > > >   Note:
> > > >     The iface and qpairs arguments above specify the netdev interface name
> > > >     and the number of queue pairs that user space requests from the
> > > >     "bifurcated driver", respectively.
> > > >
> > > >   5. Set up a flow director rule to steer UDP packets with source IP
> > > >      1.1.1.1 to rxq 32. This must be done after testpmd has started.
> > > >   > ethtool -N eth0 flow-type udp4 src-ip 1.1.1.1 action 32
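The `--vdev=rte_bifurc,iface=eth0,qpairs=1` argument in step 4 is a driver name followed by comma-separated key=value pairs. A minimal parser sketch follows; it uses plain libc `strtok_r()` rather than DPDK's own argument-parsing helpers, and `struct bifurc_args`, `parse_bifurc_vdev()` and the qpairs upper bound of 128 are illustrative assumptions, not the patch's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of the vdev argument. */
struct bifurc_args {
    char iface[16];     /* IFNAMSIZ-sized buffer */
    unsigned qpairs;
};

/* Parse "rte_bifurc,iface=eth0,qpairs=1". Returns 0 on success, -1 on error. */
int parse_bifurc_vdev(const char *spec, struct bifurc_args *out)
{
    char buf[128];
    char *save = NULL;
    assert(spec != NULL && out != NULL);
    if (strlen(spec) >= sizeof(buf))
        return -1;
    strcpy(buf, spec);               /* strtok_r modifies its input */
    memset(out, 0, sizeof(*out));

    char *tok = strtok_r(buf, ",", &save);
    if (tok == NULL || strcmp(tok, "rte_bifurc") != 0)
        return -1;                   /* first token must be the driver name */
    while ((tok = strtok_r(NULL, ",", &save)) != NULL) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            return -1;
        *eq = '\0';
        const char *key = tok, *val = eq + 1;
        if (strcmp(key, "iface") == 0) {
            if (strlen(val) == 0 || strlen(val) >= sizeof(out->iface))
                return -1;
            strcpy(out->iface, val);
        } else if (strcmp(key, "qpairs") == 0) {
            char *end;
            errno = 0;
            unsigned long n = strtoul(val, &end, 10);
            /* reject non-numeric, zero, and (arbitrary cap) > 128 */
            if (errno != 0 || end == val || *end != '\0' || n == 0 || n > 128)
                return -1;
            out->qpairs = (unsigned)n;
        } else {
            return -1;               /* unknown key */
        }
    }
    /* both keys are required */
    return (out->iface[0] != '\0' && out->qpairs > 0) ? 0 : -1;
}
```

For example, `parse_bifurc_vdev("rte_bifurc,iface=eth0,qpairs=1", &a)` fills `a.iface` with "eth0" and `a.qpairs` with 1, while a spec missing either key is rejected.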
> > > >
> > > > The detailed changes in this patch set are described below.
> > > >
> > > > eal
> > > > --------
> > > > The first two patches declare the eal API and provide the Linux
> > > > implementation that supports the af_packet socket and the verbs of a
> > > > bifurcated netdev. These APIs include verbs such as open, bind, (un)map,
> > > > split/return and map_umem. Other APIs, such as set_pci, get_ifinfo and
> > > > get/put_devargs, help generate a PCI device from a bifurcated netdev and
> > > > obtain basic netdev info.
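The (un)map verb only needs the BAR pages that hold the registers of the granted queue pairs, as the PCIe BAR mapping item explains. A sketch of working out that page set follows. The register offsets are an assumption: they mirror the Linux ixgbe driver's `IXGBE_RDBAL()`/`IXGBE_TDBAL()` layout for the 82599 (a 0x40-byte block per queue) and should be checked against the datasheet; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAR_PAGE_SIZE 4096u

/* Assumed 82599 register layout, mirroring the Linux ixgbe driver's
 * IXGBE_RDBAL()/IXGBE_TDBAL() macros: each queue owns a 0x40-byte block. */
uint64_t ixgbe_rxq_regs(unsigned q)
{
    return q < 64 ? 0x01000u + q * 0x40u : 0x0D000u + (q - 64) * 0x40u;
}

uint64_t ixgbe_txq_regs(unsigned q)
{
    return 0x06000u + q * 0x40u;
}

/* Page-aligned BAR offset containing reg_off, i.e. a candidate mmap() offset. */
uint64_t bar_page_of(uint64_t reg_off)
{
    return reg_off & ~(uint64_t)(BAR_PAGE_SIZE - 1);
}

/* Collect the distinct pages covering the given register offsets, in
 * first-seen order. Writes at most max entries to out; returns the count. */
size_t bar_pages_needed(const uint64_t *regs, size_t n, uint64_t *out, size_t max)
{
    size_t count = 0;
    assert(out != NULL || max == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t page = bar_page_of(regs[i]);
        size_t j = 0;
        while (j < count && out[j] != page)
            j++;
        if (j == count && count < max)
            out[count++] = page;
    }
    return count;
}
```

With this assumed layout, rx queues 32 and 33 share page 0x1000 and tx queue 32 sits in page 0x6000, so two pages suffice. Note that one 4K page holds the register blocks of 64 queues (4096 / 0x40), so page-granular mapping also exposes neighbouring queues' registers.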
> > > >
> > > > The third patch allows a driver to be probed on the PCIe VDEV created from
> > > > a NIC interface driven by the "bifurcated driver". It defines a new flag,
> > > > 'RTE_PCI_DRV_BIFURC', used by direct ring access PMDs.
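How the flag might interact with ID matching during probe can be sketched as follows. The cover letter does not give the flag's value or the exact probe rule, so both are assumptions here: `DRV_FLAG_BIFURC` is a stand-in for `RTE_PCI_DRV_BIFURC`, and the assumed rule is that a device created from a bifurcated netdev only binds to a flagged driver, while an ordinary PCI device only binds to an unflagged one. The structs are simplified and are not DPDK's `rte_pci_id`/`rte_pci_driver`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for RTE_PCI_DRV_BIFURC. */
#define DRV_FLAG_BIFURC (1u << 7)

struct pci_id { uint16_t vendor_id, device_id; };   /* simplified */

struct drv_desc {
    const char *name;
    uint32_t drv_flags;
    const struct pci_id *id_table;   /* terminated by vendor_id == 0 */
};

struct dev_desc {
    uint16_t vendor_id, device_id;   /* for a VDEV: taken from the netdev reply */
    bool from_bifurc_netdev;
};

static bool id_matches(const struct drv_desc *drv, const struct dev_desc *dev)
{
    for (const struct pci_id *id = drv->id_table; id->vendor_id != 0; id++)
        if (id->vendor_id == dev->vendor_id && id->device_id == dev->device_id)
            return true;
    return false;
}

/* Assumed probe rule: IDs must match, and the driver's BIFURC flag must agree
 * with whether the device came from a bifurcated netdev. Returns the first
 * acceptable driver, or NULL. */
const struct drv_desc *pick_driver(const struct drv_desc *drvs, size_t n,
                                   const struct dev_desc *dev)
{
    for (size_t i = 0; i < n; i++) {
        bool drv_bifurc = (drvs[i].drv_flags & DRV_FLAG_BIFURC) != 0;
        if (drv_bifurc == dev->from_bifurc_netdev && id_matches(&drvs[i], dev))
            return &drvs[i];
    }
    return NULL;
}
```

Under this rule, a normal ixgbe PMD and its direct ring access variant can share one ID table (0x8086 is Intel's PCI vendor ID) without ever claiming each other's devices.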
> > > >
> > > > librte_bifurc
> > > > ---------------
> > > > The library acts as a VDEV bus driver that scans for the '--vdev=rte_bifurc'
> > > > VDEV on the eal command line. It generates the PCIe VDEV device, ready for
> > > > subsequent driver probing, and maintains the bifurcated device information,
> > > > including sockfd, hwaddr, mtu, qpairs and iface_name. Direct ring access
> > > > PMDs query it for this bifurcated device info.
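The per-device record and the lookup a direct ring access PMD would perform can be sketched like this. `struct bifurc_dev_info`, `bifurc_register()`, `bifurc_lookup()` and the fixed-size table are hypothetical; only the field list (sockfd, hwaddr, mtu, qpairs, iface_name) comes from the description above, and a static array stands in for whatever storage librte_bifurc actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record mirroring the fields listed for librte_bifurc. */
struct bifurc_dev_info {
    int      sockfd;          /* af_packet socket bound to the netdev */
    uint8_t  hwaddr[6];       /* netdev MAC address */
    unsigned mtu;
    unsigned qpairs;          /* queue pairs granted to user space */
    char     iface_name[16];
};

#define MAX_BIFURC_DEVS 8

static struct bifurc_dev_info devs[MAX_BIFURC_DEVS];
static size_t ndevs;

/* Record a scanned device; returns 0, or -1 if the table is full. */
int bifurc_register(const struct bifurc_dev_info *info)
{
    assert(info != NULL);
    if (ndevs == MAX_BIFURC_DEVS)
        return -1;
    devs[ndevs++] = *info;
    return 0;
}

/* Look up by interface name, as a PMD might during probe; NULL if absent. */
const struct bifurc_dev_info *bifurc_lookup(const char *iface)
{
    for (size_t i = 0; i < ndevs; i++)
        if (strncmp(devs[i].iface_name, iface, sizeof(devs[i].iface_name)) == 0)
            return &devs[i];
    return NULL;
}
```

Keeping this state in one library lets each direct ring access PMD (ixgbe today, possibly others later) reuse the socket and netdev info instead of re-querying the kernel.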
> > > >
> > > > direct ring access PMD
> > > > -------------------------
> > > > The patch provides a direct ring access PMD for ixgbe. Compared with the
> > > > normal ixgbe PMD, it registers itself with the 'RTE_PCI_DRV_BIFURC' flag.
> > > > It mostly reuses the existing PMD ops to avoid re-implementing everything
> > > > from scratch, and it modifies rx/tx_queue_setup to allow queue setup from
> > > > any queue offset.
> > > >
> > > > Supported NIC driver
> > > > ========================
> > > >
> > > > The "bifurcated driver" kernel patch only supports the "ixgbe" driver at
> > > > the moment, so this RFC patch also provides an "ixgbe" PMD using
> > > > direct-mapped rings as a sample. Support for 40GbE (i40e) will be added in
> > > > the future.
> > > >
> > > > In addition, for any multi-queue NIC whose flow director can perform packet
> > > > classification and distribution, there is no fundamental technical obstacle
> > > > to supporting the bifurcated driver approach.
> > > >
> > > > Limitation
> > > > ============
> > > >
> > > > With the "bifurcated driver", user space only takes over the DMA operations.
> > > > NIC configuration is outside the control of the user-space PMD; all NIC
> > > > settings, including adding/deleting filter rules, must be done with
> > > > standard Linux network tools (e.g. ethtool). Feature support therefore
> > > > depends on what ethtool supports.
> > > >
> > > >
> > > > Any questions, comments and feedback are welcome.
> > > >
> > > >
> > > > -END-
> > > >
> > > > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > > > Signed-off-by: Danny Zhou <danny.zhou at intel.com>
> > > >
> > > > Cunming Liang (6):
> > > >   eal: common direct ring access API
> > > >   eal: direct ring access support by linux af_packet
> > > >   pci: allow VDEV as pci device during device driver probe
> > > >   bifurc: add driver to scan bifurcated netdev
> > > >   ixgbe: rx/tx queue stop bug fix
> > > >   ixgbe: PMD for bifurc ixgbe net device
> > > >
> > > >  config/common_linuxapp                         |   5 +
> > > >  lib/Makefile                                   |   1 +
> > > >  lib/librte_bifurc/Makefile                     |  58 +++++
> > > >  lib/librte_bifurc/rte_bifurc.c                 | 284 +++++++++++++++++++++
> > > >  lib/librte_bifurc/rte_bifurc.h                 |  90 +++++++
> > > >  lib/librte_eal/common/Makefile                 |   5 +
> > > >  lib/librte_eal/common/include/rte_pci.h        |   4 +
> > > >  lib/librte_eal/common/include/rte_pci_bifurc.h | 186 ++++++++++++++
> > > >  lib/librte_eal/linuxapp/eal/Makefile           |   1 +
> > > >  lib/librte_eal/linuxapp/eal/eal_pci.c          |  42 ++--
> > > >  lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c   | 336 +++++++++++++++++++++++++
> > > >  lib/librte_ether/rte_ethdev.c                  |   3 +-
> > > >  lib/librte_pmd_ixgbe/Makefile                  |  13 +-
> > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.c         | 303 ++++++++++++++++++++++
> > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.h         |  57 +++++
> > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c              |  44 +++-
> > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.h              |  10 +
> > > >  mk/rte.app.mk                                  |   6 +
> > > >  18 files changed, 1421 insertions(+), 27 deletions(-)
> > > >  create mode 100644 lib/librte_bifurc/Makefile
> > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.c
> > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.h
> > > >  create mode 100644 lib/librte_eal/common/include/rte_pci_bifurc.h
> > > >  create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c
> > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.c
> > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.h
> > > >
> > > > --
> > > > 1.8.1.4
> > > >
> > > >
> > > AIUI, the bifurcated driver hasn't yet been accepted upstream, has it?  Given
> > > that, I don't think it's wise to pull this in yet ahead of the kernel work, as
> > > there may still be kernel side changes that the user space pmd will have to
> > > adapt to.
> > > Neil
> > >
> > Hence the RFC nature of the patch, I believe. :-) Before the kernel part hits the
> > main kernel tree we can at least discuss the overall direction to be taken for
> > this driver because it's significantly different from any other HW driver.
> [Liang, Cunming] Yes, as Bruce said, that's the main purpose.
> Another is that, with this patch, people can run it together with the kernel
> patch. This helps them understand the benefits and give feedback based on
> hands-on experience.
> >
> > /Bruce

I echo Bruce. Also, the V2 DPDK RFC patch set will be submitted to dpdk.org to
support the V2 netdev kernel patch set with memory protection. Then people can
try the bifurcated driver and get a global view of how it works and what
performance can be achieved, instead of continuing to ask basic questions.

