[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.
ff at ozog.com
Wed Jan 29 21:47:47 CET 2014
> > First and easy answer: it is open source, so anyone can recompile. So,
> > what's the issue?
> I'm talking from a pure distribution perspective here: Requiring to
> recompile all DPDK based applications to distribute a bugfix or to add
> support for a new PMD is not ideal.
> So ideally OVS would have the possibility to link against the shared
> library long term.
I agree that distribution of DPDK apps is not covered properly at present.
Identifying the proper scheme requires a specific analysis based on the
constraints of the Telecom/Cloud/Networking markets.
In the telecom world, if you fix the underlying framework of an app, you
still have to validate the whole solution, i.e. the app/framework
combination. In addition, shared libraries introduce the implied requirement
to validate apps against diverse versions of the DPDK shared libraries. This
translates into development and support costs.
I also expect many DPDK applications to tackle core networking features,
with sub-microsecond packet handling delays, some even lower than 200ns
(NAT64...). Lazy binding through the ELF PLT represents quite a cost, not
to mention that optimization stops at shared library boundaries (gcc
whole program optimization can be very effective...). Microsoft DLL linkage
is an order of magnitude faster. If Linux were to provide that, I would
probably revise my judgment. (I haven't checked the Linux dynamic linking
implementation for some time, so my understanding of it may be outdated.)
> > I get lost: do you mean ABI + API toward the PMDs or towards the
> > applications using the librte ?
> Towards the PMDs is more straight forward at first so it seems logical to
> focus on that first.
I don't think it is so straightforward. Many recent cards, such as those
from Chelsio and Myricom, have a very different "packet memory layout" that
does not fit so easily into the current DPDK architecture.
1) "traditional" architecture: the driver reserves X buffers and provides
the card with descriptors of those buffers. Each packet is DMA'ed into
exactly one buffer. Typically you have 2K buffers, so a 64 byte packet
consumes exactly one buffer.
2) "alternative" new architecture: the driver reserves a memory zone, say
4MB, without any structure, and provides a single zone description and a
ring buffer to the card (there are no individual buffer descriptors any
more). The card fills the memory zone with packets, one next to the other,
and specifies where the packets are by updating the supplied ring. Among the
many issues in fitting this scheme into DPDK, you cannot free a single mbuf:
you have to maintain a ref count on the memory zone so that, when all mbufs
have been "released", the memory zone can be freed.
That's quite a stretch from the current paradigm.
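
The bookkeeping described in 2) can be sketched roughly as follows. This is
only an illustration under stated assumptions: struct zone, struct pkt,
zone_init(), zone_rx() and pkt_release() are hypothetical names, not DPDK or
vendor APIs, and zone_rx() merely stands in for the card writing a packet and
publishing its location through the ring. A real driver would presumably also
wait until the card has stopped writing into the zone before freeing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration; none of these names come from DPDK. */
struct zone {
    uint8_t *base;    /* start of the unstructured memory zone */
    size_t   size;    /* zone size in bytes (e.g. 4 MB) */
    size_t   fill;    /* offset where the next packet will be written */
    unsigned refcnt;  /* packets handed out and not yet released */
    int      freed;   /* set once the zone memory has been given back */
};

struct pkt {
    struct zone *zone; /* owning zone, needed to drop the reference */
    size_t       off;  /* packet offset inside the zone */
    size_t       len;  /* packet length in bytes */
};

static int zone_init(struct zone *z, size_t size)
{
    z->base = malloc(size);
    if (z->base == NULL)
        return -1;
    z->size = size;
    z->fill = 0;
    z->refcnt = 0;
    z->freed = 0;
    return 0;
}

/* Stand-in for the card: place a packet right after the previous one and
 * report where it is. Each packet handed out takes one zone reference. */
static int zone_rx(struct zone *z, size_t len, struct pkt *out)
{
    if (z->freed || len > z->size - z->fill)
        return -1;
    out->zone = z;
    out->off = z->fill;
    out->len = len;
    z->fill += len;
    z->refcnt++;
    return 0;
}

/* Releasing one packet cannot free its bytes individually; the zone memory
 * is only given back when the last outstanding packet is released. */
static void pkt_release(struct pkt *p)
{
    struct zone *z = p->zone;

    assert(z != NULL && z->refcnt > 0);
    p->zone = NULL;
    if (--z->refcnt == 0) {
        free(z->base);
        z->base = NULL;
        z->freed = 1;
    }
}
```

With two packets received from the same zone, releasing the first leaves the
zone allocated; only releasing the second frees it. This is the per-zone
(rather than per-buffer) lifetime that is hard to express with one mbuf per
buffer.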
Apart from this aspect, managing RSS is too tied to Intel's flow director
concepts and cannot directly accommodate smarter or dumber RSS mechanisms.
That said, I fully agree PMD API should be revisited.