[dpdk-dev] overcommitting CPUs
venky.venkatesan at intel.com
Wed Aug 27 16:54:21 CEST 2014
DPDK currently isn't exactly poll mode - it has an API that receives and
transmits packets. How you enter that API could be interrupt or polled
- we've left that up to the application to decide, rather than force an
interrupt/NAPI type architecture. I do agree with Alex that
implementing an interrupt/load driven entry point as an option will make
it usable more widely. There are multiple challenges here - managing the
latency of an interrupt driven scheme in a user-space context and the
very high jitter, to mention a few.
That said, overcommitment of CPUs can be achieved in other ways as well.
You could allocate and enforce CPU sharing via cgroups, and allocate x%
of a core to the DPDK pthread. It does introduce a degree of
indeterminism to when the DPDK pthread gets scheduled back in (depending
on how many other threads are running on that core). But it is another
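
As a rough illustration of the "x% of a core" idea above: with the cgroup v1
cpu controller, CFS bandwidth control is set through cpu.cfs_period_us and
cpu.cfs_quota_us, and the quota is simply the period scaled by the share. The
helper name below is hypothetical, and the 100000 us period in the usage note
is the usual kernel default rather than anything from this thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK or any cgroup library.
 * Under CFS bandwidth control a group may run for cpu.cfs_quota_us
 * microseconds in every cpu.cfs_period_us window, so x% of one core
 * is period * x / 100.
 */
static int64_t cfs_quota_us(int64_t period_us, unsigned int percent_of_core)
{
    return period_us * (int64_t)percent_of_core / 100;
}
```

With a period of 100000 us, cfs_quota_us(100000, 25) gives the value to write
into cpu.cfs_quota_us of the cgroup holding the DPDK pthread to cap it at a
quarter of a core.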
On 8/27/2014 1:40 AM, Alex Markuze wrote:
> IMHO adding "Interrupt Mode" to DPDK is important, as this can open
> DPDK to a larger audience of consumers. I can easily imagine someone
> trying to find a user space networking solution (and deciding against
> verbs - RDMA) for the obvious reasons and not needing deterministic
> A few thoughts:
> Deterministic Latency: It's a fiction in the sense that this is something
> you will be able to see only in a small controlled environment, as
> network latencies in Data Centres (DC) are dominated by switch queuing
> (one good reference is http://fastpass.mit.edu that Vincent shared a
> few days back).
> Virtual environments: In virtual environments this is especially
> interesting, as the NIC driver (hypervisor) is working in IRQ mode, which,
> unless the interrupts are pinned to different CPUs than the VM, will
> have a disruptive effect on the VM's performance. Moving to interrupt
> mode in paravirtualised environments makes sense, as in any
> environment that is not carefully crafted you should not expect any
> deterministic guarantees and would opt for a simpler programming model
> - like interrupt mode.
> NAPI: With 10G NICs most CPUs' poll rate is faster than the NIC message
> rate, resulting in a 1:1 napi_poll callback to IRQ ratio; this is true
> even with small packets. In some cases where the CPU is working slower
> - for example when intel_iommu=on,strict is set - you can actually see
> a performance inversion where the "slower" CPU can reach higher B/W
> because the slowdown makes NAPI work with the kernel effectively
> moving to polling mode.
> I think that a smarter DPDK-NAPI is important, but it is a next step
> IFF the interrupt mode is adopted.
> On Wed, Aug 27, 2014 at 8:48 AM, Patel, Rashmin N
> <rashmin.n.patel at intel.com> wrote:
>> You're right, and I've felt the same difficulty with determinism in other hypervisors' soft switch solutions as well. I think it's worth thinking about.
>> On Aug 26, 2014 9:15 PM, Stephen Hemminger <stephen at networkplumber.org> wrote:
>> The way to handle the switch in and out of poll mode is to use IRQ coalescing.
>> You want to hold off the IRQ until there are a couple of packets or a short delay.
>> Going out of poll mode is harder to determine.
>> On Tue, Aug 26, 2014 at 9:59 AM, Zhou, Danny <danny.zhou at intel.com> wrote:
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
>>>> Sent: Wednesday, August 27, 2014 12:39 AM
>>>> To: Michael Marchetti
>>>> Cc: dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] overcommitting CPUs
>>>> On Tue, 26 Aug 2014 16:27:14 +0000
>>>> "Michael Marchetti" <mmarchetti at sandvine.com> wrote:
>>>>> Hi, has there been any consideration to introduce a non-spinning
>>>>> network driver (interrupt based), for the purpose of overcommitting
>>>>> CPUs in a virtualized environment? This would obviously have reduced
>>>>> high-end performance but would allow for increased guest
>>>>> density (sharing of physical CPUs) on a host.
>>>>> I am interested in adding support for this kind of operation, is there
>>>>> any interest in the community?
>>>> Better to implement a NAPI like algorithm that adapts from poll to
>>> Agreed, but DPDK is currently pure poll-mode based, so unlike NAPI's
>>> simple algorithm, the new heuristic algorithm should not switch from
>>> poll-mode to interrupt-mode immediately once there is no packet in the
>>> recent poll. Otherwise, mode switching will be too frequent, which would
>>> have a serious negative performance impact on DPDK.
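
One way to read the heuristic described above is as a simple hysteresis: stay
in poll mode until several consecutive polls come back empty, and return to
polling as soon as any packet shows up. The sketch below is only an
illustration under that assumption. All names (rx_mode_state,
rx_mode_after_poll, idle_threshold) are hypothetical and none of this is
existing DPDK API; the threshold value would need tuning per workload.

```c
#include <stdint.h>

/* Hypothetical types and names for illustration; not DPDK API. */
enum rx_mode { RX_MODE_POLL, RX_MODE_INTERRUPT };

struct rx_mode_state {
    enum rx_mode mode;
    uint32_t empty_polls;    /* consecutive polls that returned 0 packets */
    uint32_t idle_threshold; /* empty polls tolerated before leaving poll mode */
};

static void rx_mode_init(struct rx_mode_state *s, uint32_t idle_threshold)
{
    s->mode = RX_MODE_POLL;
    s->empty_polls = 0;
    s->idle_threshold = idle_threshold;
}

/*
 * Call after every receive attempt with the number of packets it returned.
 * Returns the mode the caller should use for the next receive attempt.
 */
static enum rx_mode rx_mode_after_poll(struct rx_mode_state *s, uint16_t nb_rx)
{
    if (nb_rx > 0) {
        /* Traffic seen: reset the idle counter and keep (or resume) polling. */
        s->empty_polls = 0;
        s->mode = RX_MODE_POLL;
        return s->mode;
    }

    /* Count the empty poll, saturating at the threshold. */
    if (s->empty_polls < s->idle_threshold)
        s->empty_polls++;

    /* Only a sustained idle period triggers the switch to interrupt mode. */
    if (s->empty_polls >= s->idle_threshold)
        s->mode = RX_MODE_INTERRUPT;

    return s->mode;
}
```

With idle_threshold set to 3, two empty polls in a row leave the state in
RX_MODE_POLL, the third moves it to RX_MODE_INTERRUPT, and the next non-empty
receive moves it back. A single empty poll therefore never causes a mode
switch, which is the point made in the quoted text.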