[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

Jay Rolette rolette at infiniteio.com
Thu Jan 22 20:36:26 CET 2015

On Thu, Jan 22, 2015 at 12:27 PM, Luke Gorrie <luke at snabb.co> wrote:

> On 22 January 2015 at 14:29, Jay Rolette <rolette at infiniteio.com> wrote:
>
>> Microseconds matter. Scaling up to 100GbE, nanoseconds matter.
>>
>
> True. Is there a cut-off point though?
>

There are always engineering trade-offs that have to be made. If I'm
optimizing something today, I'm certainly not starting at something that
takes 1ns for an app that is doing L4-7 processing. It's all about
profiling and figuring out where the bottlenecks are.

For past networking products I've built, there was a lot of traffic that
the software didn't have to do much with: minimal L2/L3 checks, then forward
the packet. It didn't even have to parse the headers because that was
offloaded to an FPGA. The only way to make those packets faster was to turn
them around in the FPGA and not send them to the CPU at all. That change
improved small packet performance by ~30%. That was on high-end network
processors that are significantly faster than Intel processors for packet
handling.

It seems strange when you realize that just getting the packets into the
CPU is expensive, never mind what you do with them after that.

> Does one nanosecond matter?
>

You just have to be careful when talking about things like a nanosecond.
It sounds really small, but the minimum inter-packet gap (IPG) for a 10G
link is only 9.6ns. It's all relative.
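
To put numbers on that: the minimum Ethernet IPG is 96 bit times (12 bytes),
and a minimum-size 64-byte frame also carries 8 bytes of preamble/SFD on the
wire. A rough sketch of the arithmetic in C follows. The function and macro
names are made up for illustration (they are not DPDK APIs), and times are
kept in picoseconds so everything stays in integer math:

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame overhead on the wire for standard Ethernet. */
#define ETH_PREAMBLE_SFD_BYTES 8u   /* 7-byte preamble + 1-byte SFD */
#define ETH_IPG_BYTES          12u  /* minimum inter-packet gap = 96 bit times */

/* Hypothetical helper: time to put `bytes` on the wire at `gbps` Gbit/s,
 * in picoseconds. One bit at 1 Gbit/s takes 1000 ps. */
static uint64_t wire_time_ps(uint64_t bytes, uint64_t gbps)
{
    return bytes * 8u * 1000u / gbps;
}

/* Minimum inter-packet gap, in picoseconds. */
static uint64_t ipg_ps(uint64_t gbps)
{
    return wire_time_ps(ETH_IPG_BYTES, gbps);
}

/* Time budget per frame at line rate: frame (including FCS) plus
 * preamble/SFD plus IPG, in picoseconds. */
static uint64_t frame_budget_ps(uint64_t frame_bytes, uint64_t gbps)
{
    return wire_time_ps(frame_bytes + ETH_PREAMBLE_SFD_BYTES + ETH_IPG_BYTES,
                        gbps);
}

/* Sanity checks against the figures discussed in this thread. */
static void self_check(void)
{
    assert(ipg_ps(10) == 9600);               /* 9.6 ns IPG at 10G */
    assert(frame_budget_ps(64, 10) == 67200); /* 67.2 ns per 64-byte frame */
}
```

So at 10G line rate with 64-byte frames you get about 67.2ns per packet
(roughly 14.88 Mpps), and at 100G the same math gives about 6.7ns. Against
that budget, shaving a nanosecond per packet is not noise.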

> AVX512 will fit a 64-byte packet in one register and move that to or from
> memory with one instruction. L1/L2 cache bandwidth per server is growing on
> a double-exponential curve (both bandwidth per core and cores per CPU). I
> wonder if moving data around in cache will soon be too cheap for us to
> justify worrying about.
>

Adding cores helps with aggregate performance, but doesn't really help with
latency on a single packet. That said, I'll take advantage of anything I
can from the hardware to either let me scale up how much traffic I can
handle or the number of features I can add at the same performance level!

Jay
