[dpdk-dev] Where is the padding code in DPDK?

Morten Brørup mb at smartsharesystems.com
Thu Nov 15 11:27:18 CET 2018


> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wiles, Keith
> > On Nov 14, 2018, at 4:51 AM, Morten Brørup <mb at smartsharesystems.com> wrote:
> >
> > Anatoly,
> >
> > This differs from the Linux kernel's behavior, where padding belongs
> > in the NIC driver layer, not in the protocol layer. If you pass a
> > runt frame (too short a packet) to a Linux NIC driver's transmission
> > function, the NIC driver (or NIC hardware) will pad the frame to make
> > it valid. E.g. look at the rhine_start_tx() function in the kernel:
> > https://elixir.bootlin.com/linux/v4.9.137/source/drivers/net/ethernet/via/via-rhine.c#L1800
> 
> The PMD in DPDK rejects the frame or extends the number of bytes to
> send. Padding assumes you are zeroing out the packet to meet the
> NIC's required length. PMDs, unless they are concerned with security,
> just make sure the number of bytes to be sent is correct for the
> hardware (60 bytes min). Most NICs can do this padding in hardware as
> the packet is sent.

Great, so let's extend DPDK to provide that feature!
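
To make the discussion concrete, here is a minimal sketch of what such
padding could look like at the mbuf level. ETH_MIN_FRAME_NO_FCS and
pad_runt_frame() are names invented for this example - they are not
existing DPDK definitions - and a real offload would of course live in
the PMD or the hardware:

#include <string.h>
#include <rte_mbuf.h>

/* Minimum Ethernet frame size, excluding the 4-byte FCS/CRC.
 * (Hypothetical constant for this example.) */
#define ETH_MIN_FRAME_NO_FCS 60

/* Zero-pad an mbuf up to the 60-byte minimum.
 * Returns 0 on success, -1 if there is not enough tailroom. */
static int
pad_runt_frame(struct rte_mbuf *m)
{
    uint32_t pkt_len = rte_pktmbuf_pkt_len(m);
    char *pad;

    if (pkt_len >= ETH_MIN_FRAME_NO_FCS)
        return 0; /* already a valid frame, nothing to do */

    pad = rte_pktmbuf_append(m,
            (uint16_t)(ETH_MIN_FRAME_NO_FCS - pkt_len));
    if (pad == NULL)
        return -1; /* not enough tailroom in the last segment */

    memset(pad, 0, ETH_MIN_FRAME_NO_FCS - pkt_len);
    return 0;
}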

> 
> If we are talking about virtio, and only talking to a virtio software
> backend, then you can send any size packet, but you need to make sure
> that the stack or code receiving the packet does not throw it away
> because it is a runt packet. Most NICs throw away runts, and they are
> never received into memory. In a software-based design like virtio you
> can do whatever you want with the length, but I would suggest
> following the Ethernet standard anyway.

Good point: If virtio is considered an Ethernet-type interface (although it is able to handle really large Jumbo frames), then yes, the minimum packet size requirements should apply to it too. This is probably a question for the virtio folks to decide: Is virtio considered an Ethernet interface, or another type of interface (without Ethernet packet size requirements, like the "localhost" pseudo interface)?

But what about other non-physical interfaces: are they all considered Ethernet-type interfaces? And to take it to the extreme: Should DPDK by design only support Ethernet-type interfaces?

> 
> Now, some stacks or code (like Pktgen) assume the hardware will append
> the CRC (4 bytes), which means the application needs to hand the PMD
> frames of at least 60 bytes, unless you know the hardware will do the
> right thing. The challenge is that applications in DPDK do not know the
> details of the NIC at that level and should always assume the packets
> being sent and received are valid Ethernet frames. This means at least
> 60 bytes, as all NICs add the CRC nowadays and not all of them adjust
> the size of the frame.
> 
> If you do not send the PMD a 60-byte frame, then you are expecting the
> NIC to handle the padding and append the CRC, or at least expecting
> the PMD to adjust the size, which I know from my experience writing
> Pktgen for DPDK is not done in all PMDs.

You said it! And it proves my point about what higher layer developers probably expect of lower layers.

> 
> If you are expecting DPDK PMDs to behave like Linux drivers, then you
> need to adjust your thinking and always send the PMD at least 60
> bytes. Unless you want to modify all of the PMDs to force the size to
> 60 bytes; I have no objection to such a patch, you just need to get
> all of the PMD maintainers to agree with it.

I agree that different thinking is required, and Linux is not always perfect. However, we are allowed to copy good ideas from Linux, and I think that having padding in Ethernet PMDs is a perfectly logical concept. There are quite a few PMD maintainers, and I was hoping to discuss the high-level concept on the open mailing list before involving the PMD maintainers in the implementation.

I think that a stack or other code using DPDK as its lower layer expects DPDK to provide some offloading, and since padding to a 60-byte payload is a very common event in stacks (due to empty TCP ACK packets), this is an obvious offload candidate!

Of course, if DPDK were only designed for packet forwarding applications, and not also intended for use as a lower layer for stacks, then padding to a 60-byte payload should not be required. I guess that DPDK was initially designed for packet forwarding applications, but is this still the case today, or should DPDK evolve to also accommodate the needs of stacks?

If padding is not included in the PMDs, consider this (a highly theoretical example, but useful for discussing the concept): The DPDK packet manipulation libraries could be required to do it, e.g. for fragment reassembly of two extremely small packets totaling less than 60 bytes of payload, or for IPsec decapsulation of a very small packet. Otherwise, the application would have to do it just before calling the PMD TX functions.
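
Until such an offload exists, a stack can wrap the TX call itself. A
sketch of such a wrapper, reusing the hypothetical pad_runt_frame()
helper sketched above and dropping any packet that cannot be padded:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Pad every runt in the burst before handing it to the PMD;
 * free any mbuf that lacks the tailroom to be padded. */
static uint16_t
tx_burst_padded(uint16_t port_id, uint16_t queue_id,
                struct rte_mbuf **pkts, uint16_t nb_pkts)
{
    uint16_t i, nb_ok = 0;

    for (i = 0; i < nb_pkts; i++) {
        if (pad_runt_frame(pkts[i]) == 0)
            pkts[nb_ok++] = pkts[i];   /* keep the (now valid) frame */
        else
            rte_pktmbuf_free(pkts[i]); /* drop un-paddable runts */
    }
    return rte_eth_tx_burst(port_id, queue_id, pkts, nb_ok);
}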


> 
> On RX, frames of less than 64 bytes (with CRC) are runts, and most
> NICs today will not receive these frames unless you program the
> hardware to do so. ‘In my day’ :-) we had collisions on the wire,
> which created a huge number of fragments or runts; with the
> point-to-point links we have today, that is no longer the case.

I agree that RX of frames of less than 64 bytes (with CRC) - on Ethernet interfaces! - should still be considered runts, and thus should be discarded and counted as errors.
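
For reference, runts discarded by the NIC would typically show up in
the port's input error counter, which the application can read through
the standard stats API. A small sketch (the exact accounting is PMD
specific, so treat the counter semantics as an assumption):

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Print the port's input error counter, where discarded runts
 * would typically be counted (exact semantics vary per PMD). */
static void
print_rx_errors(uint16_t port_id)
{
    struct rte_eth_stats stats;

    if (rte_eth_stats_get(port_id, &stats) == 0)
        printf("port %" PRIu16 ": ierrors=%" PRIu64 "\n",
               port_id, stats.ierrors);
}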

> 
> >
> > If DPDK does not pad short frames passed to the egress function of
> > the NIC drivers, it should be noted in the documentation - this is
> > not the behavior protocol developers expect.
> >
> > Or even better: The NIC hardware (or driver) should ensure padding,
> > possibly considering it a TX offload feature. Generating packets
> > shorter than 60 bytes of data is common - just consider the number
> > of TCP ACK packets, which are typically only 14 + 20 + 20 = 54 bytes
> > (incl. the 14 byte Ethernet header).
> >
> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Burakov, Anatoly
> >> Sent: Wednesday, November 14, 2018 11:18 AM
> >> To: Sam
> >> Cc: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] Where is the padding code in DPDK?
> >>
> >> On 14-Nov-18 5:45 AM, Sam wrote:
> >>> OK, so briefly speaking: DPDK will NOT care about padding.
> >>> The NIC will care about padding when sending and receiving with a NIC.
> >>> The kernel will care about it when sending and receiving with a vhostuser port.
> >>>
> >>> Is that right?
> >>
> >> I cannot speak for virtio/vhost user since I am not terribly
> >> familiar with them. For regular packets, generally speaking, packets
> >> shorter than 60 bytes are invalid. Whether DPDK does or does not
> >> care about padding is irrelevant, because *you* are attempting to
> >> transmit packets that are not valid. You shouldn't rely on this
> >> behavior.
> >>
> >>>
> >>>
> >>> Burakov, Anatoly <anatoly.burakov at intel.com
> >>> <mailto:anatoly.burakov at intel.com>> wrote on Tue, Nov 13, 2018
> >>> at 5:29 PM:
> >>>
> >>>    On 13-Nov-18 7:16 AM, Sam wrote:
> >>>> Hi all,
> >>>>
> >>>> As we know, an Ethernet frame must be at least 64B long.
> >>>>
> >>>> So if I create an rte_mbuf and fill it with just 60B of data,
> >>>> will rte_eth_tx_burst add padding data to make the frame at
> >>>> least 64B?
> >>>>
> >>>> If it does, where is the code?
> >>>>
> >>>
> >>>    Others can correct me if I'm wrong here, but specifically in the
> >>>    case of 64-byte packets, these are the shortest valid packets
> >>>    that you can send, and a 64-byte packet will actually carry only
> >>>    60 bytes' worth of packet data, because there's a 4-byte CRC
> >>>    field at the end (see the Ethernet frame format). If you enabled
> >>>    CRC offload, then your NIC will append the 4 bytes at transmit.
> >>>    If you haven't, then it's up to each individual driver/NIC to
> >>>    accept/reject such a packet, because it can rightly be considered
> >>>    malformed.
> >>>
> >>>    In addition, your NIC may add e.g. VLAN tags or other stuff,
> >>>    again depending on hardware offloads that you have enabled in
> >>>    your TX configuration, which may push the packet size beyond 64
> >>>    bytes while having only 60 bytes of actual packet data.
> >>>
> >>>    --
> >>>    Thanks,
> >>>    Anatoly
> >>>
> >>
> >>
> >> --
> >> Thanks,
> >> Anatoly
> >
> 
> Regards,
> Keith


