[dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet types

Olivier MATZ olivier.matz at 6wind.com
Mon Feb 2 12:18:16 CET 2015


Hi Helin,

On 02/02/2015 02:43 AM, Zhang, Helin wrote:
>>> +/*
>>> + * Sixteen bits are divided into several fields to mark packet types.
>>> +Note that
>>> + * each field is indexical.
>>> + * - Bit 3:0 is for tunnel types.
>>> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
>>> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
>>> + *   tunneling packets.
>>> + * - Bit 13:11 is for inner L3 types.
>>> + * - Bit 15:14 is reserved.
>>
>> Is there a reason why using this specific order?
> Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need to be contiguous
> and in this order.

When you say "need to be", do you mean it's impossible to do in another
manner or just that it would be slower?

>> Also, there are 4 bits for outer L3 types and 3 bits for inner L3 types, but both of
>> them have 6 different supported types. Is it intentional?
> Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are needed, though 1 bit wasted.

To be honnest, I'm always a surprised that in dpdk we prefer having
a strange API just because it's faster or easier to do on one
specific driver (usually i40e or ixgbe). Unfortunately, trying to
optimize the API for one driver may result in making the rest of the
code (application and other drivers) slower and more complex.

In your proposition, there is no inner l4_type. I consider it's as
useful as the other fields. From what I see, there are only 2 bits
left. What do you think about changing the packet type to 64 bits now?

>From an API point of view, I think it would be good to have the
same structure for inner and outer types. For instance (this is
just an example):

union layer_pkt_type {
	struct {
		uint16_t l2_type:4;
		uint16_t l3_type:4;
		uint16_t l4_type:4;
		uint16_t tun_type:4;
	};
	uint16_t u16;
};

struct pkt_type {
	union layer_pkt_type outer;
	union layer_pkt_type inner;
};

When your application decapsulates tunnels, you can just do
outer = inner and enter into the same code.


>>> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP,
>>> +RTE_PTYPE_L4_UDP
>>> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
>>> + *
>>> + * Note that L3 types values are selected for checking IPV4/IPV6
>>> +header from
>>> + * performance point of view. Reading annotations of
>>> +RTE_ETH_IS_IPV4_HDR and
>>> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type
>> values.
>>> + */
>>> +#define RTE_PTYPE_UNKNOWN                   0x0000 /*
>> 0b0000000000000000 */
>>> +/* bit 3:0 for tunnel types */
>>> +#define RTE_PTYPE_TUNNEL_IP                 0x0001 /*
>> 0b0000000000000001 */
>>> +#define RTE_PTYPE_TUNNEL_TCP                0x0002 /*
>> 0b0000000000000010 */
>>> +#define RTE_PTYPE_TUNNEL_UDP                0x0003 /*
>> 0b0000000000000011 */
>>> +#define RTE_PTYPE_TUNNEL_GRE                0x0004 /*
>> 0b0000000000000100 */
>>> +#define RTE_PTYPE_TUNNEL_VXLAN              0x0005 /*
>> 0b0000000000000101 */
>>> +#define RTE_PTYPE_TUNNEL_NVGRE              0x0006 /*
>> 0b0000000000000110 */
>>> +#define RTE_PTYPE_TUNNEL_GENEVE             0x0007 /*
>> 0b0000000000000111 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT             0x0008 /*
>> 0b0000000000001000 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC         0x0009 /*
>> 0b0000000000001001 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN     0x000a /*
>> 0b0000000000001010 */
>>> +#define RTE_PTYPE_TUNNEL_MASK               0x000f /*
>> 0b0000000000001111 */
>>> +/* bit 7:4 for L3 types */
>>> +#define RTE_PTYPE_L3_IPV4                   0x0010 /*
>> 0b0000000000010000 */
>>> +#define RTE_PTYPE_L3_IPV4_EXT               0x0030 /*
>> 0b0000000000110000 */
>>> +#define RTE_PTYPE_L3_IPV6                   0x0040 /*
>> 0b0000000001000000 */
>>> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x0090 /*
>> 0b0000000010010000 */
>>> +#define RTE_PTYPE_L3_IPV6_EXT               0x00c0 /*
>> 0b0000000011000000 */
>>> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x00e0 /*
>> 0b0000000011100000 */
>>> +#define RTE_PTYPE_L3_MASK                   0x00f0 /*
>> 0b0000000011110000 */
>>
>> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT or
>> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified the
>> L3 checksum?
> RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of above 3 can be set.
> These bits don't indicate any checksum, checksum should be indicated by other flags.
> They are just for packet types hardware can recognized.

I think these 2 information are linked:

- if the hardware cannot recognize packet, it cannot calculate the
  checksum because it does not know the packet type
- if the hardware can recognize the packet, it can verify that the
  checksum is good or wrong

Today, we have:

- PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT to tell if the packet is
  seen as IPv4 by the hw.

- We can suppose that:

  - PKT_RX_IPV4_HDR(_EXT)=0 -> no hw checksum information
  - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=0 -> checksum
    is correct
  - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=1 -> checksum
    is not correct

- We cannot do the same with L4 because we have no L4 type info,
  but it would be good to be able to do the same.

With your patch, you are removing the PKT_RX_IPV4_HDR and
PKT_RX_IPV4_HDR_EXT flags, but I think the above assumption
about checksum should be kept. As you are adding a L4 type
info, the same method could be applied to L4 checksums.

I think this would definitely solve the problem described by Stephen.


>> My understanding is:
>>
>> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
>>    -> checksum was checked by hw and is good
>> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
>>    -> checksum was checked by hw and is bad
>> - if packet_type is not IPv4*
>>    -> checksum was not checked by hw
>>
>> I think it would solve the problem asked by Stephen
>> http://dpdk.org/ml/archives/dev/2015-January/011550.html
>>
>>> +/* bit 10:8 for L4 types */
>>> +#define RTE_PTYPE_L4_TCP                    0x0100 /*
>> 0b0000000100000000 */
>>> +#define RTE_PTYPE_L4_UDP                    0x0200 /*
>> 0b0000001000000000 */
>>> +#define RTE_PTYPE_L4_FRAG                   0x0300 /*
>> 0b0000001100000000 */
>>> +#define RTE_PTYPE_L4_SCTP                   0x0400 /*
>> 0b0000010000000000 */
>>> +#define RTE_PTYPE_L4_ICMP                   0x0500 /*
>> 0b0000010100000000 */
>>> +#define RTE_PTYPE_L4_NONFRAG                0x0600 /*
>> 0b0000011000000000 */
>>> +#define RTE_PTYPE_L4_MASK                   0x0700 /*
>> 0b0000011100000000 */
>>
>> Same question for L4.
>>
>> Note: it would means that if a hardware is able to recognize a TCP packet but
>> not to verify the checksum, it has to set RTE_PTYPE_L4 to unknown.
>>
>>> +/* bit 13:11 for inner L3 types */
>>> +#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /*
>> 0b0000100000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /*
>> 0b0001000000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /*
>> 0b0001100000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /*
>> 0b0010000000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /*
>>> +0b0010100000000000 */ #define
>> RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011000000000000
>> */
> We cannot define the hardware behaviors, it just reports the hardware recognized
> packet information directly to the mbuf.
> Based on my experiment on i40e hardware, if a IPV4 packet with wrong checksum,
> by default, the PMD driver cannot see the packet at all. So we don't need to care
> about it too much!

I agree that the hardware reports some info that can be different
depending on the hw. But the role of the driver is to convert these info
into a common API with a well-defined behavior.

Regards,
Olivier


More information about the dev mailing list