[dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data

Olivier MATZ olivier.matz at 6wind.com
Wed Jul 6 14:08:48 CEST 2016


Hi Andrey,

On 07/06/2016 01:59 PM, Chilikin, Andrey wrote:
> Hi Olivier,
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
>> Sent: Wednesday, July 6, 2016 8:43 AM
>> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
>> from data
>>
>> Hi Cunming,
>>
>> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
>>> Hi Olivier,
>>>
>>> On 7/5/2016 11:41 PM, Olivier Matz wrote:
>>>> Introduce the function rte_pktmbuf_get_ptype() that parses an mbuf and
>>>> returns its packet type. For now, the following packet types are
>>>> parsed:
>>>>     L2: Ether
>>>>     L3: IPv4, IPv6
>>>>     L4: TCP, UDP, SCTP
>>>>
>>>> The goal here is to provide a reference implementation for packet
>>>> type parsing. This function will be used by testpmd in the next commits,
>>>> so that its result can be compared with the value given by the hardware.
>>>>
>>>> This function will also be useful when implementing Rx offload
>>>> support in the virtio pmd. Indeed, the virtio protocol gives the csum
>>>> start and offset, but it does not give the L4 protocol, nor does it tell
>>>> whether the checksum applies to the inner or the outer header. This
>>>> information has to be known to properly set the ol_flags in the mbuf.
>>>>
>>>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com>
>>>> Signed-off-by: Jean Dao <jean.dao at 6wind.com>
>>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com>
>>>> ---
>>>>   doc/guides/rel_notes/release_16_11.rst |   5 +
>>>>   lib/librte_mbuf/Makefile               |   5 +-
>>>>   lib/librte_mbuf/rte_mbuf_ptype.c       | 234 +++++++++++++++++++++++++++++++++
>>>>   lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
>>>>   lib/librte_mbuf/rte_mbuf_version.map   |   1 +
>>>>   5 files changed, 286 insertions(+), 2 deletions(-)
>>>>   create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
>>>>
>>>> [...]
>>>> +
>>>> +/* parse mbuf data to get packet type */
>>>> +uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
>>>> +    struct rte_mbuf_hdr_lens *hdr_lens)
>>>> +{
>>>> +    struct rte_mbuf_hdr_lens local_hdr_lens;
>>>> +    const struct ether_hdr *eh;
>>>> +    struct ether_hdr eh_copy;
>>>> +    uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
>>>> +    uint32_t off = 0;
>>>> +    uint16_t proto;
>>>> +
>>>> +    if (hdr_lens == NULL)
>>>> +        hdr_lens = &local_hdr_lens;
>>>> +
>>>> +    eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
>>>> +    if (unlikely(eh == NULL))
>>>> +        return 0;
>>>> +    proto = eh->ether_type;
>>>> +    off = sizeof(*eh);
>>>> +    hdr_lens->l2_len = off;
>>>> +
>>>> +    if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
>>>> +        const struct ipv4_hdr *ip4h;
>>>> +        struct ipv4_hdr ip4h_copy;
>>>> +
>>>> +        ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy);
>>>> +        if (unlikely(ip4h == NULL))
>>>> +            return pkt_type;
>>>> +
>>>> +        pkt_type |= ptype_l3_ip(ip4h->version_ihl);
>>>> +        hdr_lens->l3_len = ip4_hlen(ip4h);
>>>> +        off += hdr_lens->l3_len;
>>>> +        if (ip4h->fragment_offset &
>>>> +                rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
>>>> +                    IPV4_HDR_MF_FLAG)) {
>>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
>>>> +            hdr_lens->l4_len = 0;
>>>> +            return pkt_type;
>>>> +        }
>>>> +        proto = ip4h->next_proto_id;
>>>> +        pkt_type |= ptype_l4(proto);
>>>> +    } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
>>>> +        const struct ipv6_hdr *ip6h;
>>>> +        struct ipv6_hdr ip6h_copy;
>>>> +        int frag = 0;
>>>> +
>>>> +        ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy);
>>>> +        if (unlikely(ip6h == NULL))
>>>> +            return pkt_type;
>>>> +
>>>> +        proto = ip6h->proto;
>>>> +        hdr_lens->l3_len = sizeof(*ip6h);
>>>> +        off += hdr_lens->l3_len;
>>>> +        pkt_type |= ptype_l3_ip6(proto);
>>>> +        if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) {
>>>> +            proto = skip_ip6_ext(proto, m, &off, &frag);
>>>> +            hdr_lens->l3_len = off - hdr_lens->l2_len;
>>>> +        }
>>>> +        if (proto == 0)
>>>> +            return pkt_type;
>>>> +        if (frag) {
>>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
>>>> +            hdr_lens->l4_len = 0;
>>>> +            return pkt_type;
>>>> +        }
>>>> +        pkt_type |= ptype_l4(proto);
>>>> +    }
>>>> +
>>>> +    if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) {
>>>> +        hdr_lens->l4_len = sizeof(struct udp_hdr);
>>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) {
>>>> +        const struct tcp_hdr *th;
>>>> +        struct tcp_hdr th_copy;
>>>> +
>>>> +        th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
>>>> +        if (unlikely(th == NULL))
>>>> +            return pkt_type & (RTE_PTYPE_L2_MASK |
>>>> +                RTE_PTYPE_L3_MASK);
>>>> +        hdr_lens->l4_len = (th->data_off & 0xf0) >> 2;
>>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) {
>>>> +        hdr_lens->l4_len = sizeof(struct sctp_hdr);
>>>> +    } else {
>>>> +        hdr_lens->l4_len = 0;
>>>> +    }
>>>> +
>>>> +    return pkt_type;
>>>> +}
>>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> index 4a34678..f468520 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> @@ -545,6 +545,49 @@ extern "C" {
>>>>           RTE_PTYPE_INNER_L3_MASK |                \
>>>>           RTE_PTYPE_INNER_L4_MASK))
>>>>
>>>> +struct rte_mbuf;
>>>> +
>>>> +/**
>>>> + * Structure containing header lengths associated with a packet.
>>>> + */
>>>> +struct rte_mbuf_hdr_lens {
>>>> +    uint8_t l2_len;
>>>> +    uint8_t l3_len;
>>>> +    uint8_t l4_len;
>>>> +    uint8_t tunnel_len;
>>>> +    uint8_t inner_l2_len;
>>>> +    uint8_t inner_l3_len;
>>>> +    uint8_t inner_l4_len;
>>>> +};
>>> [LC] The header parsing graph is usually not unique. The definition
>>> may be fine for basic IP and L4 tunnels.
>>> However, it does not scale to other cases, e.g. QinQ, MAC-in-MAC, MPLS
>>> L2/L3 tunnels.
>>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
>>> "struct rte_mbuf_hdr_lens" form a pair for one specific parser scheme.
>>> In this case, the fixed function supports the following:
>>>
>>> + * Supported packet types are:
>>> + *   L2: Ether
>>> + *   L3: IPv4, IPv6
>>> + *   L4: TCP, UDP, SCTP
>>>
>>> Of course, more packet type detection logic can be added in the future,
>>> but the more it supports, the higher the cost.
>>>
>>> One alternative is to allow registering a parser pair. The application
>>> chooses either the predefined scheme (provided by the DPDK library) or
>>> defines its own parsing logic.
>>> In this way, the scheme can make assumptions for the specific case
>>> and skip useless graph detection.
>>> In addition, besides the SW parser, the HW parser (identified by
>>> packet_type in mbuf) can be turned on/off in the same manner.
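
The registration idea described above could be sketched roughly as follows. This is only an illustration: every name here (ptype_parser_register(), ptype_parse(), struct hdr_lens, the EX_* constants and their values) is hypothetical and not part of DPDK or of this patch, and it works on a flat byte buffer rather than an mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and values, for illustration only (not DPDK API). */
#define EX_PTYPE_L2_ETHER      0x1u
#define EX_PTYPE_L2_ETHER_VLAN 0x6u
#define EX_PTYPE_L2_ETHER_QINQ 0x7u

struct hdr_lens { uint8_t l2_len; uint8_t l3_len; uint8_t l4_len; };

/* A parser scheme: fills the header lengths and returns a packet type. */
typedef uint32_t (*ptype_parser_t)(const uint8_t *data, size_t len,
	struct hdr_lens *lens);

#define EX_MAX_PARSERS 4
static ptype_parser_t ex_parsers[EX_MAX_PARSERS];
static unsigned int ex_nb_parsers;

/* Register a scheme; returns its id, or -1 if the table is full. */
int ptype_parser_register(ptype_parser_t fn)
{
	if (fn == NULL || ex_nb_parsers == EX_MAX_PARSERS)
		return -1;
	ex_parsers[ex_nb_parsers] = fn;
	return (int)ex_nb_parsers++;
}

/* Run the scheme selected by the application; 0 means "unknown". */
uint32_t ptype_parse(int id, const uint8_t *data, size_t len,
	struct hdr_lens *lens)
{
	assert(lens != NULL);
	if (id < 0 || (unsigned int)id >= ex_nb_parsers)
		return 0;
	return ex_parsers[id](data, len, lens);
}

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Predefined scheme: plain Ethernet only, no tag detection. */
uint32_t parse_ether_only(const uint8_t *data, size_t len,
	struct hdr_lens *lens)
{
	(void)data;
	if (len < 14)
		return 0;
	lens->l2_len = 14;
	return EX_PTYPE_L2_ETHER;
}

/* Application scheme: also skips stacked 802.1Q / 802.1ad tags. */
uint32_t parse_ether_vlan(const uint8_t *data, size_t len,
	struct hdr_lens *lens)
{
	size_t off = 12; /* offset of the (outer) EtherType */
	unsigned int tags = 0;

	if (len < 14)
		return 0;
	for (;;) {
		uint16_t type = read_be16(data + off);

		if (type != 0x8100 && type != 0x88a8)
			break;
		if (off + 6 > len) /* need TCI + next EtherType */
			return 0;
		off += 4;
		tags++;
	}
	lens->l2_len = (uint8_t)(off + 2);
	if (tags == 0)
		return EX_PTYPE_L2_ETHER;
	return tags == 1 ? EX_PTYPE_L2_ETHER_VLAN : EX_PTYPE_L2_ETHER_QINQ;
}
```

An application that never sees tagged traffic would register only parse_ether_only() and avoid the tag loop; one that expects QinQ would register parse_ether_vlan() and get an l2_len that covers both tags. Whether the indirect call costs more than a single static function is exactly the open question in this thread.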
>>
>> Sorry, I'm not sure I fully understand what you are saying. If I understand
>> correctly, you would like something more flexible that supports the
>> registration of protocols to be recognized?
>>
>> I'm not sure that a function with a dynamic registration method would really
>> increase performance compared to a static, complete function.
>> Actually, we will never support tons of protocols, since each layer's packet
>> type is 4 bits, and since each packet type requires that at least one hw
>> supports it.
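
To make the 4-bits-per-layer limit concrete, here is a small model of the layout. It assumes seven consecutive 4-bit fields starting at bit 0, in the order L2, L3, L4, tunnel, inner L2, inner L3, inner L4 (the same order as the fields of struct rte_mbuf_hdr_lens). The enum and helper names are made up for this sketch and are not DPDK macros.

```c
#include <stdint.h>

/* Hypothetical model of the packet_type layout: one 4-bit field per layer. */
enum ex_layer {
	EX_L2, EX_L3, EX_L4, EX_TUNNEL,
	EX_INNER_L2, EX_INNER_L3, EX_INNER_L4,
	EX_NB_LAYERS
};

/* Mask selecting the 4-bit field of a layer. */
static inline uint32_t ex_layer_mask(enum ex_layer layer)
{
	return 0xfu << (4 * layer);
}

/* Enumerated value stored for a layer (0 means unknown). */
static inline unsigned int ex_layer_value(uint32_t ptype, enum ex_layer layer)
{
	return (ptype >> (4 * layer)) & 0xfu;
}

/* Store a value for a layer; anything above 15 is truncated to 4 bits. */
static inline uint32_t ex_layer_set(uint32_t ptype, enum ex_layer layer,
	unsigned int value)
{
	ptype &= ~ex_layer_mask(layer);
	return ptype | ((value & 0xfu) << (4 * layer));
}
```

Under this model the seven fields use 28 of the 32 bits, and each layer can name at most 15 distinct protocols besides "unknown", which is why the set of recognized protocols stays small.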
> 
> This patch will be very useful as a reference implementation, but it also highlights an issue with the current implementation of packet type reporting by HW and SW: as you just mentioned, there are only 4 bits per layer. As these 4 bits are used as an enumeration, it is impossible to report multiple headers located on the same layer. MPLS is one example: different packets can have different numbers of MPLS labels, but it is impossible to report this using the current packet_type structure.
> 
> It is possible, however, to program HW to report user (application) specific packet types. For example, for IPoMPLS with one MPLS label, HW will report packet type A, but for IPoMPLS with two MPLS labels HW will report packet type B. In this case, instead of defining and supporting tons of statically defined (or enumerated) protocol header combinations, the application will register the packet types it expects from HW in addition to the standard packet types. At the moment we have the high bits of packet_type reserved, so one possible solution would be to use the highest bit to indicate that this is a user-defined packet_type, specific to the application. Then it could be used with both the HW and the SW parser. For example, packet_type 0x8000000A is IPoMPLS with one MPLS label, 0x8000000B is IPoMPLS with two MPLS labels, and so on.
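
A rough sketch of the high-bit idea above, using the two example values from the paragraph (0x8000000A and 0x8000000B). The flag macro, the table, and the lookup function are hypothetical names for illustration; nothing like them exists in DPDK today.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: highest bit marks an application-defined packet type. */
#define EX_PTYPE_USER_FLAG 0x80000000u

static inline int ex_ptype_is_user(uint32_t ptype)
{
	return (ptype & EX_PTYPE_USER_FLAG) != 0;
}

/* What the application registered for each user-defined value. */
struct ex_user_ptype {
	uint32_t ptype;
	const char *name;
	unsigned int mpls_labels;
};

static const struct ex_user_ptype ex_user_ptypes[] = {
	{ 0x8000000Au, "IPoMPLS, one MPLS label", 1 },
	{ 0x8000000Bu, "IPoMPLS, two MPLS labels", 2 },
};

/* Return the registered entry, or NULL for standard or unknown types. */
const struct ex_user_ptype *ex_user_ptype_lookup(uint32_t ptype)
{
	size_t i;

	if (!ex_ptype_is_user(ptype))
		return NULL;
	for (i = 0; i < sizeof(ex_user_ptypes) / sizeof(ex_user_ptypes[0]); i++) {
		if (ex_user_ptypes[i].ptype == ptype)
			return &ex_user_ptypes[i];
	}
	return NULL;
}
```

With such a flag, code that handles standard packet types can test the highest bit first and skip the per-layer masks entirely for application-defined values, since those values are opaque identifiers rather than per-layer enumerations.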

Thank you for the explanation. From your description, I wonder if the
flow director API recently proposed by Adrien [1] wouldn't solve this issue?

[1] http://dpdk.org/ml/archives/dev/2016-July/043365.html

Regards,
Olivier

