[dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
Ananyev, Konstantin
konstantin.ananyev at intel.com
Wed Dec 16 11:44:56 CET 2020
Hi Yicai,
> > In some situations, we receive IP fragments whose total data length
> > is less than min_ip_len (64) and which are padded with zeros.
> > We simulated intermediate fragments by modifying the MTU.
> > To illustrate the problem, we simplify the packet format and ignore
> > the impact of the packet header. In namespace2, a packet whose data
> > length is 1520 is sent.
> > When the packet passes tap2, the packet is divided into two
> > fragments: fragment A and B, similar to (1520 = 1510 + 10).
> > When the packet passes tap3, the larger fragment packet A is divided
> > into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> > Finally, the bond interface receives three fragments:
> > A1, A2, and B (1520 = 1500 + 10 + 10).
> > One fragmented packet A2 is smaller than the minimum Ethernet frame
> > length, so it needs to be padded.
> >
> > |---------------------------------------------------|
> > | HOST |
> > | |--------------| |----------------------------| |
> > | | ns2 | | |--------------| | |
> > | | |--------| | | |--------| |--------| | |
> > | | | tap1 | | | | tap2 | ns1| tap3 | | |
> > | | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
> > | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
> > | |--------| |--------| |--------| |
> > | | | | |
> > | |-----------------| | |
> > | | |
> > | |--------| |
> > | | bond | |
> > |--------------------------------------|mtu=1500|---|
> > |--------|
> >
> > When processing the packets above, DPDK aggregates the fragments
> > A2 and B.
> > The result is a malformed packet in which the zero padding of A2
> > appears in the middle of the payload.
> >
> > A2 + B:
> > 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> > 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> > 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> > 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> > 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> > 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
> >
> > So we calculate the length of the padding and remove it from
> > pkt_len and data_len before aggregation.
> >
> > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming
> > packet")
> > Cc: stable at dpdk.org
> >
> > Signed-off-by: Yicai Lu <luyicai at huawei.com>
> > ---
> > v4 -> v5: Update the comments and description.
> > ---
> > lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > index 1dda8ac..fdf66a4 100644
> > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > @@ -104,6 +104,7 @@ struct rte_mbuf *
> > const unaligned_uint64_t *psd;
> > uint16_t flag_offset, ip_ofs, ip_flag;
> > int32_t ip_len;
> > + int32_t trim;
> >
> > flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> > ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
> > @@ -117,14 +118,15 @@ struct rte_mbuf *
> >
> > ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> > ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> > + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
> >
> > IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > - "mbuf: %p, tms: %" PRIu64
> > - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> > + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> > + "ofs: %u, len: %d, padding: %d, flags: %#x\n"
> > "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> > "max_entries: %u, use_entries: %u\n\n",
> > __func__, __LINE__,
> > - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> > + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> > tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> > tbl->use_entries);
> >
> > @@ -134,6 +136,10 @@ struct rte_mbuf *
> > return NULL;
> > }
> >
> > + if (unlikely(trim > 0)) {
> > + rte_pktmbuf_trim(mb, trim);
> > + }
>
> > As a nit {} braces are not required for single expression.
> > LGTM in general, just one thing: shouldn't we have the same fix for ipv6 then?
> > Konstantin
>
> Hi Konstantin,
>
> Thanks!
>
> During the problem analysis, we discussed ipv6 and concluded
> that the problem does not occur there.
>
> For ipv6, it consists of the following parts:
> basic header = 40(bytes)
> DMAC = 6(bytes)
> SMAC = 6(bytes)
> Type = 2(bytes)
> CRC = 4(bytes)
> fragment header = 8(bytes)
> ...
>
> 40 + 6 + 6 + 2 + 4 + 8 = 66 (bytes)
>
> The total is already greater than min_ip_len(64), so the frame
> does not need to be padded with zeros.
For normal cases - yes, but in theory there could be some unusual scenarios
(tunnelled packet, different media, etc.).
So for consistency and to avoid unforeseen issues - I think it is better to
have the fix for both ipv4 and ipv6.
After all, the impact looks negligible.
Konstantin
>
> > +
> > /* try to find/add entry into the fragment's table. */
> > if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> > IP_FRAG_MBUF2DR(dr, mb);
> > --
> > 1.9.5.msysgit.1