[dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Sep 14 10:39:42 CEST 2017



> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Thursday, September 14, 2017 9:35 AM
> To: Hu, Jiayu <jiayu.hu at intel.com>; Ananyev, Konstantin <konstantin.ananyev at intel.com>
> Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> 
> >From: Hu, Jiayu
> >Sent: Thursday, September 14, 2017 2:00 AM
> >To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Kavanagh, Mark B
> ><mark.b.kavanagh at intel.com>
> >Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >
> >Hi Konstantin,
> >
> >> -----Original Message-----
> >> From: Ananyev, Konstantin
> >> Sent: Wednesday, September 13, 2017 11:13 PM
> >> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Hu, Jiayu
> >> <jiayu.hu at intel.com>
> >> Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >>
> >> Hi Mark,
> >>
> >> > -----Original Message-----
> >> > From: Kavanagh, Mark B
> >> > Sent: Wednesday, September 13, 2017 3:52 PM
> >> > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Hu, Jiayu
> >> <jiayu.hu at intel.com>
> >> > Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>
> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> >
> >> > >From: Ananyev, Konstantin
> >> > >Sent: Wednesday, September 13, 2017 10:38 AM
> >> > >To: Hu, Jiayu <jiayu.hu at intel.com>
> >> > >Cc: dev at dpdk.org; Kavanagh, Mark B <mark.b.kavanagh at intel.com>;
> >> Tan, Jianfeng
> >> > ><jianfeng.tan at intel.com>
> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
> >> > >
> >> > >
> >> > >
> >> > >> > > +
> >> > >> > > +int
> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt,
> >> > >> > > +		uint16_t gso_size,
> >> > >> > > +		uint8_t ipid_delta,
> >> > >> > > +		struct rte_mempool *direct_pool,
> >> > >> > > +		struct rte_mempool *indirect_pool,
> >> > >> > > +		struct rte_mbuf **pkts_out,
> >> > >> > > +		uint16_t nb_pkts_out)
> >> > >> > > +{
> >> > >> > > +	struct ipv4_hdr *ipv4_hdr;
> >> > >> > > +	uint16_t tcp_dl;
> >> > >> > > +	uint16_t pyld_unit_size;
> >> > >> > > +	uint16_t hdr_offset;
> >> > >> > > +	int ret = 1;
> >> > >> > > +
> >> > >> > > +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *)
> >> +
> >> > >> > > +			pkt->l2_len);
> >> > >> > > +	/* Don't process the fragmented packet */
> >> > >> > > +	if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16(
> >> > >> > > +
> >> 	IPV4_HDR_DF_MASK)) == 0)) {
> >> > >> >
> >> > >> >
> >> > >> > It is not a check for fragmented packet - it is a check that
> >> fragmentation
> >> > >is allowed for that packet.
> >> > >> > Should be IPV4_HDR_DF_MASK - 1,  I think.
> >> > >
> >> > >DF bit doesn't indicate is packet fragmented or not.
> >> > >It forbids to fragment packet any further.
> >> > >To check is packet already fragmented or not, you have to check MF bit
> >> and
> >> > >frag_offset.
> >> > >Both have to be zero for un-fragmented packets.
> >> > >
> >> > >>
> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit.
> >> It's a
> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian order.
> >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset &
> >> > >rte_cpu_to_be_16(
> >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented.
> >> > >>
> >> > >> >
> >> > >> > > +		pkts_out[0] = pkt;
> >> > >> > > +		return ret;
> >> > >> > > +	}
> >> > >> > > +
> >> > >> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt-
> >> >l3_len -
> >> > >> > > +		pkt->l4_len;
> >> > >> >
> >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len?
> >> > >>
> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len
> >here.
> >> > >>
> >> > >> >
> >> > >> > > +	/* Don't process the packet without data */
> >> > >> > > +	if (unlikely(tcp_dl == 0)) {
> >> > >> > > +		pkts_out[0] = pkt;
> >> > >> > > +		return ret;
> >> > >> > > +	}
> >> > >> > > +
> >> > >> > > +	hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> >> > >> > > +	pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN;
> >> > >> >
> >> > >> > Hmm, why do we need to count CRC_LEN here?
> >> > >>
> >> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be
> >> > >> included in gso_size.
> >> > >
> >> > >Why?
> >> > >What is the point to account crc len into this computation?
> >> > >Why not just assume that gso_size is already a max_frame_size - crc_len
> >> > >As I remember, when we RX packet crc bytes will be already stripped,
> >> > >when user populates the packet, he doesn't care about crc bytes too.
> >> >
> >> > Hi Konstantin,
> >> >
> >> > When packet is tx'd, the 4B for CRC are added back into the packet; if the
> >> payload is already at max capacity, then the actual segment size
> >> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B).
> >> > To prevent that from happening, we account for the CRC len in this
> >> calculation.
> >>
> >>
> >> Ok, and what prevents you to set gso_ctx.gso_size = 1514;  /*ether frame
> >> size without crc bytes */
> >> ?
> 
> Hey Konstantin,
> 
> If the user sets the gso_size to 1514, the resultant output segments' size should be 1514, and not 1518.

Yes and then NIC HW will add CRC bytes for you.
You are not filling CRC bytes in HW, and when providing to the HW size to send  - it is a payload size
(CRC bytes are not accounted).
Konstantin

 Consequently, the payload capacity
> of each segment would be reduced accordingly.
> The user only cares about the output segment size (i.e. gso_ctx.gso_size); we need to ensure that the size of the segments that are
> produced is consistent with that. As a result, we need to ensure that any packet overhead is accounted for in the segment size, before we
> can determine how much space remains for data.
> 
> Hope this makes sense.
> 
> Thanks,
> Mark
> 
> >
> >Exactly, applications can set 1514 to gso_segsz instead of 1518, if the lower
> >layer
> >will add CRC to the packet.
> >
> >Jiayu
> >
> >> Konstantin
> >>
> >> >
> >> > If I've missed anything, please do let me know!
> >> >
> >> > Thanks,
> >> > Mark
> >> >
> >> > >
> >> > >Konstantin


More information about the dev mailing list