[dpdk-dev] mbuff rearm_data aligmenet issue on non x86

Jerin Jacob jerin.jacob at caviumnetworks.com
Thu May 12 16:50:55 CEST 2016


On Thu, May 12, 2016 at 01:14:34PM +0000, Ananyev, Konstantin wrote:
> > 
> > On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote:
> > > Hi Jerrin,
> > >
> > > >
> > > > Hi All,
> > > >
> > > > I would like align mbuff rearm_data field to 8 byte aligned so that
> > > > write to mbuf->rearm_data with uint64_t* will be naturally aligned.
> > > > I am not sure about IA but some other architecture/implementation has overhead
> > > > in non-naturally aligned stores.
> > > >
> > > > Proposed patch is something like this below, But open for any change to
> > > > make fit for all other architectures/platform.
> > > >
> > > > Any thoughts ?
> > > >
> > > > ➜ [master] [dpdk-master] $ git diff
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > index 529debb..5a917d0 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -733,10 +733,8 @@ struct rte_mbuf {
> > > >         void *buf_addr;           /**< Virtual address of segment
> > > > buffer. */
> > > >         phys_addr_t buf_physaddr; /**< Physical address of segment
> > > > buffer. */
> > > >
> > > > -       uint16_t buf_len;         /**< Length of segment buffer. */
> > > > -
> > >
> > >
> > > There is no need to move buf_len itself, I think.
> > > Just move rearm_data marker prior to buf_len is enough.
> > > Though how do you suggest to deal with the fact, that right now we blindly
> > > update the whole 64bits pointed by rearm_data:
> > >
> > > drivers/net/ixgbe/ixgbe_rxtx_vec.c:
> > > 	/*
> > >                  * Flush mbuf with pkt template.
> > >                  * Data to be rearmed is 6 bytes long.
> > >                  * Though, RX will overwrite ol_flags that are coming next
> > >                  * anyway. So overwrite whole 8 bytes with one load:
> > >                  * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
> > >                  */
> > >                 p0 = (uintptr_t)&mb0->rearm_data;
> > >                 *(uint64_t *)p0 = rxq->mbuf_initializer;
> > >
> > > ?
> > >
> > > If buf_len will be inside these 64bits, we can't do it anymore.
> > >
> > > Are you suggesting something like:
> > >
> > > uint64_t *p0, v0;
> > >
> > > p0 = &mb0->rearm_data;
> > > v0 = *p0 & REARM_MASK;
> > > *p0 = v0 | rxq->mbuf_initializer;
> > > ?
> > 
> > Due to unaligned rearm_data issue, In ThunderX platform, we need to write
> > multiple half word of aligned stores(so masking was better us).
> 
> Ok, so what would be the gain on ARM if you'll make that change?

~4 cpu cycles per packet.Again it may not be ARM architecture specific
as ARM architecture does not define  instruction latency so it is more of a
implementation specific data.

> Again, what would be the drop (if any) on IA?
> 
> > But I think, if we can put 16bit hole between port and ol_flags then
> > we may not need the masking stuff in ixgbe. Right?
> 
> You mean move buf_len somewhere else (end of cacheline0) and 
> introduce a 2B hole between port and ol_flags, right?

Yes

> Yep, that probably wouldn't have any performance impact.

I will try two options and send a patch which don't have any
performance impact on IA.

> 
> > 
> > OR
> > 
> > Even better, if we can fill in a uint16_t variable which will replaced
> > later in the flow like "data_len"?
> 
> data_len is grouped  with rx_descriptor_fields1 on purpose -
> so we can update packet_type,pkt_len, data_len,vlan_tci,rss with one 16B write.

OK

> 
> Konstantin
> 
> > and move buf_len at end the first  cache line? 
> >or any other thoughts to fix unaligned rearm_data issue?
> > 
> > Jerin
> > 
> > 
> > 
> > >
> > > If so I wonder what would be the performance impact of that change.
> > > Konstantin
> > >
> > >
> > > >         /* next 6 bytes are initialised on RX descriptor rearm */
> > > > -       MARKER8 rearm_data;
> > > > +       MARKER64 rearm_data;
> > > >         uint16_t data_off;
> > > >
> > > >         /**
> > > > @@ -754,6 +752,7 @@ struct rte_mbuf {
> > > >         uint8_t nb_segs;          /**< Number of segments. */
> > > >         uint8_t port;             /**< Input port. */
> > > >
> > > > +       uint16_t buf_len;         /**< Length of segment buffer. */
> > > >         uint64_t ol_flags;        /**< Offload features. */
> > > >
> > > >         /* remaining bytes are set on RX when pulling packet from
> > > >  * descriptor
> > > >
> > > > /Jerin


More information about the dev mailing list