[dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf field

Thomas Monjalon thomas at monjalon.net
Wed Oct 28 11:43:56 CET 2020


28/10/2020 11:42, Nithin Dabilpuram:
> On Wed, Oct 28, 2020 at 10:24:01AM +0000, Van Haaren, Harry wrote:
> > From: Thomas Monjalon
> > > 28/10/2020 10:30, Nithin Dabilpuram:
> > > > From: Thomas Monjalon <thomas at monjalon.net>
> > > >
> > > > The node_mbuf_priv1 was stored in the deprecated mbuf field udata64.
> > > > It is moved to a dynamic field in order to allow removal of udata64.
> > > >
> > > > Signed-off-by: Thomas Monjalon <thomas at monjalon.net>
> > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram at marvell.com>
> > > [...]
> > > > +	IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) = node_mbuf_priv1_dynfield_offset;
> > > 
> > > That's interesting.
> > > You copy the offset in the node context for better performance.
> > > How much better is it than with a global offset variable?
> > > How much does performance decrease compared to a static mbuf field?
> > 
> > Also interested in this topic, I'll offer the logical/theory point of view;
> > 
> > With a static field, the offset into the mbuf can be encoded in the instruction
> > stream, meaning there are no d-cache loads to locate a particular dynamic field.
> > 
> > With a static/global variable, the cache line where the value resides is presumably
> > not hot in cache per burst (assuming an application that does significant work, so not
> > in cache since last burst). Hence the overhead estimate could be one cache line load per burst.
> > 
> > With the data copied into the node, the offset is presumably on a hot cache line as the
> > node is using other data-members of its context. As a result, perhaps a cold static cache
> > line load is converted to a hot node-context line re-use. 
> > 
> > Real-world overhead likely depends on A) whether the application cache-thrashes enough to make
> > the static/global line fall out of cache, causing perf degradation due to reload, and B) whether
> > node->ctx still fits in the same number of lines as before if the value is copied there.
> 
> Agreed, node->ctx is already referenced to get other data (the lpm pointer). So
> referencing another 4 bytes might even convert that into a load pair, which comes
> at no extra cost.
> 
> Numbers-wise,
> performance decreases by ~1.4% from static mbuf field to global offset variable
> and by ~1% from static mbuf field to node context field
> cached per process call.

OK thanks for providing these numbers.
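
For reference, the three access patterns compared above could be sketched
roughly as follows. This is a standalone illustration, not DPDK code: every
toy_* name and the priv_* helpers are hypothetical stand-ins for the mbuf,
the graph node context and the registered dynamic field offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: one static field plus a dynamic area. */
struct toy_mbuf {
	uint64_t static_priv;	/* fixed offset, known at compile time */
	uint64_t dyn_area[8];	/* space handed out at run time */
};

/* Hypothetical stand-in for a graph node with a small private context. */
struct toy_node {
	const void *lpm;	/* other per-node data, read on every call */
	int priv1_off;		/* offset copied here at node init */
};

/* Global offset, set once at registration (may sit on a cold cache line). */
static int toy_priv1_dynfield_offset = -1;

/* 1) Static field: the offset is an immediate in the instruction stream. */
static inline uint64_t *priv_static(struct toy_mbuf *m)
{
	return &m->static_priv;
}

/* Resolve a run-time byte offset into the mbuf. */
static inline uint64_t *priv_at(struct toy_mbuf *m, int off)
{
	assert(off >= 0 && (size_t)off + sizeof(uint64_t) <= sizeof(*m));
	return (uint64_t *)((uint8_t *)m + off);
}

/* 2) Global offset: the global has to be loaded before the field is reached. */
static inline uint64_t *priv_global(struct toy_mbuf *m)
{
	return priv_at(m, toy_priv1_dynfield_offset);
}

/* 3) Node-context copy: the offset sits next to data the node already reads. */
static inline uint64_t *priv_node(const struct toy_node *n, struct toy_mbuf *m)
{
	return priv_at(m, n->priv1_off);
}

/* Pick the toy dynamic field offset and copy it into the node context. */
static void toy_node_init(struct toy_node *n)
{
	toy_priv1_dynfield_offset = (int)offsetof(struct toy_mbuf, dyn_area);
	n->priv1_off = toy_priv1_dynfield_offset;
}
```

Variants 2 and 3 resolve to the same address; they differ only in where the
offset itself is loaded from, which is the cache line question raised above.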



