[dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf field

Van Haaren, Harry harry.van.haaren at intel.com
Wed Oct 28 11:24:01 CET 2020


> -----Original Message-----
> From: dev <dev-bounces at dpdk.org> On Behalf Of Thomas Monjalon
> Sent: Wednesday, October 28, 2020 10:09 AM
> To: Nithin Dabilpuram <ndabilpuram at marvell.com>
> Cc: Pavan Nikhilesh <pbhagavatula at marvell.com>; Jerin Jacob
> <jerinj at marvell.com>; Ruifeng Wang <ruifeng.wang at arm.com>; Richardson, Bruce
> <bruce.richardson at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; kirankumark at marvell.com; dev at dpdk.org;
> david.marchand at redhat.com; olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf
> field
> 
> 28/10/2020 10:30, Nithin Dabilpuram:
> > From: Thomas Monjalon <thomas at monjalon.net>
> >
> > The node_mbuf_priv1 was stored in the deprecated mbuf field udata64.
> > It is moved to a dynamic field in order to allow removal of udata64.
> >
> > Signed-off-by: Thomas Monjalon <thomas at monjalon.net>
> > Signed-off-by: Nithin Dabilpuram <ndabilpuram at marvell.com>
> [...]
> > +	IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) = node_mbuf_priv1_dynfield_offset;
> 
> That's interesting.
> You copy the offset in the node context for better performance.
> How much is it better than with global offset variable?
> How much it decreases compared to a static mbuf field?

I'm also interested in this topic, so I'll offer the theoretical point of view:

With a static field, the offset into the mbuf can be encoded in the instruction
stream, meaning no d-cache load is needed to find the field's offset.
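To illustrate the difference, here is a minimal sketch. The struct toy_mbuf and
both helpers are hypothetical stand-ins, not the real rte_mbuf layout or API. In
the static case the offset is a compile-time constant; in the dynamic case it is
a run-time value that has to come from somewhere before the field can be read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_mbuf; NOT the real layout. */
struct toy_mbuf {
	uint64_t static_meta;  /* plays the role of a static field */
	uint8_t dyn_area[32];  /* plays the role of the dynamic field area */
};

/*
 * Static field: offsetof(struct toy_mbuf, static_meta) is known at
 * compile time, so it can be an immediate displacement in the load.
 */
static inline uint64_t
read_static(const struct toy_mbuf *m)
{
	return m->static_meta;
}

/*
 * Dynamic field: the offset is only known at run time, so the caller
 * must first obtain it (from a global, a node context, a register...).
 */
static inline uint64_t
read_dynamic(const struct toy_mbuf *m, int offset)
{
	uint64_t v;

	assert(offset >= 0 && (size_t)offset + sizeof(v) <= sizeof(*m));
	memcpy(&v, (const uint8_t *)m + offset, sizeof(v));
	return v;
}
```

Where the offset argument of read_dynamic() comes from is exactly what the
global-versus-node-context question is about.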

With a static/global offset variable, the cache line holding the value is presumably
not hot at the start of each burst (assuming the application does significant work
between bursts, so the line has been evicted since the last one). Hence the overhead
estimate could be 1x cache line load per burst.

With the offset copied into the node context, it presumably sits on a hot cache line,
since the node is already using other data members of its context. As a result, a cold
load of the static/global line is perhaps converted into re-use of a hot node-context line.
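A rough sketch of the pattern, under stated assumptions: struct toy_node, the
16-byte context size, the slot position TOY_CTX_PRIV1_OFF and all function names
are invented for illustration and are not the rte_graph API. The global is read
once at node init; the per-burst path only touches the node's own context.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NODE_CTX_SZ   16 /* assumed context size, illustration only */
#define TOY_CTX_PRIV1_OFF  8 /* assumed slot inside ctx for the offset */

struct toy_node {
	uint8_t ctx[TOY_NODE_CTX_SZ]; /* per-node scratch, hot while running */
};

_Static_assert(TOY_CTX_PRIV1_OFF + sizeof(int) <= TOY_NODE_CTX_SZ,
	       "offset slot must fit in the node context");

/* Global offset, written once when the dynamic field is registered. */
static int toy_global_priv1_off = -1;

/* Init path: copy the global into the node context (one cold load, once). */
static void
toy_node_init(struct toy_node *node)
{
	memcpy(&node->ctx[TOY_CTX_PRIV1_OFF], &toy_global_priv1_off,
	       sizeof(toy_global_priv1_off));
}

/* Read the offset back from the node context. */
static int
toy_node_priv1_off(const struct toy_node *node)
{
	int off;

	memcpy(&off, &node->ctx[TOY_CTX_PRIV1_OFF], sizeof(off));
	return off;
}

/* Burst path: the offset comes from the node context, not the global. */
static void
toy_node_process(const struct toy_node *node, uint8_t **bufs,
		 unsigned int n, uint16_t nh)
{
	const int off = toy_node_priv1_off(node);
	unsigned int i;

	assert(off >= 0);
	for (i = 0; i < n; i++)
		memcpy(bufs[i] + off, &nh, sizeof(nh));
}
```

Whether this actually beats reading the global directly in the burst path
would need measuring; the sketch only shows where the load moves to.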

Real-world overhead likely depends on A) whether the application thrashes the cache enough
to make the static/global line fall out of cache, causing perf degradation due to the
reload, and B) whether the node->ctx still fits in the same number of cache lines as before
once the value is copied there.
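For point B, one way to reason about it is to count the cache lines a region
spans before and after adding the extra member. The helper below is a
hypothetical sketch and assumes 64-byte cache lines, which is not true of
every target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64 /* assumed line size; differs on some platforms */

/*
 * Number of cache lines touched by a region of `len` bytes that starts
 * `start` bytes past a cache-line-aligned base address.
 */
static inline size_t
cache_lines_spanned(size_t start, size_t len)
{
	assert(len <= SIZE_MAX - start); /* guard against wrap-around */
	if (len == 0)
		return 0;
	return (start + len - 1) / TOY_CACHE_LINE - start / TOY_CACHE_LINE + 1;
}
```

If the count for the context is the same with and without the copied offset,
the copy adds no extra line to the node's working set.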

