[dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads API

Jerin Jacob jerin.jacob at caviumnetworks.com
Tue Sep 12 07:51:39 CEST 2017

-----Original Message-----
> Date: Tue, 12 Sep 2017 05:25:42 +0000
> From: Shahaf Shuler <shahafs at mellanox.com>
> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>, "Ananyev, Konstantin"
>  <konstantin.ananyev at intel.com>
> CC: Stephen Hemminger <stephen at networkplumber.org>, Thomas Monjalon
>  <thomas at monjalon.net>, "dev at dpdk.org" <dev at dpdk.org>, "Zhang, Helin"
>  <helin.zhang at intel.com>, "Wu, Jingjing" <jingjing.wu at intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> Tuesday, September 12, 2017 7:01 AM, Jerin Jacob:
> > Yes, only when ETH_TXQ_FLAGS_NOMULTMEMP and
> > ETH_TXQ_FLAGS_NOREFCOUNT are selected at tx queue configuration.
> > 
> > >
> > > So literally, yes it is not a TX HW offload, though I understand your
> > > intention to have such possibility - it might help to save some cycles.
> > 
> > It is not a few cycles. We could see a ~24% drop per core (with 64B packets)
> > with testpmd and l3fwd on some SoCs. It is not very specific to nicvf HW; the
> > problem is the limited cache hierarchy in very low-end arm64 machines.
> > For the TX buffer recycling case, the driver needs to touch the mbuf again to
> > find out the associated mempool to free it to. That is fine if the application
> > demands it, but not all applications demand it.
> > 
> > We have two categories of arm64 machines: the high-end machines, where the
> > cache hierarchy is similar to an x86 server machine, and the low-end ones with
> > very limited cache resources. Unfortunately, we need to have the same binary
> > on both machines.
> > 
> > 
> > > I wonder whether some new driver-specific function would help in that case?
> > > nicvf_txq_pool_setup(portid, queueid, struct rte_mempool *txpool,
> > > uint32_t flags); or so?
> > 
> > It is possible, but how do we make such a change in testpmd, l3fwd or
> > ipsec-gw, the in-tree applications which need only NOMULTMEMP &
> > NOREFCOUNT?
> > 
> > If there is concern about making it Tx queue level, that is fine. We can move
> > it from queue level to port level or global level.
> > IMO, the application should express in some form that it wants only
> > NOMULTMEMP & NOREFCOUNT, and that is the case for l3fwd and ipsec-gw.
> > 
> I understand the use case, and the fact those flags improve the performance on low-end ARM CPUs.
> IMO those flags cannot be on queue/port level. They must be global.

Where should we have it as global (in terms of API)?
And why can it not be at port level?

> Even though the use case is generic, the nicvf PMD is the only one which does such an optimization.
> So I am suggesting again - why not expose it as a PMD-specific parameter?

Why make it PMD specific if the application can express it through
normative DPDK APIs?

> - The application can express it wants such optimization. 
> - It is global
> Currently it does not seem there is high demand for such flags from other PMDs. If such demand arises, we can discuss again how to expose it properly.

It is not PMD specific. It is all about where it runs. It will be
applicable to any PMD that runs on low-end hardware where it needs
SW-based Tx buffer recycling (the NPU is a different story, as it has a
HW-assisted mempool manager).
What are we losing by running DPDK effectively on low-end hardware
with such "on demand" runtime configuration through a normative DPDK API?
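
To make the cost concrete, here is a rough, self-contained sketch of the two
Tx buffer recycling paths. It is a toy model only: the toy_* types and the
recycle_* functions are hypothetical stand-ins, not the nicvf driver code and
not the real rte_mbuf/rte_mempool API. The point it illustrates is that the
generic path has to read every completed mbuf (refcnt and pool pointer),
while the path that assumes a single mempool and no reference counting can
hand the whole array back without dereferencing any mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_CAP 64

struct toy_mbuf;

/* Toy stand-in for a mempool: a fixed-size stack of free buffers. */
struct toy_pool {
	struct toy_mbuf *free_objs[TOY_POOL_CAP];
	unsigned int count;
};

/* Toy stand-in for an mbuf: only the fields the free path reads. */
struct toy_mbuf {
	struct toy_pool *pool;	/* pool this buffer was allocated from */
	uint16_t refcnt;	/* number of users still holding the buffer */
};

/* Return n buffers to a pool in one call. */
static void
toy_pool_put_bulk(struct toy_pool *p, struct toy_mbuf *const *objs,
		  unsigned int n)
{
	unsigned int i;

	assert(p->count + n <= TOY_POOL_CAP);
	for (i = 0; i < n; i++)
		p->free_objs[p->count++] = objs[i];
}

/*
 * Generic path: every completed mbuf is dereferenced to check its
 * reference count and to look up its owning pool. On a machine with a
 * small cache this is where the extra misses come from.
 * Returns the number of buffers actually returned to a pool.
 */
static unsigned int
recycle_generic(struct toy_mbuf **done, unsigned int n)
{
	unsigned int i, freed = 0;

	for (i = 0; i < n; i++) {
		struct toy_mbuf *m = done[i];

		if (m->refcnt > 1) {	/* still referenced elsewhere */
			m->refcnt--;
			continue;
		}
		toy_pool_put_bulk(m->pool, &m, 1);
		freed++;
	}
	return freed;
}

/*
 * Fast path: the application has promised (single mempool, no
 * reference counting) that all buffers belong to txq_pool and have
 * refcnt == 1, so the pointer array is returned as is and no mbuf
 * memory is touched.
 */
static unsigned int
recycle_single_pool(struct toy_pool *txq_pool, struct toy_mbuf **done,
		    unsigned int n)
{
	toy_pool_put_bulk(txq_pool, done, n);
	return n;
}
```

The fast path is only correct when the application really keeps that promise,
which is why it has to be something the application opts into at
configuration time.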

