[dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements

Bruce Richardson bruce.richardson at intel.com
Fri Feb 24 15:01:53 CET 2017


On Tue, Feb 21, 2017 at 03:17:36AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
> 
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
> 
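
For anyone new to the thread, the burst handshake described above can
be sketched roughly as follows. This is only an illustration and not the
code from the patch set: struct burst_line, burst_send, burst_recv, the
FLAG_READY bit and the 4-bit shift are all names and values assumed for
the example, and it assumes pointers fit in 60 bits so the low bits of
each slot are free for flags.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS      8    /* 8 x 8-byte slots fill one 64-byte cache line */
#define FLAG_BITS  4    /* low bits reserved for handshake flags (assumed) */
#define FLAG_READY 1u   /* hypothetical "burst is ready" bit */

struct burst_line {
    alignas(64) _Atomic uint64_t head; /* slot 0: pointer plus flag bits */
    uint64_t rest[SLOTS - 1];          /* slots 1..7: the 7 "free spaces" */
};
static_assert(sizeof(struct burst_line) == 64, "must be one cache line");

static uint64_t encode(void *p) { return (uint64_t)(uintptr_t)p << FLAG_BITS; }
static void *decode(uint64_t v) { return (void *)(uintptr_t)(v >> FLAG_BITS); }

static void burst_init(struct burst_line *line)
{
    atomic_init(&line->head, 0);
    for (unsigned i = 0; i < SLOTS - 1; i++)
        line->rest[i] = 0;
}

/* Distributor side: publish up to 8 packets, or 0 if the line is busy. */
static unsigned burst_send(struct burst_line *line, void *const pkts[],
                           unsigned n)
{
    if (n == 0)
        return 0;
    if (atomic_load_explicit(&line->head, memory_order_acquire) & FLAG_READY)
        return 0; /* previous burst not yet taken by the worker */
    if (n > SLOTS)
        n = SLOTS;
    for (unsigned i = 1; i < SLOTS; i++)
        line->rest[i - 1] = i < n ? encode(pkts[i]) : 0;
    /* Writing slot 0 last, with release ordering, publishes the burst. */
    atomic_store_explicit(&line->head, encode(pkts[0]) | FLAG_READY,
                          memory_order_release);
    return n;
}

/* Worker side: take the whole burst and clear the flag. */
static unsigned burst_recv(struct burst_line *line, void *pkts[SLOTS])
{
    uint64_t head = atomic_load_explicit(&line->head, memory_order_acquire);
    unsigned n = 0;

    if (!(head & FLAG_READY))
        return 0;
    pkts[n++] = decode(head);
    for (unsigned i = 0; i < SLOTS - 1 && line->rest[i] != 0; i++)
        pkts[n++] = decode(line->rest[i]);
    atomic_store_explicit(&line->head, 0, memory_order_release);
    return n;
}
```

The point of the layout is that one cache-line transfer between cores
now carries up to 8 packet pointers instead of 1.
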
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
> 
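
A minimal scalar sketch of the matching idea, as I read the description
above: each worker has an array of inflight flow tags and an array of
backlog tags, and an incoming tag is pinned to whichever worker already
holds it. The struct worker_flows layout, the match_flow name, and the
convention that tag 0 means "empty slot" are assumptions made for this
example only.

```c
#include <assert.h>
#include <stdint.h>

#define BURST 8 /* tags tracked per array, matching the burst size */

struct worker_flows {
    uint32_t inflight[BURST]; /* tags the worker is processing now */
    uint32_t backlog[BURST];  /* tags queued for the worker's next burst */
};

/*
 * Return the index of the worker that already owns 'tag', or -1 if no
 * worker has it inflight or backlogged (so any worker may take it).
 */
static int match_flow(const struct worker_flows *w, unsigned n_workers,
                      uint32_t tag)
{
    if (tag == 0) /* 0 marks an empty slot in this sketch */
        return -1;
    for (unsigned i = 0; i < n_workers; i++) {
        for (unsigned j = 0; j < BURST; j++) {
            if (w[i].inflight[j] == tag || w[i].backlog[j] == tag)
                return (int)i;
        }
    }
    return -1;
}
```

A -1 result means the flow is new, so the distributor is free to hand it
to any worker without breaking flow pinning.
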
> The flow match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 CPU flag. On non-x86 platforms, the
> scalar match function is selected, which should still give a good boost
> in performance over the non-burst API.
> 
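
The run-time selection could look something like the sketch below. It
is not the patch's code: match_scalar, match_sse2 and select_match are
hypothetical names, the match itself is reduced to finding a tag in an
8-entry array, and the CPU check uses the GCC/Clang builtin
__builtin_cpu_supports() as a stand-in so the example builds without
DPDK headers.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
#include <emmintrin.h>
#define HAVE_SSE2_MATCH 1
#endif

#define NUM_TAGS 8

/* Both versions return the index of 'tag' in 'tags', or -1 if absent. */
typedef int (*match_fn)(const uint32_t tags[NUM_TAGS], uint32_t tag);

static int match_scalar(const uint32_t tags[NUM_TAGS], uint32_t tag)
{
    for (int i = 0; i < NUM_TAGS; i++)
        if (tags[i] == tag)
            return i;
    return -1;
}

#ifdef HAVE_SSE2_MATCH
static int match_sse2(const uint32_t tags[NUM_TAGS], uint32_t tag)
{
    const __m128i needle = _mm_set1_epi32((int)tag);

    for (int i = 0; i < NUM_TAGS; i += 4) {
        /* Compare four 32-bit tags at once. */
        __m128i hay = _mm_loadu_si128((const __m128i *)&tags[i]);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi32(hay, needle));

        /* Each matching lane sets 4 mask bits; take the lowest lane. */
        if (mask != 0)
            return i + __builtin_ctz((unsigned)mask) / 4;
    }
    return -1;
}
#endif

/* Pick the implementation once, at start-up. */
static match_fn select_match(void)
{
#ifdef HAVE_SSE2_MATCH
    if (__builtin_cpu_supports("sse2"))
        return match_sse2;
#endif
    return match_scalar; /* non-x86, or SSE2 not available */
}
```

Callers hold on to the returned pointer and call through it for every
burst, so the CPU flag is checked once rather than per packet.
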
> v2 changes:
>   * Created a common distributor_priv.h header file with common
>     definitions and structures.
>   * Added a scalar version so it can be built and used on machines without
>     sse2 instruction set
>   * Added unit autotests
>   * Added perf autotest

For future reference, I think it's better to put the list of deltas from
each version in reverse order, so that the latest changes are on top,
and save scrolling for those of us who have been tracking the set.

> 
> v3 changes:
>   * Addressed mailing list review comments
>   * Test code removal
>   * Split out SSE match into separate file to facilitate NEON addition
>   * Cleaned up conditional compilation flags for SSE2
>   * Addressed c99 style compilation errors
>   * rebased on latest head (Jan 2 2017, Happy New Year to all)
> 
> v4 changes:
>    * fixed issue building shared libraries
> 
> v5 changes:
>    * Removed some un-needed code around retries in worker API calls
>    * Cleanup due to review comments on mailing list
>    * Cleanup of non-x86 platform compilation, fallback to scalar match
> 
> v6 changes:
>    * Fixed intermittent segfault where num pkts not divisible
>      by BURST_SIZE
>    * Cleanup due to review comments on mailing list
>    * Renamed _priv.h to _private.h.
> 
> v7 changes:
>    * Reorganised patch so there's a more natural progression in the
>      changes, and divided them down into easier to review chunks.
>    * Previous versions of this patch set were effectively two APIs.
>      We now have a single API. Legacy functionality can
>      be used by using the rte_distributor_create API call with the
>      RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
>    * Added symbol versioning for old API so that ABI is preserved.
> 
The merging to a single API is great to see, making it so much easier
for app developers. Thanks for that.

/Bruce

