[dpdk-dev] [PATCH v5 0/8] Introduce event vectorization

Jayatheerthan, Jay jay.jayatheerthan at intel.com
Wed Mar 24 09:10:35 CET 2021


> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula at marvell.com>
> Sent: Wednesday, March 24, 2021 12:15 PM
> To: Jayatheerthan, Jay <jay.jayatheerthan at intel.com>; Jerin Jacob Kollanukkaran <jerinj at marvell.com>; Carrillo, Erik G
> <erik.g.carrillo at intel.com>; Gujjar, Abhinandan S <abhinandan.gujjar at intel.com>; McDaniel, Timothy
> <timothy.mcdaniel at intel.com>; hemant.agrawal at nxp.com; Van Haaren, Harry <harry.van.haaren at intel.com>; mattias.ronnblom
> <mattias.ronnblom at ericsson.com>; Ma, Liang J <liang.j.ma at intel.com>
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v5 0/8] Introduce event vectorization
> 
> >> From: pbhagavatula at marvell.com <pbhagavatula at marvell.com>
> >> Sent: Wednesday, March 24, 2021 10:35 AM
> >> To: jerinj at marvell.com; Jayatheerthan, Jay <jay.jayatheerthan at intel.com>;
> >> Carrillo, Erik G <erik.g.carrillo at intel.com>; Gujjar, Abhinandan S
> >> <abhinandan.gujjar at intel.com>; McDaniel, Timothy <timothy.mcdaniel at intel.com>;
> >> hemant.agrawal at nxp.com; Van Haaren, Harry <harry.van.haaren at intel.com>;
> >> mattias.ronnblom <mattias.ronnblom at ericsson.com>; Ma, Liang J <liang.j.ma at intel.com>
> >> Cc: dev at dpdk.org; Pavan Nikhilesh <pbhagavatula at marvell.com>
> >> Subject: [dpdk-dev] [PATCH v5 0/8] Introduce event vectorization
> >>
> >> From: Pavan Nikhilesh <pbhagavatula at marvell.com>
> >>
> >> In the traditional event programming model, events are identified by a
> >> flow-id and a uintptr_t. The flow-id uniquely identifies a given event
> >> and determines the order of scheduling based on the schedule type; the
> >> uintptr_t holds a single object.
> >>
> >> Event devices also support burst mode with a configurable dequeue depth,
> >> i.e. each dequeue call returns multiple events, and each event
> >> might be at a different stage of the pipeline.
> >> Having a burst of events belonging to different stages in a dequeue
> >> burst is not only difficult to vectorize but also increases the scheduler
> >> overhead and the application overhead of pipelining events further.
> >> Using event vectors we see a performance gain of ~628% as shown in [1].
> >This is very impressive performance boost. Thanks so much for putting
> >this patchset together! Just curious, was any performance
> >measurement done for existing applications (non-vector)?
> >>
> >> By introducing event vectorization, each event will be capable of
> >> holding multiple uintptr_t of the same flow, thereby allowing applications
> >> to vectorize their pipeline and reduce the complexity of pipelining
> >> events across multiple stages. This also reduces the complexity of
> >> handling enqueue and dequeue on an event device.
> >>
> >> Since event devices are transparent to the events they are scheduling,
> >> the event producers such as eth_rx_adapter, crypto_adapter, etc.
> >> are responsible for vectorizing the buffers of the same flow into a
> >> single event.
> >>
> >> The series also breaks ABI in patch [8/8], which is targeted at the
> >> v21.11 release.
> >>
> >> The dpdk-test-eventdev application has been updated with options to
> >> test multiple vector sizes and timeouts.
> >>
> >> [1]
> >> As for performance improvement, with an ARM Cortex-A72 equivalent
> >> processor, a software event device (--vdev=event_sw0), a single worker
> >> core, a single stage, and one service core for the Rx adapter, Tx
> >> adapter, and scheduling:
> >>
> >> Without event vectorization:
> >>     ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
> >>         --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
> >>         --stlist=a --wlcores=20
> >>     Port[0] using Rx adapter[0] configured
> >>     Port[0] using Tx adapter[0] Configured
> >>     4.728 mpps avg 4.728 mpps
> >Is this number before the patchset? If so, it would help put similar
> >number with the patchset but not using vectorization feature.
> 
> I don't remember the exact clock frequency I was using when I ran
> the above test, but with equal clocks:
> 1. Without the patchset applied:
> 	5.071 mpps
> 2. With the patchset applied, without enabling vectors:
> 	5.123 mpps
> 3. With the patchset applied and vectors enabled:
> 	vector_sz @ 256: 42.715 mpps
> 	vector_sz @ 512: 45.335 mpps
> 

Thanks Pavan for the details. It may be useful to include this info in the patchset.

> >>
> >> With event vectorization:
> >>     ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
> >>         --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
> >>         --stlist=a --wlcores=20 --enable_vector --nb_eth_queues 1
> >>         --vector_size 256
> >>     Port[0] using Rx adapter[0] configured
> >>     Port[0] using Tx adapter[0] Configured
> >>     34.383 mpps avg 34.383 mpps
> >>
> >> Dedicating a service core to each Rx queue and tweaking the vector and
> >> dequeue burst sizes would further improve performance.
> >>
> >> API usage is shown below:
> >>
> >> Configuration:
> >>
> >> 	struct rte_event_eth_rx_adapter_event_vector_config vec_conf;
> >>
> >> 	vector_pool = rte_event_vector_pool_create("vector_pool",
> >> 			nb_elem, 0, vector_size, socket_id);
> >>
> >> 	rte_event_eth_rx_adapter_create(id, event_id, &adptr_conf);
> >> 	rte_event_eth_rx_adapter_queue_add(id, eth_id, -1, &queue_conf);
> >> 	if (cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR) {
> >> 		vec_conf.vector_sz = vector_size;
> >> 		vec_conf.vector_timeout_ns = vector_tmo_nsec;
> >> 		vec_conf.vector_mp = vector_pool;
> >> 		rte_event_eth_rx_adapter_queue_event_vector_config(id,
> >> 				eth_id, -1, &vec_conf);
> >> 	}
> >>
> >> Fastpath:
> >>
> >> 	num = rte_event_dequeue_burst(event_id, port_id, &ev, 1, 0);
> >> 	if (!num)
> >> 		continue;
> >>
> >> 	if (ev.event_type & RTE_EVENT_TYPE_VECTOR) {
> >> 		switch (ev.event_type) {
> >> 		case RTE_EVENT_TYPE_ETHDEV_VECTOR:
> >> 		case RTE_EVENT_TYPE_ETH_RX_ADAPTER_VECTOR: {
> >> 			struct rte_mbuf **mbufs;
> >>
> >> 			mbufs = ev.vector_ev->mbufs;
> >> 			for (i = 0; i < ev.vector_ev->nb_elem; i++)
> >> 				; /* Process mbufs[i]. */
> >> 			break;
> >> 		}
> >> 		case ...
> >> 		}
> >> 	}
> >> 	...
> >>
> >> v5 Changes:
> >> - Make `rte_event_vector_pool_create` non-inline to ease ABI
> >>   stability. (Ray)
> >> - Move `rte_event_eth_rx_adapter_queue_event_vector_config` and
> >>   `rte_event_eth_rx_adapter_vector_limits_get` implementations to the
> >>   patch where they are initially defined. (Ray)
> >> - Multiple grammatical and style fixes. (Jerin)
> >> - Add missing release notes. (Jerin)
> >>
> >> v4 Changes:
> >> - Fix missing event vector structure in event structure. (Jay)
> >>
> >> v3 Changes:
> >> - Fix unintended formatting changes.
> >>
> >> v2 Changes:
> >> - Multiple grammatical and style fixes. (Jerin)
> >> - Add parameter to define vector size as a power of 2. (Jerin)
> >> - Redo patch series w/o breaking ABI till the last patch. (David)
> >> - Add deprecation notice to announce ABI break in 21.11. (David)
> >> - Add vector limits validation to app/test-eventdev.
> >>
> >> Pavan Nikhilesh (8):
> >>   eventdev: introduce event vector capability
> >>   eventdev: introduce event vector Rx capability
> >>   eventdev: introduce event vector Tx capability
> >>   eventdev: add Rx adapter event vector support
> >>   eventdev: add Tx adapter event vector support
> >>   app/eventdev: add event vector mode in pipeline test
> >>   doc: announce event Rx adapter config changes
> >>   eventdev: simplify Rx adapter event vector config
> >>
> >>  app/test-eventdev/evt_common.h                |   4 +
> >>  app/test-eventdev/evt_options.c               |  52 +++
> >>  app/test-eventdev/evt_options.h               |   4 +
> >>  app/test-eventdev/test_pipeline_atq.c         | 310 +++++++++++++++--
> >>  app/test-eventdev/test_pipeline_common.c      | 105 +++++-
> >>  app/test-eventdev/test_pipeline_common.h      |  18 +
> >>  app/test-eventdev/test_pipeline_queue.c       | 320 ++++++++++++++++--
> >>  .../prog_guide/event_ethernet_rx_adapter.rst  |  38 +++
> >>  .../prog_guide/event_ethernet_tx_adapter.rst  |  12 +
> >>  doc/guides/prog_guide/eventdev.rst            |  36 +-
> >>  doc/guides/rel_notes/deprecation.rst          |   9 +
> >>  doc/guides/rel_notes/release_21_05.rst        |   8 +
> >>  doc/guides/tools/testeventdev.rst             |  45 ++-
> >>  lib/librte_eventdev/eventdev_pmd.h            |  31 +-
> >>  .../rte_event_eth_rx_adapter.c                | 305 ++++++++++++++++-
> >>  .../rte_event_eth_rx_adapter.h                |  78 +++++
> >>  .../rte_event_eth_tx_adapter.c                |  66 +++-
> >>  lib/librte_eventdev/rte_eventdev.c            |  53 ++-
> >>  lib/librte_eventdev/rte_eventdev.h            | 113 ++++++-
> >>  lib/librte_eventdev/version.map               |   4 +
> >>  20 files changed, 1524 insertions(+), 87 deletions(-)
> >>
> >> --
> >> 2.17.1
> >
> >Just a heads up: the v5 patchset doesn't apply cleanly on HEAD
> >(5f0849c1155849dfdbf950c91c52cdf9cd301f59), although it applies
> >cleanly on "app/eventdev: fix timeout accuracy"
> >(c33d48387dc8ccf1b432820f6e0cd4992ab486df).
> 
> This patchset is currently rebased on the main branch; I will rebase it on
> dpdk-next-event in the next version.
> 


