[EXTERNAL] Re: [PATCH v2 1/3] eventdev: introduce event pre-scheduling
Pathak, Pravin <pravin.pathak at intel.com>
Fri Sep 27 05:31:00 CEST 2024

> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula at marvell.com>
> Sent: Thursday, September 26, 2024 6:03 AM
> To: Pathak, Pravin <pravin.pathak at intel.com>; Mattias Rönnblom
> <hofors at lysator.liu.se>; Jerin Jacob <jerinj at marvell.com>; Shijith Thotton
> <sthotton at marvell.com>; Sevincer, Abdullah <abdullah.sevincer at intel.com>;
> hemant.agrawal at nxp.com; sachin.saxena at oss.nxp.com; Van Haaren, Harry
> <harry.van.haaren at intel.com>; mattias.ronnblom at ericsson.com;
> liangma at liangbit.com; Mccarthy, Peter <peter.mccarthy at intel.com>
> Cc: dev at dpdk.org
> Subject: RE: [EXTERNAL] Re: [PATCH v2 1/3] eventdev: introduce event pre-
> scheduling
> 
> > > -----Original Message-----
> > > From: Pavan Nikhilesh Bhagavatula <pbhagavatula at marvell.com>
> > > Sent: Wednesday, September 25, 2024 6:30 AM
> > > To: Mattias Rönnblom <hofors at lysator.liu.se>; Pathak, Pravin
> > > <pravin.pathak at intel.com>; Jerin Jacob <jerinj at marvell.com>; Shijith
> > Thotton
> > > <sthotton at marvell.com>; Sevincer, Abdullah
> > <abdullah.sevincer at intel.com>;
> > > hemant.agrawal at nxp.com; sachin.saxena at oss.nxp.com; Van Haaren,
> Harry
> > > <harry.van.haaren at intel.com>; mattias.ronnblom at ericsson.com;
> > > liangma at liangbit.com; Mccarthy, Peter <peter.mccarthy at intel.com>
> > > Cc: dev at dpdk.org
> > > Subject: RE: [EXTERNAL] Re: [PATCH v2 1/3] eventdev: introduce event
> > > pre- scheduling
> > >
> > > > On 2024-09-19 15:13, Pavan Nikhilesh Bhagavatula wrote:
> > > > >>> From: pbhagavatula at marvell.com <pbhagavatula at marvell.com>
> > > > >>> Sent: Tuesday, September 17, 2024 3:11 AM
> > > > >>> To: jerinj at marvell.com; sthotton at marvell.com; Sevincer,
> > > > >>> Abdullah <abdullah.sevincer at intel.com>;
> > > > >>> hemant.agrawal at nxp.com; sachin.saxena at oss.nxp.com; Van
> Haaren,
> > > > >>> Harry
> > > > >> <harry.van.haaren at intel.com>;
> > > > >>> mattias.ronnblom at ericsson.com; liangma at liangbit.com; Mccarthy,
> > > > >>> Peter <peter.mccarthy at intel.com>
> > > > >>> Cc: dev at dpdk.org; Pavan Nikhilesh <pbhagavatula at marvell.com>
> > > > >>> Subject: [PATCH v2 1/3] eventdev: introduce event
> > > > >>> pre-scheduling
> > > > >>>
> > > > >>> From: Pavan Nikhilesh <pbhagavatula at marvell.com>
> > > > >>>
> > > > > >>> Event pre-scheduling improves scheduling performance by assigning
> > > > > >>> events to event ports in advance when dequeues are issued.
> > > > > >>> The dequeue operation initiates the pre-schedule operation, which
> > > > > >>> completes in parallel without affecting the dequeued event flow
> > > > > >>> contexts and dequeue latency.
> > > > >>>
> > > > > >> Is the prescheduling done to get the event more quickly in the
> > > > > >> next dequeue?
> > > > > >> The first dequeue executes pre-schedule to make events
> > > > > >> available for the next dequeue.
> > > > > >> Is this how it is supposed to work?
> > > > >>
> > > > >
> > > > > Yes, that is correct.
> > > > >
> > > >
> > > > "improves scheduling performance" may be a bit misleading, in that case.
> > > > I suggest "reduces scheduling overhead" instead. You can argue it
> > > > likely reduces scheduling performance, in certain scenarios.
> > > > "reduces scheduling overhead, at the cost of load balancing
> > > > performance."
> > > >
> > >
> > > In case of OCTEON, we see double the scheduling performance with
> > > prescheduling without affecting any priority/weight aspects.
> > >
> > > > It seems to me that this should be a simple hint-type API, where
> > > > the hint is used by the event device to decide if pre-scheduling
> > > > should be used or not (assuming pre-scheduling on/off is even an
> > > > option). The hint would just be a way for the application to
> > > > express whether it wants the scheduler to prioritize load
> > > > balancing agility and port-to-port wall-time latency, or
> > > > scheduling overhead, which in turn could potentially be rephrased
> > > > as the app being throughput or latency/RT-oriented.
> > > >
> > >
> > > The three prescheduling types are designed based on real world
> > > use-cases that some of our customers require in their applications.
> > > Relying on the application to provide hints might not be possible in
> > > all cases as it is very timing sensitive.
> > >
> > >
> > > > It could also be useful for the event device to know which
> > > > priority levels are to be considered latency-sensitive, and which
> > > > are throughput-oriented - maybe in the form of a threshold.
> > > >
> > > > > >>> Event devices can indicate pre-scheduling capabilities using
> > > > > >>> `RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE` and
> > > > > >>> `RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE_ADAPTIVE` via the event
> > > > > >>> device info function `info.event_dev_cap`.
> >
> > What is PRESCHEDULE_ADAPTIVE? Can you please add more description?
> 
> Unlike raw PRESCHEDULE, where every dequeue triggers a pre-scheduling
> request in parallel, PRESCHEDULE_ADAPTIVE will delay issuing pre-scheduling
> until the event device knows that the currently scheduled context can make
> forward progress.
> For example, OCTEON HW does this when it sees that the scheduling context
> held by the port is at the top/head of the flow.
So adaptive can be used by different HWs to handle preschedule dynamically,
based on their implementation. The end aim is to balance fair load balancing
against throughput. Is this understanding correct? I am thinking of using
these for DLB if possible.
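
To check my understanding of how an application would pick a type from the advertised capabilities, here is a minimal sketch. It is standalone C: the capability bits and the enum are copied from the quoted patch rather than included from rte_eventdev.h, and `select_preschedule_type()` is a hypothetical helper name, not part of the proposed API.

```c
#include <stdint.h>

/* Values copied from the quoted patch so the sketch builds without DPDK. */
#define RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE          (1ULL << 16)
#define RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE_ADAPTIVE (1ULL << 17)

typedef enum {
	RTE_EVENT_DEV_PRESCHEDULE_NONE = 0,
	RTE_EVENT_DEV_PRESCHEDULE,
	RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE,
} rte_event_dev_preschedule_type_t;

/*
 * Hypothetical helper (not in the patch): choose a pre-schedule type from
 * the capability mask an application would read from
 * rte_event_dev_info.event_dev_cap. Adaptive is used only when the caller
 * prefers it and the device advertises it; otherwise fall back to plain
 * pre-scheduling, and to NONE when nothing is advertised.
 */
static rte_event_dev_preschedule_type_t
select_preschedule_type(uint64_t event_dev_cap, int prefer_adaptive)
{
	if (prefer_adaptive &&
	    (event_dev_cap & RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE_ADAPTIVE))
		return RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE;

	if (event_dev_cap & RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE)
		return RTE_EVENT_DEV_PRESCHEDULE;

	return RTE_EVENT_DEV_PRESCHEDULE_NONE;
}
```

The result would then be written to `rte_event_dev_config.preschedule_type` before calling `rte_event_dev_configure()`, as the patch description says.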
> 
> > This will be more useful as per port configuration instead of
> > device-level configuration.
> 
> In the patch 2/3 I introduced port level API to control prefetches at an event
> port level.
So is the device-level configuration a default value for all eventdev ports,
which can then be changed at runtime using the port-level APIs? That would be nice.
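
To spell out the semantics I am assuming here, a small standalone model follows. It is not DPDK code: the enum is copied from the quoted patch, while `struct preschedule_model` and the `model_*` functions are hypothetical names used only to illustrate "device-level default, per-port override at runtime". The actual port-level API is in patch 2/3 and may behave differently.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PORTS 4 /* arbitrary size for this sketch */

typedef enum {
	RTE_EVENT_DEV_PRESCHEDULE_NONE = 0,
	RTE_EVENT_DEV_PRESCHEDULE,
	RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE,
} rte_event_dev_preschedule_type_t;

/* Hypothetical model of one event device and its ports. */
struct preschedule_model {
	rte_event_dev_preschedule_type_t dev_default;
	bool overridden[MODEL_MAX_PORTS];
	rte_event_dev_preschedule_type_t port_type[MODEL_MAX_PORTS];
};

/* Device configure: set the default and clear any per-port overrides. */
static void
model_dev_configure(struct preschedule_model *m,
		    rte_event_dev_preschedule_type_t dev_default)
{
	size_t i;

	m->dev_default = dev_default;
	for (i = 0; i < MODEL_MAX_PORTS; i++)
		m->overridden[i] = false;
}

/* Runtime per-port change; returns -1 for an out-of-range port. */
static int
model_port_modify(struct preschedule_model *m, size_t port,
		  rte_event_dev_preschedule_type_t type)
{
	if (port >= MODEL_MAX_PORTS)
		return -1;
	m->overridden[port] = true;
	m->port_type[port] = type;
	return 0;
}

/* Effective type for a port: its override if set, else the device default. */
static rte_event_dev_preschedule_type_t
model_port_effective(const struct preschedule_model *m, size_t port)
{
	if (port < MODEL_MAX_PORTS && m->overridden[port])
		return m->port_type[port];
	return m->dev_default;
}
```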
> 
> It is not a port configuration API because applications might want to
> enable/disable prefetching in the fastpath. An example use case we see is
> disabling pre-scheduling when the application wants to preempt an lcore and
> re-enabling it at a later point in time.
> 
> 
> > The application can choose a type based on its requirement on the port
> > it is serving.
> > As Mattias suggested, if this is made a HINT flag for port
> > configuration, other PMDs can ignore it, either because their
> > architecture does not need it or because they do not support it.
> >
> 
> If a PMD supports preschedule, it has to advertise that via capabilities;
> silently ignoring feature configuration is bad.
> 
> We can make the fastpath APIs as hints.
> 
> 
> 
> > > > >>>
> > > > > >>> Applications can select the pre-schedule type and configure it
> > > > > >>> through `rte_event_dev_config.preschedule_type` during
> > > > > >>> `rte_event_dev_configure`.
> > > > > >>>
> > > > > >>> The supported pre-schedule types are:
> > > > > >>>   * `RTE_EVENT_DEV_PRESCHEDULE_NONE` - No pre-scheduling.
> > > > > >>>   * `RTE_EVENT_DEV_PRESCHEDULE` - Always issue a pre-schedule on
> > > > > >>>     dequeue.
> > > > > >>>   * `RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE` - Delay issuing pre-schedule
> > > > > >>>     until there are no forward progress constraints with the held
> > > > > >>>     flow contexts.
> > > > >>>
> > > > >>> Signed-off-by: Pavan Nikhilesh <pbhagavatula at marvell.com>
> > > > >>> ---
> > > > >>>   app/test/test_eventdev.c                    | 63 +++++++++++++++++++++
> > > > >>>   doc/guides/prog_guide/eventdev/eventdev.rst | 22 +++++++
> > > > >>>   lib/eventdev/rte_eventdev.h                 | 48 ++++++++++++++++
> > > > >>>   3 files changed, 133 insertions(+)
> > > > >>>
> > > > >>> diff --git a/app/test/test_eventdev.c
> > > > >>> b/app/test/test_eventdev.c index e4e234dc98..cf496ee88d 100644
> > > > >>> --- a/app/test/test_eventdev.c
> > > > >>> +++ b/app/test/test_eventdev.c
> > > > >>> @@ -1250,6 +1250,67 @@ test_eventdev_profile_switch(void)
> > > > >>>   	return TEST_SUCCESS;
> > > > >>>   }
> > > > >>>
> > > > >>> +static int
> > > > >>> +preschedule_test(rte_event_dev_preschedule_type_t
> > > > >>> +preschedule_type, const char *preschedule_name) {
> > > > >>> +#define NB_EVENTS     1024
> > > > >>> +	uint64_t start, total;
> > > > >>> +	struct rte_event ev;
> > > > >>> +	int rc, cnt;
> > > > >>> +
> > > > >>> +	ev.event_type = RTE_EVENT_TYPE_CPU;
> > > > >>> +	ev.queue_id = 0;
> > > > >>> +	ev.op = RTE_EVENT_OP_NEW;
> > > > >>> +	ev.u64 = 0xBADF00D0;
> > > > >>> +
> > > > >>> +	for (cnt = 0; cnt < NB_EVENTS; cnt++) {
> > > > >>> +		ev.flow_id = cnt;
> > > > >>> +		rc = rte_event_enqueue_burst(TEST_DEV_ID, 0, &ev,
> > 1);
> > > > >>> +		TEST_ASSERT(rc == 1, "Failed to enqueue event");
> > > > >>> +	}
> > > > >>> +
> > > > >>> +	RTE_SET_USED(preschedule_type);
> > > > >>> +	total = 0;
> > > > >>> +	while (cnt) {
> > > > >>> +		start = rte_rdtsc_precise();
> > > > >>> +		rc = rte_event_dequeue_burst(TEST_DEV_ID, 0, &ev,
> > 1, 0);
> > > > >>> +		if (rc) {
> > > > >>> +			total += rte_rdtsc_precise() - start;
> > > > >>> +			cnt--;
> > > > >>> +		}
> > > > >>> +	}
> > > > >>> +	printf("Preschedule type : %s, avg cycles %" PRIu64 "\n",
> > > > >>> preschedule_name,
> > > > >>> +	       total / NB_EVENTS);
> > > > >>> +
> > > > >>> +	return TEST_SUCCESS;
> > > > >>> +}
> > > > >>> +
> > > > >>> +static int
> > > > >>> +test_eventdev_preschedule_configure(void)
> > > > >>> +{
> > > > >>> +	struct rte_event_dev_config dev_conf;
> > > > >>> +	struct rte_event_dev_info info;
> > > > >>> +	int rc;
> > > > >>> +
> > > > >>> +	rte_event_dev_info_get(TEST_DEV_ID, &info);
> > > > >>> +
> > > > >>> +	if ((info.event_dev_cap &
> > > > >> RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE)
> > > > >>> == 0)
> > > > >>> +		return TEST_SKIPPED;
> > > > >>> +
> > > > >>> +	devconf_set_default_sane_values(&dev_conf, &info);
> > > > >>> +	dev_conf.preschedule_type =
> > RTE_EVENT_DEV_PRESCHEDULE;
> > > > >>> +	rc = rte_event_dev_configure(TEST_DEV_ID, &dev_conf);
> > > > >>> +	TEST_ASSERT_SUCCESS(rc, "Failed to configure eventdev");
> > > > >>> +
> > > > >>> +	rc = preschedule_test(RTE_EVENT_DEV_PRESCHEDULE_NONE,
> > > > >>> "RTE_EVENT_DEV_PRESCHEDULE_NONE");
> > > > >>> +	rc |= preschedule_test(RTE_EVENT_DEV_PRESCHEDULE,
> > > > >>> "RTE_EVENT_DEV_PRESCHEDULE");
> > > > >>> +	if (info.event_dev_cap &
> > > > >>> RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE_ADAPTIVE)
> > > > >>> +		rc |=
> > > > >>> preschedule_test(RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE,
> > > > >>> +
> > > > >>> "RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE");
> > > > >>> +
> > > > >>> +	return rc;
> > > > >>> +}
> > > > >>> +
> > > > >>>   static int
> > > > >>>   test_eventdev_close(void)
> > > > >>>   {
> > > > >>> @@ -1310,6 +1371,8 @@ static struct unit_test_suite
> > > > >>> eventdev_common_testsuite  = {
> > > > >>>   			test_eventdev_start_stop),
> > > > >>>   		TEST_CASE_ST(eventdev_configure_setup,
> > > > >>> eventdev_stop_device,
> > > > >>>   			test_eventdev_profile_switch),
> > > > >>> +		TEST_CASE_ST(eventdev_configure_setup, NULL,
> > > > >>> +			test_eventdev_preschedule_configure),
> > > > >>>   		TEST_CASE_ST(eventdev_setup_device,
> > > > >> eventdev_stop_device,
> > > > >>>   			test_eventdev_link),
> > > > >>>   		TEST_CASE_ST(eventdev_setup_device,
> > > > >> eventdev_stop_device,
> > > > >>> diff --git a/doc/guides/prog_guide/eventdev/eventdev.rst
> > > > >>> b/doc/guides/prog_guide/eventdev/eventdev.rst
> > > > >>> index fb6dfce102..341b9bb2c6 100644
> > > > >>> --- a/doc/guides/prog_guide/eventdev/eventdev.rst
> > > > >>> +++ b/doc/guides/prog_guide/eventdev/eventdev.rst
> > > > >>> @@ -357,6 +357,28 @@ Worker path:
> > > > >>>          // Process the event received.
> > > > >>>      }
> > > > >>>
> > > > >>> +Event Pre-scheduling
> > > > >>> +~~~~~~~~~~~~~~~~~~~~
> > > > >>> +
> > > > >>> +Event pre-scheduling improves scheduling performance by
> > > > >>> +assigning events to event ports in advance when dequeues are
> issued.
> > > > >>> +The `rte_event_dequeue_burst` operation initiates the
> > > > >>> +pre-schedule operation, which completes in parallel without
> > > > >>> +affecting the dequeued
> > > > >> event
> > > > >>> flow contexts and dequeue latency.
> > > > >>> +On the next dequeue operation, the pre-scheduled events are
> > > > >>> +dequeued and pre-schedule is initiated again.
> > > > >>> +
> > > > >>> +An application can use event pre-scheduling if the event
> > > > >>> +device supports it at either device level or at a individual port level.
> > > > >>> +The application can check pre-schedule capability by checking
> > > > >>> +if ``rte_event_dev_info.event_dev_cap``
> > > > >>> +has the bit ``RTE_EVENT_DEV_CAP_PRESCHEDULE`` set, if present
> > > > >>> +pre-scheduling can be enabled at device configuration time by
> > > > >>> +setting
> > > > >>> appropriate pre-schedule type in
> > ``rte_event_dev_config.preschedule``.
> > > > >>> +
> > > > >>> +Currently, the following pre-schedule types are supported:
> > > > >>> + * ``RTE_EVENT_DEV_PRESCHEDULE_NONE`` - No pre-scheduling.
> > > > >>> + * ``RTE_EVENT_DEV_PRESCHEDULE`` - Always issue a
> > > > >>> +pre-schedule when
> > > > >>> dequeue is issued.
> > > > >>> + * ``RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE`` - Issue
> > > > >>> + pre-schedule
> > > > when
> > > > >>> dequeue is issued and there are
> > > > >>> +   no forward progress constraints.
> > > > >>> +
> > > > >>>   Starting the EventDev
> > > > >>>   ~~~~~~~~~~~~~~~~~~~~~
> > > > >>>
> > > > >>> diff --git a/lib/eventdev/rte_eventdev.h
> > > > >>> b/lib/eventdev/rte_eventdev.h
> > > > >> index
> > > > >>> 08e5f9320b..5ea7f5a07b 100644
> > > > >>> --- a/lib/eventdev/rte_eventdev.h
> > > > >>> +++ b/lib/eventdev/rte_eventdev.h
> > > > >>> @@ -446,6 +446,30 @@ struct rte_event;
> > > > >>>    * @see RTE_SCHED_TYPE_PARALLEL
> > > > >>>    */
> > > > >>>
> > > > >>> +#define RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE (1ULL << 16)
> > > /**<
> > > > >> Event
> > > > >>> +device supports event pre-scheduling.
> > > > >>> + *
> > > > >>> + * When this capability is available, the application can
> > > > >>> +enable event pre-scheduling on the event
> > > > >>> + * device to pre-schedule events to a event port when
> > > > >>> +`rte_event_dequeue_burst()`
> > > > >>> + * is issued.
> > > > >>> + * The pre-schedule process starts with the
> > > > >>> +`rte_event_dequeue_burst()` call and the
> > > > >>> + * pre-scheduled events are returned on the next
> > > > >> `rte_event_dequeue_burst()`
> > > > >>> call.
> > > > >>> + *
> > > > >>> + * @see rte_event_dev_configure() */
> > > > >>> +
> > > > >>> +#define RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE_ADAPTIVE
> > (1ULL
> > > > <<
> > > > >> 17)
> > > > >>> /**<
> > > > >>> +Event device supports adaptive event pre-scheduling.
> > > > >>> + *
> > > > >>> + * When this capability is available, the application can
> > > > >>> +enable adaptive pre-scheduling
> > > > >>> + * on the event device where the events are pre-scheduled
> > > > >>> +when there are no forward
> > > > >>> + * progress constraints with the currently held flow contexts.
> > > > >>> + * The pre-schedule process starts with the
> > > > >>> +`rte_event_dequeue_burst()` call and the
> > > > >>> + * pre-scheduled events are returned on the next
> > > > >> `rte_event_dequeue_burst()`
> > > > >>> call.
> > > > >>> + *
> > > > >>> + * @see rte_event_dev_configure() */
> > > > >>> +
> > > > >>>   /* Event device priority levels */
> > > > >>>   #define RTE_EVENT_DEV_PRIORITY_HIGHEST   0
> > > > >>>   /**< Highest priority level for events and queues.
> > > > >>> @@ -680,6 +704,25 @@ rte_event_dev_attr_get(uint8_t dev_id,
> > > > uint32_t
> > > > >>> attr_id,
> > > > >>>    *  @see rte_event_dequeue_timeout_ticks(),
> > > > rte_event_dequeue_burst()
> > > > >>>    */
> > > > >>>
> > > > >>> +typedef enum {
> > > > >>> +	RTE_EVENT_DEV_PRESCHEDULE_NONE = 0,
> > > > >>> +	/* Disable pre-schedule across the event device or on a
> > > > >>> +given event
> > > > >> port.
> > > > >>> +	 * @ref rte_event_dev_config.preschedule_type
> > > > >>> +	 */
> > > > >>> +	RTE_EVENT_DEV_PRESCHEDULE,
> > > > >>> +	/* Enable pre-schedule always across the event device or a
> > given
> > > > >>> +event
> > > > >>> port.
> > > > >>> +	 * @ref rte_event_dev_config.preschedule_type
> > > > >>> +	 * @see RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE
> > > > >>> +	 */
> > > > >>> +	RTE_EVENT_DEV_PRESCHEDULE_ADAPTIVE,
> > > > >>> +	/* Enable adaptive pre-schedule across the event device or a
> > > > >>> +given
> > > > >> event
> > > > >>> port.
> > > > >>> +	 * Delay issuing pre-schedule until there are no forward
> > > > >>> +progress
> > > > >>> constraints with
> > > > >>> +	 * the held flow contexts.
> > > > >>> +	 * @ref rte_event_dev_config.preschedule_type
> > > > >>> +	 * @see
> > RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE_ADAPTIVE
> > > > >>> +	 */
> > > > >>> +} rte_event_dev_preschedule_type_t;
> > > > >>> +
> > > > >>>   /** Event device configuration structure */  struct
> > > rte_event_dev_config {
> > > > >>>   	uint32_t dequeue_timeout_ns; @@ -752,6 +795,11 @@
> struct
> > > > >>> rte_event_dev_config {
> > > > >>>   	 * optimized for single-link usage, this field is a hint
> > > > >>> for how
> > many
> > > > >>>   	 * to allocate; otherwise, regular event ports and queues
> > > > >>> will
> > be used.
> > > > >>>   	 */
> > > > >>> +	rte_event_dev_preschedule_type_t preschedule_type;
> > > > >>> +	/**< Event pre-schedule type to use across the event device,
> > > > >>> +if
> > > > >>> supported.
> > > > >>> +	 * @see RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE
> > > > >>> +	 * @see
> > RTE_EVENT_DEV_CAP_EVENT_PRESCHEDULE_ADAPTIVE
> > > > >>> +	 */
> > > > >>>   };
> > > > >>>
> > > > >>>   /**
> > > > >>> --
> > > > >>> 2.25.1
> > > > >
    
    