[PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
Mattias Rönnblom
hofors at lysator.liu.se
Mon May 15 22:52:44 CEST 2023
On 2023-05-15 14:38, Jerin Jacob wrote:
> On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
> <mattias.ronnblom at ericsson.com> wrote:
>>
>> On 2023-05-12 13:59, Jerin Jacob wrote:
>>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
>>> <mattias.ronnblom at ericsson.com> wrote:
>>>>
>>>> Use non-burst event enqueue and dequeue calls from burst enqueue and
>>>> dequeue only when the burst size is compile-time constant (and equal
>>>> to one).
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
>>>>
>>>> ---
>>>>
>>>> v3: Actually include the change v2 claimed to contain.
>>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
>>>> application is compiled with -pedantic. (Morten Brørup)
>>>> ---
>>>> lib/eventdev/rte_eventdev.h | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>>> index a90e23ac8b..a471caeb6d 100644
>>>> --- a/lib/eventdev/rte_eventdev.h
>>>> +++ b/lib/eventdev/rte_eventdev.h
>>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
>>>> * Allow zero cost non burst mode routine invocation if application
>>>> * requests nb_events as const one
>>>> */
>>>> - if (nb_events == 1)
>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>
>>> "Why" part is not clear from the commit message. Is this to avoid
>>> nb_events read if it is built-in const.
>>
>> The __builtin_constant_p() is introduced to avoid having the compiler
>> generate a conditional branch and two different code paths in case
>> nb_elem is a run-time variable.
>>
>> In particular, this matters if nb_elems is run-time variable and varies
>> between 1 and some larger value.
>>
>> I should have mention this in the commit message.
>>
>> A very slight performance improvement. It also makes the code better
>> match the comment, imo. Zero cost for const one enqueues, but no impact
>> non-compile-time-constant-length enqueues.
>>
>> Feel free to ignore.
>
>
> I did some performance comparison of the patch.
> A low-end ARM machines shows 0.7% drop with single event case. No
> regression see with high-end ARM cores with single event case.
>
> IMO, optimizing the check for burst mode(the new patch) may not show
> any real improvement as the cost is divided by number of event.
> Whereas optimizing the check for single event case(The current code)
> shows better performance with single event case and no regression
> with burst mode as cost is divided by number of events.
I ran some tests on an AMD Zen 3 with DSW.
In the below tests the enqueue burst size is not compile-time constant.
Enqueue burst size Performance improvement
Run-time constant 1 ~5%
Run-time constant 2 ~0%
Run-time variable 1-2 ~9%
Run-time variable 1-16 ~0%
The run-time variable enqueue sizes randomly (uniformly) distributed in
the specified range.
The first result may come as a surprise. The benchmark is using
RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type
in most apps). The single-event enqueue function only exists in a
generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent).
I suspect that is the reason for the performance improvement.
This effect is large-enough to make it somewhat beneficial (+~1%) to use
run-time variable single-event enqueue compared to keeping the burst
size compile-time constant.
The performance gain is counted toward both enqueue and dequeue costs
(+benchmark app overhead), so an under-estimation if see this as an
enqueue performance improvement.
> If you agree, then we can skip this patch.
>
I have no strong opinion if this should be included or not.
It was up to me, I would drop the single-enqueue special case handling
altogether in the next ABI update.
>
>>
>>> If so, check should be following. Right?
>>>
>>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
>>> || nb_events == 1)
>>>
>>> At least, It was my original intention in the code.
>>>
>>>
>>>
>>>> return (fp_ops->enqueue)(port, ev);
>>>> else
>>>> return fn(port, ev, nb_events);
>>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
>>>> * Allow zero cost non burst mode routine invocation if application
>>>> * requests nb_events as const one
>>>> */
>>>> - if (nb_events == 1)
>>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>> return (fp_ops->dequeue)(port, ev, timeout_ticks);
>>>> else
>>>> return (fp_ops->dequeue_burst)(port, ev, nb_events,
>>>> --
>>>> 2.34.1
>>>>
>>
More information about the dev
mailing list