[dpdk-dev] rte_mbuf library likely()/unlikely()

Wiles, Keith keith.wiles at intel.com
Tue Jul 24 15:02:41 CEST 2018
Previous message: [dpdk-dev] rte_mbuf library likely()/unlikely()
Next message: [dpdk-dev] [PATCH v5 0/3]crypto/openssl: support asymmetric crypto
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Jul 24, 2018, at 6:31 AM, Van Haaren, Harry <harry.van.haaren at intel.com> wrote:
> 
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Morten Brørup
>> Sent: Tuesday, July 24, 2018 9:14 AM
>> To: Olivier Matz <olivier.matz at 6wind.com>; Wiles, Keith
>> <keith.wiles at intel.com>
>> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; dev at dpdk.org
>> Subject: Re: [dpdk-dev] rte_mbuf library likely()/unlikely()
>> 
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
>>> Sent: Tuesday, July 24, 2018 9:29 AM
>>> 
>>> Hi,
>>> 
>>> On Mon, Jul 23, 2018 at 10:40:03PM +0000, Wiles, Keith wrote:
>>>> 
>>>> 
>>>>> On Jul 23, 2018, at 2:09 PM, Morten Brørup
>>> <mb at smartsharesystems.com> wrote:
>>>>> 
>>>>> I haven't performance tested, but they are compiler branch
>>> prediction hints pointing out the most likely execution path, so I
>>> expect them to have a positive effect.
>>>> 
>>>> We really need to make sure this provides any performance improvement
>>> and that means it needs to be tested on a number of systems. Can you
>>> please do some performance testing or see if we can get the guys doing
>>> DPDK performance testing to first give this a try? This area is very
>>> sensitive to tweaking.
>>> 
>>> I agree we should be driven by performance improvements.
>> 
>> Which is why I suggested these changes. Theoretically, they will provide a
>> performance improvement. The most likely execution path is obvious from code
>> review.
> 
> 
>>> I remember a discussion with Bruce on the ML saying that hardware
>>> branch predictors generally do a good job.
>> 
>> They do, and it is very well documented. E.g. here's a really interesting
>> historical review about branch predictors:
>> https://danluu.com/branch-prediction/
>> 
>> However, just because hardware branch predictors are pretty good, I don't
>> think we should remove or stop adding likely()/unlikely() and other branch
>> prediction hints. The hints still add value, both for execution speed and
>> for source code readability.
> 
> 
> Although "likely" and "unlikely" sound like they predict branch-taken
> or branch-not-taken, in practice the UNLIKELY() macro expands to __builtin_expect().
> 
> The compiler uses __builtin_expect() to understand what code is expected to
> run most of the time. In *practice* what this means is that the code that is
> not expected to run gets moved "far away" in the binary itself. The compiler
> can layout code to be better with the "static branch prediction" methods (see below).
> 
> Think of UNLIKELY() not as helping a runtime branch predictor, it doesn't.
> Think of UNLIKELY() as a method to move error-handling instructions to cold
> instruction-cache-lines.
> 
> To put it another way, using UNLIKELY() puts the unlikely branch code far away,
> and there's a chance we're going to pay an i-cache miss for that.
> 
> As a summary:
> UNLIKELY() for *error handling* is good, it moves cold instructions out of hot i-cache
> UNLIKELY() on anything that is valid to happen (e.g. ring dequeue == 0) is not advised,
> as we could cause i-cache misses on the data path. 
> 
> 
>> Please also refer to the other link I provided about GCC branches. It
>> basically says that GCC treats an If-sentence like this:
>> 
>> If (Condition) Then
>> 	Expect to execute this
>> Else
>> 	Do not expect to execute this
>> 
>> So if we don't want unlikely() around an if-condition which probably
>> evaluates to false, we should rewrite the execution order accordingly.
> 
> In the case of error handling, this is correct - and the compiler will handle it.
> If both taken and not-taken occur at runtime, let the runtime branch predictor handle it.
> 
> 
>> Although hardware branch predictors help a lot most of the time, the
>> likely()/unlikely() still helps the first time the CPU executes the branch
>> instruction.
> 
> There is a static branch prediction scheme that predicts branches that are
> executing for the first time. It is detailed in the Software Optimization Manual,
> Section 3.4 "Optimizing the front end", Page 103 "Static Branch Prediction Algorithm":
> 
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
> 
> UNLIKELY() tells GCC which layout to use - so it does help somewhat, but the
> price of a single i-cache miss is much higher than multiple branch-misses...
> 
> 
>> Furthermore, I'm very well aware of the rule of thumb for adding
>> likely()/unlikely(): Don't add one if it isn't correct almost every time the
>> branch is considered.
> 
> I would word this stronger: if both taken and not-taken are *valid*, don't use UNLIKELY().
> Use UNLIKELY() for error handling cases only.
> 
> 
>> How much more compiler branch prediction hints adds to hardware compiler
>> branch prediction is a somewhat academic discussion. But Honnappa and Keith
>> are right: Performance improvements should be performance tested.
> 
> +1 that ultimately it comes down to measured numbers, on a variety of platforms..
> 
> ..but micro-benchmarks are not enough either. What if the whole micro-bench fits into
> i-cache, it looks like there is no difference in perf. Then apply these UNLIKELY()
> changes to a real world app, and the i-cache misses may become much more expensive.
> 
> 
>> Unfortunately, I don't have the equipment or resources to perform a usable
>> performance test, so I submitted the changes to the mailing list for code
>> review instead. And I'm certainly getting code reviewed now. :-)
>> 
>> From a code review perspective, someone else than me might observe that the
>> exception handling execution path is "missing" the unlikely() hint, so I
>> would argue that code readability is an argument for adding it - unless
>> performance testing shows a slowdown.
>> 
>> 
>> Med venlig hilsen / kind regards
>> - Morten Brørup
> 
> Unless you see a significant branch-miss bottleneck somewhere in the code,
> my opinion is that we are trying to micro-optimize code, when there are probably
> larger problems to solve.

+1

Regards,
Keith
Previous message: [dpdk-dev] rte_mbuf library likely()/unlikely()
Next message: [dpdk-dev] [PATCH v5 0/3]crypto/openssl: support asymmetric crypto
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list