[dpdk-dev] rte_mbuf library likely()/unlikely()

Van Haaren, Harry harry.van.haaren at intel.com
Tue Jul 24 13:31:57 CEST 2018


> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Morten Brørup
> Sent: Tuesday, July 24, 2018 9:14 AM
> To: Olivier Matz <olivier.matz at 6wind.com>; Wiles, Keith
> <keith.wiles at intel.com>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; dev at dpdk.org
> Subject: Re: [dpdk-dev] rte_mbuf library likely()/unlikely()
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > Sent: Tuesday, July 24, 2018 9:29 AM
> >
> > Hi,
> >
> > On Mon, Jul 23, 2018 at 10:40:03PM +0000, Wiles, Keith wrote:
> > >
> > >
> > > > On Jul 23, 2018, at 2:09 PM, Morten Brørup
> > <mb at smartsharesystems.com> wrote:
> > > >
> > > > I haven't performance tested, but they are compiler branch
> > prediction hints pointing out the most likely execution path, so I
> > expect them to have a positive effect.
> > >
> > > We really need to make sure this provides any performance improvement
> > and that means it needs to be tested on a number of systems. Can you
> > please do some performance testing or see if we can get the guys doing
> > DPDK performance testing to first give this a try? This area is very
> > sensitive to tweaking.
> >
> > I agree we should be driven by performance improvements.
> 
> Which is why I suggested these changes. Theoretically, they will provide a
> performance improvement. The most likely execution path is obvious from code
> review.


> > I remember a discussion with Bruce on the ML saying that hardware
> > branch predictors generally do a good job.
> 
> They do, and it is very well documented. E.g. here's a really interesting
> historical review about branch predictors:
> https://danluu.com/branch-prediction/
> 
> However, just because hardware branch predictors are pretty good, I don't
> think we should remove or stop adding likely()/unlikely() and other branch
> prediction hints. The hints still add value, both for execution speed and
> for source code readability.


Although "likely" and "unlikely" sound like they predict branch-taken
or branch-not-taken, in practice the UNLIKELY() macro expands to __builtin_expect().

The compiler uses __builtin_expect() to understand what code is expected to
run most of the time. In *practice* what this means is that the code that is
not expected to run gets moved "far away" in the binary itself. The compiler
can layout code to be better with the "static branch prediction" methods (see below).

Think of UNLIKELY() not as helping a runtime branch predictor, it doesn't.
Think of UNLIKELY() as a method to move error-handling instructions to cold
instruction-cache-lines.

To put it another way, using UNLIKELY() puts the unlikely branch code far away,
and there's a chance we're going to pay an i-cache miss for that.

As a summary:
UNLIKELY() for *error handling* is good, it moves cold instructions out of hot i-cache
UNLIKELY() on anything that is valid to happen (e.g. ring dequeue == 0) is not advised,
as we could cause i-cache misses on the data path. 


> Please also refer to the other link I provided about GCC branches. It
> basically says that GCC treats an If-sentence like this:
> 
> If (Condition) Then
> 	Expect to execute this
> Else
> 	Do not expect to execute this
> 
> So if we don't want unlikely() around an if-condition which probably
> evaluates to false, we should rewrite the execution order accordingly.

In the case of error handling, this is correct - and the compiler will handle it.
If both taken and not-taken occur at runtime, let the runtime branch predictor handle it.


> Although hardware branch predictors help a lot most of the time, the
> likely()/unlikely() still helps the first time the CPU executes the branch
> instruction.

There is a static branch prediction scheme that predicts branches that are
executing for the first time. It is detailed in the Software Optimization Manual,
Section 3.4 "Optimizing the front end", Page 103 "Static Branch Prediction Algorithm":

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

UNLIKELY() tells GCC which layout to use - so it does help somewhat, but the
price of a single i-cache miss is much higher than multiple branch-misses...


> Furthermore, I'm very well aware of the rule of thumb for adding
> likely()/unlikely(): Don't add one if it isn't correct almost every time the
> branch is considered.

I would word this stronger: if both taken and not-taken are *valid*, don't use UNLIKELY().
Use UNLIKELY() for error handling cases only.


> How much more compiler branch prediction hints adds to hardware compiler
> branch prediction is a somewhat academic discussion. But Honnappa and Keith
> are right: Performance improvements should be performance tested.

+1 that ultimately it comes down to measured numbers, on a variety of platforms..

..but micro-benchmarks are not enough either. What if the whole micro-bench fits into
i-cache, it looks like there is no difference in perf. Then apply these UNLIKELY()
changes to a real world app, and the i-cache misses may become much more expensive.


> Unfortunately, I don't have the equipment or resources to perform a usable
> performance test, so I submitted the changes to the mailing list for code
> review instead. And I'm certainly getting code reviewed now. :-)
> 
> From a code review perspective, someone else than me might observe that the
> exception handling execution path is "missing" the unlikely() hint, so I
> would argue that code readability is an argument for adding it - unless
> performance testing shows a slowdown.
> 
> 
> Med venlig hilsen / kind regards
> - Morten Brørup

Unless you see a significant branch-miss bottleneck somewhere in the code,
my opinion is that we are trying to micro-optimize code, when there are probably
larger problems to solve.


More information about the dev mailing list