The effect of inlining

Mattias Rönnblom hofors at lysator.liu.se
Mon Apr 1 17:20:07 CEST 2024


On 2024-03-29 14:42, Morten Brørup wrote:
> +CC techboard
> 
>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>> Sent: Friday, 29 March 2024 14.05
>>
>> Hi Stephen,
>>
>> On 3/29/24 03:53, Stephen Hemminger wrote:
>>> On Thu, 28 Mar 2024 17:10:42 -0700
>>> Andrey Ignatov <rdna at apple.com> wrote:
>>>
>>>>>
>>>>> You don't need always inline, the compiler will do it anyway.
>>>>
>>>> I can remove it in v2, but it's not completely obvious to me how it is
>>>> decided when to specify it explicitly and when not?
>>>>
>>>> I see plenty of __rte_always_inline in this file:
>>>>
>>>> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
>>>> lib/vhost/virtio_net.c:66
>>>
>>>
>>> Cargo cult really.
>>>
>>
>> Cargo cult... really?
>>
>> Well, I just did a quick test by comparing IO forwarding with testpmd
>> between main branch and with adding a patch that removes all the
>> inline/noinline in lib/vhost/virtio_net.c [0].
>>
>> main branch: 14.63Mpps
>> main branch - inline/noinline: 10.24Mpps
> 
> Thank you for testing this, Maxime. Very interesting!
> 
> It is sometimes suggested at techboard meetings that we should convert more inline functions to non-inline for improved API/ABI stability, with the argument that the performance benefit of inlining is negligible.
> 

I think you are mixing two different (but related) things here.
1) marking functions with the inline family of keywords/attributes
2) keeping function definitions in header files

1) does not affect the ABI, while 2) does. Neither 1) nor 2) affects the 
API (i.e., source-level compatibility).

2) *allows* for function inlining even in non-LTO builds, but doesn't 
force it.
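
To make the distinction concrete, here is a minimal sketch of the two cases as they might appear in a header. The struct and function names are hypothetical and not taken from lib/vhost; the attribute spelling is the GCC/Clang one, which I believe is roughly what __rte_always_inline expands to on those compilers.

```c
#include <stdint.h>

/* Hypothetical ring descriptor, for illustration only. */
struct pkt_ring {
	uint32_t head;
	uint32_t tail;
	uint32_t mask;
};

/*
 * Case 2) only: the definition is visible to every caller that includes
 * the header, so the compiler *may* inline it, but is free not to.
 */
static inline uint32_t
ring_count(const struct pkt_ring *r)
{
	return (r->head - r->tail) & r->mask;
}

/*
 * Case 1) on top of 2): the always_inline attribute (GCC/Clang) asks the
 * compiler to inline at every call site, overriding its own heuristics.
 */
static inline __attribute__((always_inline)) uint32_t
ring_free_count(const struct pkt_ring *r)
{
	return r->mask - ring_count(r);
}
```

In both cases the function body gets compiled into the application, which is why 2) is the ABI-relevant part. The attribute only changes how the compiler treats a definition it can already see.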

If you don't believe 2) makes a difference performance-wise, it follows 
that you also don't believe LTO makes much of a difference. Both have 
the same effect: allowing the compiler to reason over a larger chunk of 
your program.
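
As a rough illustration of what "reasoning over a larger chunk" can buy, consider the sketch below. The names and header sizes are made up for the example. When the callee's body is visible at the call site and the arguments are compile-time constants, an optimizing compiler will typically fold the branches away and reduce each wrapper to returning a constant. Whether it actually does so depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Hypothetical helper; the sizes are illustrative only. */
static inline uint16_t
hdr_len(int has_vlan, int has_tunnel)
{
	uint16_t len = 14;	/* base header */

	if (has_vlan)
		len += 4;
	if (has_tunnel)
		len += 8;

	return len;
}

/*
 * With hdr_len() visible here, both branches can be resolved at compile
 * time. If hdr_len() lived in another translation unit (and LTO was not
 * used), these would be real calls with run-time branches.
 */
uint16_t
plain_hdr_len(void)
{
	return hdr_len(0, 0);
}

uint16_t
vlan_hdr_len(void)
{
	return hdr_len(1, 0);
}
```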

Allowing the compiler to inline small, often-called functions is crucial 
for performance, in my experience. If the target symbol tends to be in a 
shared object, the difference is even larger. It's also quite common 
that you see no effect of LTO (other than a reduction of code footprint).

As LTO becomes more practical to use, 2) loses much of its appeal.

If PGO ever becomes practical to use, maybe 1) will as well.

> I think this test proves that the sum of many small (negligible) performance differences is not negligible!
> 
>>
>> Andrey, thanks for the patch, I'll have a look at it next week.
>>
>> Maxime
>>
>> [0]: https://pastebin.com/72P2npZ0
> 
