Very inefficient code generation: FUNC_ALWAYS_INLINE pragma ignored and overhead from unexpected conditional code

In the artificial testcase below, the compiler (cl6x 7.4.2 with options -mv6600 -os -k -o3) refuses to inline ffswap() into test_ffswap(), even with a FUNC_ALWAYS_INLINE pragma.

#pragma FUNC_ALWAYS_INLINE(ffswap);

static inline DFF ffswap(DFF dff) {
    float tmp = dff.x;
    dff.x = dff.y;
    dff.y = tmp;
    return dff;
}

DFF test_ffswap(DFF dff) { return ffswap(dff); }
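
For anyone who wants to compile the fragment above on its own, here is a minimal self-contained sketch. The DFF layout (two float members, x and y) is an assumption made only for illustration; the real typedef is in the attached file. The pragma is guarded with the TI-predefined macro __TI_COMPILER_VERSION__ so the same file also builds with a host compiler for a quick functional check.

```c
/* Assumed layout, for illustration only: the real DFF typedef
 * is in the attached file and may contain more members. */
typedef struct {
    float x;
    float y;
} DFF;

/* FUNC_ALWAYS_INLINE is specific to the TI compiler, so it is only
 * emitted when building with cl6x. It precedes the function definition. */
#ifdef __TI_COMPILER_VERSION__
#pragma FUNC_ALWAYS_INLINE(ffswap);
#endif

/* Swap the x and y members of a DFF passed and returned by value. */
static inline DFF ffswap(DFF dff)
{
    float tmp = dff.x;
    dff.x = dff.y;
    dff.y = tmp;
    return dff;
}

/* Wrapper whose generated code is the subject of this post. */
DFF test_ffswap(DFF dff)
{
    return ffswap(dff);
}
```

Building this with -k keeps the generated assembly, which makes it easy to see whether the call to ffswap() is still present in test_ffswap().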

The result is that test_ffswap() takes 30 CPU cycles (19 within test_ffswap() and then another 21 within ffswap()), which is 5 times longer than it should take. That is, with inlining (and proper optimization), test_ffswap() should reduce to 3 moves (to do the swap), a return instruction and perhaps a nop, for a total of 6 cycles.

(Please visit the site to view this file)

There are two sources of the inefficiency. First, the function is not being inlined. Second, ffswap() contains some compiler-generated conditional code that performs additional loads and stores, which significantly increases the execution time of the function.

1) Why isn't the function ffswap() being inlined? How do I get the function to be inlined?

2) What is the conditional code in ffswap() doing? It is obviously coming from the backend of the compiler, because it does not show up in the optimizer comments. Is it alignment related, or something else? How can I avoid this code?

I noticed that these two problems show up together. When I come across a function that cannot be inlined it usually contains this conditional code, so I assume that the problems are related.

I can force inlining by passing the structure in pieces (see below), but this is very ugly and then the unexpected conditional code is moved to test_ffswap(). The resulting function test_ffswap() is now twice as fast as the original code (15 cycles compared to 30 cycles) but it still takes 2.5 times longer than it should. If I write the ffswap() function in assembly, then it cannot be inlined, so this does not solve the problem.

(Please visit the site to view this file)
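
The attached file is not reproduced here, so the following is only a sketch of what "passing the structure in pieces" could look like. It assumes DFF has two float members, x and y; the names ffswap_parts and test_ffswap_parts are hypothetical and do not come from the attachment.

```c
/* Assumed layout, for illustration only. */
typedef struct {
    float x;
    float y;
} DFF;

/* Hypothetical variant of ffswap(): the members are passed as scalars
 * and the swapped values are written through two output pointers, so
 * no structure is passed to or returned from the helper by value. */
static inline void ffswap_parts(float x, float y, float *out_x, float *out_y)
{
    *out_x = y;
    *out_y = x;
}

/* The caller takes the structure apart and puts it back together. */
DFF test_ffswap_parts(DFF dff)
{
    DFF result;
    ffswap_parts(dff.x, dff.y, &result.x, &result.y);
    return result;
}
```

The drawback is visible even in this small case: every call site has to unpack and repack the structure by hand, which is why I do not consider it an acceptable general solution.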

For this case, I could avoid the problem by converting back and forth to the x128_t type for each call, but I need a solution that will also work for structures that are not exactly 128 bits.
