alt.hn

3/4/2026 at 6:33:38 PM

Faster C software with Dynamic Feature Detection

https://gist.github.com/jjl/d998164191af59a594500687a679b98d

by todsacerdoti

3/4/2026 at 7:56:08 PM

For function multiversioning, the intrinsic headers in both gcc and clang have logic to take care of selecting targets. You also don't need to dispatch by hand when writing manual optimizations: defining the same function name with different targets is supported, and calls dispatch automatically.
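A rough sketch of what that can look like, using the `target_clones` attribute (one form of multiversioning that works in plain C). The names `add_arrays` and `MULTIVERSION` are made up for illustration, and the guard assumes x86-64 with glibc (the dispatch relies on ifunc support); anywhere else it falls back to an ordinary function.

```c
#include <stddef.h>
#include <stdio.h>   /* a libc header first, so __GLIBC__ is defined on glibc */

/* Hypothetical helper macro: only request clones where the toolchain is
   assumed to support them (x86-64, GNU-compatible compiler, glibc ifunc). */
#if defined(__x86_64__) && defined(__GNUC__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSION /* no multiversioning: compile a single plain version */
#endif

/* The compiler emits one copy of this function per listed target plus a
   resolver; the copy matching the running CPU is picked at load time. */
MULTIVERSION
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Callers just call `add_arrays(out, a, b, n)`; there is no manual CPUID check or function pointer table in the source.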

by BearOso

3/5/2026 at 4:31:11 AM

Is it actually better/faster though? To see the difference between -O and -O2/3, compile some code for an x64 target on Godbolt and look at the output. -O produces optimised x86 code. -O2/3 produces enormous amounts of incomprehensible SSE/AVX/whatever code for even the simplest stuff, leading to a huge blowout in code size that can potentially interact badly with caching.

We had a look at this in embedded, where you don't have infinite memory to play with. At the moment it's OK because there are no advanced instructions available to use, but it'll get ugly in the future when gcc realises it can use new instructions and produces five times the amount of object code for the same source code.
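For anyone who wants to try the -O versus -O2/3 comparison, a small reduction loop like the one below is enough to paste into Godbolt. The function name `sum_bytes` is just an illustration, and how much the output grows will depend on the compiler and version, so treat this as something to experiment with rather than a measured result.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum an array of bytes. Build it at different optimisation levels and
   compare the generated assembly, e.g. on an x86-64 target:
     gcc -O1 -S sum.c                      (scalar loop expected)
     gcc -O3 -S sum.c                      (auto-vectorised loop likely)
     gcc -O3 -fno-tree-vectorize -S sum.c  (-O3 without the vectoriser)
     gcc -Os -S sum.c                      (optimise for size)        */
uint32_t sum_bytes(const uint8_t *data, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

On a size-constrained target, `-Os` or switching off the vectoriser are the usual knobs to look at if the larger output turns out to be a problem.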

by pseudohadamard