5/22/2025 at 1:05:58 PM
The associated issue for comparing two u16s is interesting.
by mmastrac
5/22/2025 at 7:46:49 PM
I'm surprised there's no mention of store forwarding in that discussion. The -O3 codegen is bonkers, but the -O2 output is reasonable. In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads. In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
by ack_complete
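For readers who want the pattern spelled out, here is a minimal C sketch of what "merging the loads" means and of the case described above. The names eq_fields, eq_merged and compute_then_compare are hypothetical and do not come from the linked issue. Whether the two 16-bit stores followed by one 32-bit load actually stall depends on the microarchitecture and on whether the compiler keeps the struct in registers; nothing here is measured.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t a, b; } pair;

/* The merged load is only valid if the struct is exactly 4 bytes, no padding. */
static_assert(sizeof(pair) == sizeof(uint32_t), "pair must be 4 bytes");

/* Field by field: two 16-bit loads per operand. */
static int eq_fields(const pair *x, const pair *y) {
    return x->a == y->a && x->b == y->b;
}

/* Merged: one 32-bit load per operand. memcpy sidesteps aliasing rules;
   compilers commonly lower a fixed 4-byte memcpy to a plain load. */
static int eq_merged(const pair *x, const pair *y) {
    uint32_t xv, yv;
    memcpy(&xv, x, sizeof xv);
    memcpy(&yv, y, sizeof yv);
    return xv == yv;
}

/* Hypothetical hot path: the struct has "just been computed" with two
   narrow stores, then is read back with one wide load. If the stores
   really go to memory, this is the shape that can miss store forwarding
   on some CPUs. */
static int compute_then_compare(uint16_t lo, uint16_t hi, const pair *other) {
    pair fresh;
    fresh.a = lo;   /* 16-bit store */
    fresh.b = hi;   /* 16-bit store */
    return eq_merged(&fresh, other);   /* 32-bit load of the same bytes */
}
```

Both comparison functions return the same result for any input; the difference under discussion is purely in how the loads are issued.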
5/23/2025 at 5:08:58 AM
> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure
It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005
> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
I guess you're talking about stores and loads across function boundaries?
Trivia: LLVM's X86 backend has a whole pass just to avoid this partial store-to-load forwarding issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...
by mshockwave
5/23/2025 at 4:42:36 AM
> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads
Would that failure be significantly worse than loading the fields separately?
Merely negating the benefit wouldn't be much of a reason not to do it. A single load is simpler and, in the general case, faster.
by Dylan16807
5/22/2025 at 1:34:46 PM
The thing I like most about this is that the discussion isn't just 14 pages of "I'm having this issue as well" and "Any updates on when this will be fixed?" As a web dev, GitHub issues kinda suck.
by heybales
5/22/2025 at 3:44:06 PM
It was worse before emoji reactions were added, when 90% of messages were literally just "+1"
by eterm
5/22/2025 at 5:03:19 PM
+1
by heybales
5/23/2025 at 3:43:05 PM
I wonder if it's a poor interface issue. If people could just click a "me too" button that, instead of adding a full comment, added a minimal notation with their username at the bottom of the comment, 1) would people use it and 2) would it be unobtrusive enough not to be annoying? It could even mute notifications for the me-toos.
by NoMoreNicksLeft
5/22/2025 at 7:36:20 PM
This just seems to illustrate the complexity of compiler authorship. I am very sure C compilers are unable to address this issue any better in the general case.
by rhdjsjebshjffn
5/22/2025 at 7:56:15 PM
Keep in mind Rust is using the same backend as one of the main C compilers, LLVM. So if it is handling it any better, that means the Clang developers handle it before it even reaches the shared LLVM backend. Well, or there is something about the way Clang structures the code that catches a pattern in the backend the Rust developers do not know about.
by runevault
5/23/2025 at 1:27:19 AM
I mean yeah, I just view Rust as the quality-oriented spear of western development.
Rust is absolutely an improvement over C in every way.
by rhdjsjebshjffn
5/22/2025 at 7:59:10 PM
In the Rust issue, people tried this with C code and the compiler shows the same problem. This will get fixed, and it'll help both C and Rust code.
by vlovich123
5/23/2025 at 1:34:42 AM
Out of curiosity, just Clang or GCC as well?
by runevault
5/23/2025 at 1:31:56 PM
I just tried it, and the problem is even worse in GCC.
Given this C code:
    #include <stdint.h>

    typedef struct { uint16_t a, b; } pair;
    int eq_copy(pair a, pair b) {
        return a.a == b.a && a.b == b.b;
    }
    int eq_ref(pair *a, pair *b) {
        return a->a == b->a && a->b == b->b;
    }
Clang generates clean code for the eq_copy variant, but complex code for the eq_ref variant. GCC emits pretty complex code in both variants.
For example, here's eq_ref from gcc -O2:
    eq_ref:
            movzx   edx, WORD PTR [rsi]
            xor     eax, eax
            cmp     WORD PTR [rdi], dx
            je      .L9
            ret
    .L9:
            movzx   eax, WORD PTR [rsi+2]
            cmp     WORD PTR [rdi+2], ax
            sete    al
            movzx   eax, al
            ret
Have a play around: https://c.godbolt.org/z/79Eaa3jYf
by josephg
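The branch in the listing above mirrors the short-circuit && in eq_ref: the second pair of fields is only loaded when the first pair matches. As a rough sketch, and not something taken from the linked issue, the same comparison can be written so that both halves are always evaluated. The names eq_ref_and and eq_ref_xor are hypothetical, and whether any given compiler version turns either into a single 32-bit compare is an assumption to check on Godbolt rather than a guarantee.

```c
#include <stdint.h>

typedef struct { uint16_t a, b; } pair;

/* Bitwise & instead of &&: both field comparisons are evaluated
   unconditionally, so no branch is needed between them. */
int eq_ref_and(const pair *x, const pair *y) {
    return (x->a == y->a) & (x->b == y->b);
}

/* XOR/OR form: the result is zero only if every field matches. */
int eq_ref_xor(const pair *x, const pair *y) {
    return ((x->a ^ y->a) | (x->b ^ y->b)) == 0;
}
```

Both functions return the same 0 or 1 as eq_ref for valid pointers; the only change is that all four loads happen regardless of the first comparison.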