12/29/2025 at 7:49:29 PM
Even calling uninitialized data “garbage” is misleading. You might expect that the compiler would just leave out some initialization code and compile the remaining code in the expected way, causing the values to be “whatever was in memory previously”. But no: the compiler can (and absolutely will) optimize by assuming the values are whatever would be most convenient for optimization reasons, even if that would be vanishingly unlikely or even impossible.
As an example, consider this code (godbolt: https://godbolt.org/z/TrMrYTKG9):
struct foo {
    unsigned char a, b;
};
foo make(int x) {
    foo result;
    if (x) {
        result.a = 13;
    } else {
        result.b = 37;
    }
    return result;
}
At high enough optimization levels, the function compiles to “mov eax, 9485; ret”, which sets both a=13 and b=37 without testing the condition at all - as if both branches of the test were executed. This is perfectly reasonable because the lack of initialization means the values could already have been set that way (even if unlikely), so the compiler just goes ahead and sets them that way. It’s faster!
by nneonneo
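The constant in `mov eax, 9485` is just the two byte fields packed into one register: on a little-endian target such as x86-64, `a` lands in the low byte and `b` in the next, so 13 + 37 × 256 = 9485. A minimal compile-time sketch, assuming C++17 and that layout, contrasting what the folded code returns with a version that value-initializes `result` (`make_folded` and `make_zeroed` are hypothetical names, not from the godbolt link):

```cpp
struct foo {
    unsigned char a, b;
};

// Little-endian: a is the low byte, b the next one.
static_assert(13 + 37 * 256 == 9485, "9485 encodes a = 13, b = 37");

// Roughly what the optimized code returns: both members set, no branch.
constexpr foo make_folded(int /*x*/) {
    return foo{13, 37};
}

// Well-defined alternative: `foo result{}` zero-initializes both members,
// so the member the taken branch does not assign reads back as 0.
constexpr foo make_zeroed(int x) {
    foo result{};
    if (x) {
        result.a = 13;
    } else {
        result.b = 37;
    }
    return result;
}

// The only reads the original make() defines are a (x != 0) and b (x == 0);
// the folded version agrees with the zeroed one on exactly those reads.
static_assert(make_folded(1).a == make_zeroed(1).a);
static_assert(make_folded(0).b == make_zeroed(0).b);
// Only the zeroed version also pins down the other member.
static_assert(make_zeroed(1).b == 0 && make_zeroed(0).a == 0);
```

The folded result is indistinguishable from the original on every read the original allows; it only differs on reads that were already undefined.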
12/29/2025 at 11:50:59 PM
Indeed, UB is literally whatever the compiler feels like. A famous one [1] has the compiler deleting code that contains UB and falling through to the next function.
"But it's right there in the name!" Undefined behavior literally places no restrictions on the code generated or the behavior of the program. And the compiler is under no obligation to help you debug your (admittedly buggy) program. It can literally delete your program and replace it with something else that it likes.
[1] https://kristerw.blogspot.com/2017/09/why-undefined-behavior...
by titzer
12/29/2025 at 9:16:25 PM
There are some even funnier cases like this one: https://gcc.godbolt.org/z/cbscGf8ss
The compiler sees that foo can only be assigned in one place (a function that isn't called locally, but could be called from other object files linked into the program) and that its address never escapes. Since dereferencing a null pointer is UB, it can legally assume that `*foo` is always 42 and optimizes out the variable entirely.
by jmgao
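The godbolt contents aren't reproduced here, so this is a hypothetical reconstruction from the description above (`answer`, `set_foo` and `read_foo` are hypothetical names; the `assert` is an added debug-only guard, not part of the original). `foo` has internal linkage and a single assignment inside a function with external linkage, so it is either null, where `*foo` is UB, or `&answer`; an optimizer is therefore allowed to compile `read_foo()` as `return 42;`.

```cpp
#include <cassert>

namespace {
// File-local: no other translation unit can name or modify these.
const int answer = 42;
const int* foo = nullptr;
}  // namespace

// External linkage: may be called from another object file, so the
// compiler cannot prove it never runs.
void set_foo() { foo = &answer; }

// If set_foo() has not run, foo is null and *foo is UB. The only
// non-UB possibility is foo == &answer, so *foo may be folded to 42.
int read_foo() {
    assert(foo != nullptr && "set_foo() must run first");  // added guard, debug builds only
    return *foo;
}
```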
12/29/2025 at 9:28:00 PM
To those who are just as confused as me:
Compilers can do whatever they want when they see UB, and accessing an unassigned and unassignable (file-local) variable is UB, therefore the compiler can just decide that *foo is in fact always 42, or never 42, or sometimes 42, and all would be just as valid options for the compiler.
(I know I'm just restating the parent comment, but I had to think it through several times before understanding it myself, even after reading that.)
by publicdebates
12/29/2025 at 11:09:20 PM
> Compilers can do whatever they want when they see UB, and accessing an unassigned and unassignable (file-local) variable is UB, therefore the compiler can just decide that *foo is in fact always 42, or never 42, or sometimes 42, and all would be just as valid options for the compiler.
That's not exactly correct. It's not that the compiler sees that there's UB and decides to do something arbitrary: it's that it sees that there's exactly one way for UB to not be triggered and so it's assuming that that's happening.
by jmgao
12/29/2025 at 10:02:50 PM
Although it should be noted that that’s not how compilers “reason”.
The way they work things out is to assume no UB happens (because otherwise your program is invalid, and you would not ask to compile an invalid program, would you?) and then work from there.
by masklinn
12/30/2025 at 2:04:29 AM
No, who would write an incorrect program! :-d
by actionfromafar
12/29/2025 at 8:13:28 PM
Even the notion that uninitialized memory contains values is kind of dangerous. Once you access it you can't reason about what's going to happen at all. Behaviour can happen that's not self-consistent with any value at all: https://godbolt.org/z/adsP4sxMT
by recursivecaveat
12/29/2025 at 9:15:12 PM
Is that an old 'bot? Because I noticed it was an old version of Clang, and I tried switching to the latest Clang, which is hilarious: https://godbolt.org/z/fra6fWexM
by masklinn
12/29/2025 at 9:54:57 PM
Oh yeah, the classic Clang behaviour of “just stop codegen at UB”. If you look at the assembly, the main function just ends after the call to endl (right before where the if test should go); the program will run off the end of main and execute whatever nonsense is after it in memory as instructions. In this case I guess it calls main again (??) and then runs off into the woods and crashes.
I’ve never understood this behaviour from clang. At least stick a trap at the end so the program aborts instead of just executing random instructions?
The x and y values are funny too, because clang doesn’t even bother loading anything into esi for operator<<(unsigned int), so you get whatever the previous call left behind in that register. This means there’s no x or y variable at all, even though they’re nominally being “printed out”.
by nneonneo
12/29/2025 at 9:58:35 PM
No, I wrote it with the default choice of compiler just now. That newer result is truly crazy though lol.
by recursivecaveat
12/29/2025 at 9:47:58 PM
icc's result is interesting too
by qbane
12/29/2025 at 9:22:48 PM
This is gold
by afiori
12/30/2025 at 12:33:24 AM
If you don't initialise a variable, you're implicitly saying any value is fine, so this actually makes sense.
by userbinator
12/30/2025 at 1:01:24 AM
The difference is that it can behave as if it had multiple different values at the same time. You don't just get any value; you can get completely absurd, paradoxical Schrödinger values where `x > 5 && x < 5` may be true, and on the next line `x > 5` may be false, and it may flip on Wednesdays.
This is because the code is executed symbolically during optimization. It's not running on your real CPU. It's first "run" on a simulation of an abstract machine from the C spec, which doesn't have registers or even a real stack to hold an actual garbage value, but it does have magic memory where bits can be set to 0, 1, or this-can-never-ever-happen.
Optimization passes ask questions like "is x unused? (so I can skip saving its register)" or "is x always equal to y? (so I can stop storing it separately)" or "is this condition using x always true? (so that I can remove the else branch)". When using the value is an undefined behavior, there's no requirement for these answers to be consistent or even correct, so the optimizer rolls with whatever seems cheapest/easiest.
by pornel
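A toy model of the point about per-use answers, assuming C++17 and nothing about any real compiler's internals (`AbstractInt`, `fold_greater` and `fold_less` are hypothetical): an undefined value is represented as `std::nullopt`, and each question about it is answered independently with whatever is cheapest, which is how `x > 5 && x < 5` can come out true.

```cpp
#include <optional>

// Toy abstract value: a known integer, or undef (std::nullopt).
using AbstractInt = std::optional<int>;
inline constexpr AbstractInt undef = std::nullopt;

// Toy constant folder for `v > k`. A known value is answered honestly.
// For undef, each query is answered on its own with whatever the caller
// says is cheapest, because no answer can be observed to be wrong.
constexpr bool fold_greater(AbstractInt v, int k, bool cheapest) {
    return v ? *v > k : cheapest;
}

constexpr bool fold_less(AbstractInt v, int k, bool cheapest) {
    return v ? *v < k : cheapest;
}

// `x > 5 && x < 5` with x undef: each use is folded separately, and
// `true` is cheapest for both (say, it lets a branch be deleted), so
// the "impossible" condition folds to true.
constexpr bool paradox = fold_greater(undef, 5, true) && fold_less(undef, 5, true);
static_assert(paradox);

// With a real value there is no such freedom.
static_assert(!(fold_greater(7, 5, true) && fold_less(7, 5, true)));
```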
12/30/2025 at 2:01:59 AM
"Your scientists were so preoccupied with whether they could, they didn't stop to think if they should."
With optimizing settings on, the compiler should immediately treat uninitialized variables as errors by default.
by actionfromafar
12/30/2025 at 11:11:49 AM
So here are your options:
1. Syntactically require initialization, i.e. you can't write "int k;", only "int k = 0;". This is easy to do and 100% effective, but for many algorithms complying has a notable performance cost.
2. Semantically require initialization, the compiler must prove at least one write happens before every read. Rice's Theorem says we cannot have this unless we're willing to accept that some correct programs don't compile because the compiler couldn't see why they're correct. Safe Rust lives here. Fewer but still some programmers will hate this too because you're still losing perf in some cases to shut up the prover.
3. Redefine "immediately" as "Well, it should report the error at runtime". This has an even larger performance overhead in many cases, and of course in some applications there is no meaningful "report the error at runtime".
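Option 3 can be approximated in C++ as a library type. A minimal sketch, assuming C++17 (`Checked` is a hypothetical name, not a standard type): it keeps an explicit "written yet?" state and throws on a read-before-write, and that per-read check is the run-time overhead described above.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical run-time-checked variable: reads before the first write
// are reported instead of producing an indeterminate value.
template <class T>
class Checked {
public:
    void set(T value) { value_ = std::move(value); }

    const T& get() const {
        // The extra flag test on every read is the cost of option 3.
        if (!value_) throw std::logic_error("read of uninitialized value");
        return *value_;
    }

private:
    std::optional<T> value_;  // empty until the first set()
};

// Usage: the bad read is caught, the good one goes through.
inline void checked_example() {
    Checked<int> k;
    bool reported = false;
    try {
        (void)k.get();
    } catch (const std::logic_error&) {
        reported = true;
    }
    assert(reported);

    k.set(7);
    assert(k.get() == 7);
}
```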
Now, it so happens I think option (2) is almost always the right choice, but then I would say that. If you need performance then sometimes none of those options is enough, which is why unsafe Rust is allowed to call core::mem::MaybeUninit::assume_init, an unsafe function which in many cases compiles to no instructions at all, but is the specific moment when you're taking responsibility for claiming this is initialized, and if you're wrong about that, too fucking bad.
by tialaramex
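For readers coming from the C++ side, a rough analogue of the `MaybeUninit` / `assume_init` split is raw aligned storage plus placement `new`. A minimal sketch, assuming C++17 (for `std::launder`) and a trivially destructible `T`; the class name mirrors Rust's but is hypothetical, not a standard C++ type, and it is not Rust's actual API.

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical C++ analogue of Rust's MaybeUninit<T>: suitably aligned
// raw storage that starts out holding no T at all.
template <class T>
class MaybeUninit {
    static_assert(std::is_trivially_destructible_v<T>,
                  "sketch never runs destructors, so restrict to trivial T");

public:
    // Begins the lifetime of a T inside the storage.
    T& write(T value) {
        return *::new (static_cast<void*>(storage_)) T(std::move(value));
    }

    // No check and usually no instructions: the caller is claiming that
    // write() already ran. If that claim is false, behavior is undefined.
    T& assume_init() { return *std::launder(reinterpret_cast<T*>(storage_)); }

private:
    alignas(T) unsigned char storage_[sizeof(T)];
};

inline void maybe_uninit_example() {
    MaybeUninit<int> slot;  // no int exists yet
    slot.write(42);         // now one does
    assert(slot.assume_init() == 42);
}
```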
12/30/2025 at 11:42:06 AM
With optimizations, 1. and 2. can be kind of equivalent: if initialization is syntactically required (or variables are defined to be zero by default), then the compiler can elide this if it can prove that value is never read.
by rcxdude
12/30/2025 at 11:46:59 AM
That, however, conflicts with unused write detection, which can be quite useful (arguably more so than unused variable detection, as it's both more general and more likely to catch issues). Though I guess you could always ignore a trivial initialisation for that purpose.
by masklinn
12/30/2025 at 12:52:16 PM
There isn't just a performance cost to initializing at declaration all the time. If you don't have a meaningful sentinel value (does zero mean "uninitialized" or does it mean logical zero?) then reading from the "initialized with meaningless data just to silence the lint" data is still a bug. And this bug is now somewhat tricky to detect because the sanitizers can't detect it.
by UncleMeat
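A small illustration of the sentinel problem, assuming C++17 (`parse_level_zero_default` and `parse_level` are hypothetical): with `int level = 0` written only to silence the lint, an unhandled input silently yields a plausible 0, whereas `std::optional` keeps "never assigned" distinct from "assigned zero".

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Initialized with 0 "just to silence the lint": an unhandled input
// quietly returns 0, which looks like a legitimate level.
int parse_level_zero_default(std::string_view s) {
    int level = 0;
    if (s == "low") level = 1;
    else if (s == "high") level = 3;
    // Bug: "medium" is not handled, and nothing can tell.
    return level;
}

// std::optional keeps "never assigned" distinct from any real value,
// so the same gap shows up as an empty result.
std::optional<int> parse_level(std::string_view s) {
    std::optional<int> level;
    if (s == "low") level = 1;
    else if (s == "high") level = 3;
    return level;
}

inline void sentinel_example() {
    assert(parse_level_zero_default("medium") == 0);  // looks like a real 0
    assert(!parse_level("medium").has_value());       // the gap is visible
    assert(parse_level("high") == 3);
}
```

What to do with the empty case is still the caller's decision; the type only stops the two states from being conflated.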
12/30/2025 at 1:42:46 PM
Yes, that's an important consideration for languages like Rust or C++ which don't endorse mandatory defaults. It may even literally be impossible to "initialize with meaningless data" in these languages if the type doesn't have such "meaningless" values.
In languages like Go or Odin, where "zero is default" for every type and you can't even opt out, this same problem (which I'd say is a bigger but less instantly fatal version of the Billion Dollar Mistake) occurs everywhere, at every API edge, and even in documentation; you just have to suck it up.
Which reminds me of, in a sense, another option: you can have the syntactic behaviour but write it as though you don't initialize at all even though you do, which is the behaviour C++ silently has for user-defined types. If we define a Goose type (in C++ a "class") which we stubbornly don't provide any way for our users to make themselves (e.g. we make the constructors private, or we explicitly delete the constructors), and then a user writes "Goose foo;" in their C++ program, it won't compile, because the compiler isn't allowed to leave this foo variable uninitialized but also can't just construct it. So, too bad, this isn't a valid C++ program.
by tialaramex
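A sketch of the `Goose` case, assuming C++17 (`hatch` and `honks` are hypothetical members): the default constructor is deleted and the only other constructor is private, so `Goose foo;` is rejected at compile time rather than leaving `foo` uninitialized.

```cpp
#include <type_traits>

class Goose {
public:
    Goose() = delete;  // no way to "just declare" one
    static constexpr Goose hatch() { return Goose(1); }  // the only way in
    constexpr int honks() const { return honks_; }

private:
    constexpr explicit Goose(int n) : honks_(n) {}
    int honks_;
};

// `Goose foo;` is ill-formed: the compiler may neither leave foo
// uninitialized nor call a usable default constructor.
static_assert(!std::is_default_constructible_v<Goose>);

// Goose foo;  // error: Goose's default constructor is deleted
constexpr Goose g = Goose::hatch();
static_assert(g.honks() == 1);
```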
12/30/2025 at 12:49:39 PM
If you have a program that will unconditionally access uninitialized memory then the compiler can halt and emit a diagnostic. But that's rarely what is discussed in these UB conversations. Instead the compiler is encountering a program with multiple paths, some of which would encounter UB if taken. But the compiler cannot just refuse to compile this, since it is perfectly possible that the path is dead. Like, imagine this program:
int foo(bool x, int* y) {
    if (x) return *y;
    return 0;
}
Dereferencing y would be UB if y were nullptr. But maybe this function is only ever called with x=false when y is nullptr. This cannot be a compile error. So instead the compiler recognizes that certain program paths are illegal and uses that information during compilation.
by UncleMeat
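The function from the comment, with the two kinds of call that are well-defined (`foo_example` is a hypothetical usage function). Only `foo(true, nullptr)` would be UB, so the compiler cannot reject the function; what it may do is assume `y != nullptr` on the path where `*y` executes.

```cpp
#include <cassert>

// The function from the comment: dereferences y only when x is true.
int foo(bool x, int* y) {
    if (x) return *y;  // UB only if this line runs with y == nullptr
    return 0;
}

inline void foo_example() {
    // Well-defined: the dereference never executes.
    assert(foo(false, nullptr) == 0);

    // Well-defined: y points at a live int.
    int v = 7;
    assert(foo(true, &v) == 7);

    // foo(true, nullptr) is the only undefined call. Because of it, on the
    // path where `*y` runs the optimizer may treat y as non-null, e.g. drop
    // a later `y == nullptr` check on that same path.
}
```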
12/30/2025 at 3:22:59 PM
Maybe we should make that an error.
by actionfromafar
12/30/2025 at 6:05:27 PM
More modern languages have indeed embedded nullability into the type system and will yell at you if you dereference a nullable pointer without a check. This is good.
Retrofitting this into C++ at the language level is impossible. At least without a huge change in priorities from the committee.
by UncleMeat
12/30/2025 at 6:39:34 PM
Maybe not the Standard, but maybe not impossible to retrofit into: -Werror -Wlet-me-stop-you-right-there
by actionfromafar
12/30/2025 at 4:01:41 AM
That's what Golang went for. There are other possibilities: D has a `= void` initializer to explicitly leave variables uninitialized. Rust requires values to be initialized before use, and if the compiler can't prove they are, it's either an error or requires an explicit MaybeUninit type wrapper.
by pornel
12/30/2025 at 1:54:22 AM
For some values of 'sense'.
by like_any_other
12/30/2025 at 2:16:36 AM
That seems like a reasonable optimization, actually. If the programmer doesn’t initialize a variable, why not set it to a value that always works?
Good example of why uninitialized variables are not intuitive.
by sethev
12/29/2025 at 9:08:50 PM
Things can get even wonkier if the compiler keeps the values in registers, as two consecutive loads could use different registers based, as you say, on what's most convenient for optimisation (register allocation, code density).
by masklinn
12/29/2025 at 8:59:17 PM
If I understand it right, in principle the compiler doesn't even need to do that.
It can just leave the result totally uninitialised. That's because both code paths have undefined behaviour: whichever of result.x or result.y is not set is still copied at "return result", which is undefined behaviour, so the overall function has undefined behaviour either way.
It could even just replace the function body with abort(), or omit the implementation entirely (even the ret instruction, allowing execution to just fall through to whatever memory happens to follow). Whether any compiler does that in practice is another matter.
by quietbritishjim
12/29/2025 at 9:04:54 PM
> It can just leave the result totally uninitialised. That's because both code paths have undefined behaviour: whichever of result.x or result.y is not set is still copied at "return result" which is undefined behaviour, so the overall function has undefined behaviour either way.
That is incorrect, per the resolution of DR222 (partially initialized structures) at WG14:
> This DR asks the question of whether or not struct assignment is well defined when the source of the assignment is a struct, some of whose members have not been given a value. There was consensus that this should be well defined because of common usage, including the standard-specified structure struct tm.
As long as the caller doesn't read an uninitialised member, it's completely fine.
by masklinn
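A sketch of the caller-side discipline DR222 implies, reusing the original `make` (`set_member` is a hypothetical helper). DR222 is a WG14 (C) resolution; for this C++ snippet, the assumption here is that copying the indeterminate member is also allowed because it is an `unsigned char`, which the C++ rules on indeterminate values single out, but that is worth checking against the C++ standard's [basic.indet] wording.

```cpp
#include <cassert>

struct foo {
    unsigned char a, b;
};

// The original function: one member is left with an indeterminate value.
foo make(int x) {
    foo result;
    if (x) {
        result.a = 13;
    } else {
        result.b = 37;
    }
    return result;
}

// Returning the partially set struct is fine; only reading the unset
// member is the problem. This caller reads just the member that make()
// sets for the given x.
inline int set_member(int x) {
    foo r = make(x);
    return x ? r.a : r.b;
}
```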
12/30/2025 at 11:19:41 AM
Ooh, thanks for mentioning DR222, that's very interesting.
by tialaramex
12/29/2025 at 8:03:21 PM
How is this an "optimization" if the compiled result is incorrect? Why would you design a compiler that can produce errors?
by arrowsmith
12/29/2025 at 8:07:37 PM
It’s not incorrect.
The code says that if x is true then a=13, and if it is false then b=37.
This is the case. It's just that a=13 even if x is false, which is something the code had nothing to say about, so the compiler is free to do it.
by Negitivefrags
12/29/2025 at 9:27:47 PM
Ok, so you’re saying it’s “technically correct?”
Practically speaking, I’d argue that a compiler assuming uninitialized stack or heap memory is always equal to some arbitrary convenient constant is obviously incorrect, actively harmful, and benefits no one.
by foltik
12/29/2025 at 9:33:31 PM
In this example, the human author clearly intended mutual exclusivity in the condition branches, and this optimization would in fact destroy that assumption. That said, (a) human intentions are not evidence of foolproof programming logic, and often miscalculate state, and (b) the author could possibly catch most or all errors here when compiling without optimizations during the debugging phase.
by publicdebates
12/29/2025 at 9:54:54 PM
Regardless of intention, the code says this memory is uninitialized.
I take issue with the compiler assuming anything about the contents of that memory; it should be a black box.
by foltik
12/29/2025 at 10:06:06 PM
The compiler is the arbiter of what’s what (as long as it does not run afoul of the CPU itself).
The memory being uninitialised means reading it is illegal for the writer of the program. The compiler can write to it if that suits it; the program can’t see the difference without UB.
In fact the compiler can also read from it, because it knows that it has in fact initialised that memory. And the compiler is not writing a C program and is thus not bound by the strictures of the C abstract machine anyway.
by masklinn
12/29/2025 at 10:32:00 PM
Yes yes, the spec says compilers are free to do whatever they want. That doesn’t mean they should.
> The user didn’t initialize this integer. Let’s assume it’s always 4 since that helps us optimize this division over here into a shift…
This is convenient for who exactly? Why not just treat it as a black box memory load and not do further “optimizations”?
by foltik
12/29/2025 at 10:41:12 PM
> That doesn’t mean they should.
Nobody’s stopping you from using non-optimising compilers, regardless of the strawmen you assert.
by masklinn
12/29/2025 at 11:20:23 PM
As if treating uninitialized reads as opaque somehow precludes all optimizations?
There’s a million more sensible things that the compiler could do here besides the hilariously bad codegen you see in the grandparent and sibling comments.
All I’ve heard amounts to “but it’s allowed by the spec.” I’m not arguing against that. I’m saying a spec that incentivizes this nonsense is poorly designed.
by foltik
12/30/2025 at 3:10:50 AM
Why is the code gen bad? What result are you wanting? You specifically want whatever value happened to be on the stack as opposed to a value the compiler picked?
by Negitivefrags
12/30/2025 at 9:53:06 AM
> As if treating uninitialized reads as opaque somehow precludes all optimizations?
That's not what these words mean.
> There’s a million more sensible things
Again, if you don't like compilers leveraging UBs use a non-optimizing compiler.
> All I’ve heard amounts to “but it’s allowed by the spec.” I’m not arguing against that.
You literally are though. Your statements so far have all been variations of or nonsensical assertions around "why can't I read from uninitialised memory when the spec says I can't do that".
> I’m saying a spec that incentivizes this nonsense is poorly designed.
Then... don't use languages that are specified that way? It's really not that hard.
by masklinn
12/30/2025 at 4:05:08 PM
From the LLVM docs [0]:
> Undef values aren't exactly constants ... they can appear to have different bit patterns at each use.
My claim is simple and narrow: compilers should internally model such values as unspecified, not actively choose convenient constants.
The comment I replied to cited an example where an undef is constant folded into the value required for a conditional to be true. Can you point to any case where that produces a real optimization benefit, as opposed to being a degenerate interaction between UB and value propagation passes?
And to be explicit: “if you don’t like it, don’t use it” is just refusing to engage, not a constructive response to this critique. These semantics aren't set in stone.
[0] https://llvm.org/doxygen/classllvm_1_1UndefValue.html#detail...
by foltik
12/30/2025 at 5:51:15 PM
> My claim is simple and narrow: compilers should internally model such values as unspecified, not actively choose convenient constants.
An assertion you have provided no utility or justification for.
> The comment I replied to cited an example where an undef is constant folded into the value required for a conditional to be true.
The comment you replied to did in fact not do that and it’s incredible that you misread it such.
> Can you point to any case where that produces a real optimization benefit, as opposed to being a degenerate interaction between UB and value propagation passes?
The original snippet literally folds a branch and two stores into a single store, saving CPU resources and generating tighter code.
> this critique
Critique is not what you have engaged in at any point.
by masklinn
12/31/2025 at 5:04:27 AM
Sorry, my earlier comments were somewhat vague and assumed we were on the same page about a few things. Let me be concrete.
The snippet is, after lowering:
    if (x)
        return { a = 13, b = undef }
    else
        return { a = undef, b = 37 }
LLVM represents this as a phi node of two aggregates:
    a = phi [13, then], [undef, else]
    b = phi [undef, then], [37, else]
Since undef isn’t “unknown”, it’s “pick any value you like, per use”, InstCombine is allowed to instantiate each undef to whatever makes the expression simplest. This is the problem.
    a = 13
    b = 37
The branch is eliminated, but only because LLVM assumes that those undefs will take specific arbitrary values chosen for convenience (fewer instructions).
Yes, the spec permits this. But at that point the program has already violated the language contract by executing undefined behavior. The read is accidental by definition: the program makes no claim about the value. Treating that absence of meaning as permission to invent specific values is a semantic choice, and precisely what I am criticizing. This “optimization” is not a win unless you willfully ignore the program and everything but instruction count.
As for utility and justification: it’s all about user experience. A good language and compiler should preserve a clear mental model between what the programmer wrote and what runs. Silent non-local behavior changes (such as the one in the article) destroy that. Bugs should fail loudly and early, not be “optimized” away.
Imagine if the spec treated type mismatches the same way. Oops, assigned a float to an int, now it’s undef. Let’s just assume it’s always 42 since that lets us eliminate a branch. That’s obviously absurd, and this is the same category of mistake.
by foltik
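A toy model of the fold described above, assuming C++17 (`fold_phi` and `Incoming` are hypothetical and unrelated to LLVM's actual API): each incoming value is a constant or undef (`std::nullopt`), and if all the constants agree, every undef is instantiated as that constant and the phi collapses.

```cpp
#include <initializer_list>
#include <optional>

// One incoming value of a toy phi: a known constant, or undef (nullopt).
using Incoming = std::optional<int>;

// If every non-undef incoming value is the same constant C, each undef
// may be instantiated as C, so the phi folds to C. Returns nullopt when
// there is no single constant (constants disagree, or all are undef).
constexpr std::optional<int> fold_phi(std::initializer_list<Incoming> incoming) {
    bool have = false;
    int value = 0;
    for (const Incoming& v : incoming) {
        if (!v.has_value()) continue;            // undef: fits anything
        if (have && value != *v) return std::nullopt;
        have = true;
        value = *v;
    }
    if (!have) return std::nullopt;
    return value;
}

// a = phi [13, then], [undef, else]  ->  13
static_assert(fold_phi({13, std::nullopt}) == 13);
// b = phi [undef, then], [37, else]  ->  37
static_assert(fold_phi({std::nullopt, 37}) == 37);
// Two real, different constants: no fold, the branch must stay.
static_assert(!fold_phi({1, 2}).has_value());
```

Applying this to both members gives `{13, 37}` regardless of `x`, which matches the `mov eax, 9485` result from the top of the thread.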
12/29/2025 at 10:31:43 PM
Also, even without UB, even for a naive translation, a could just happen to be 13 by chance, so the behaviour isn't even an example of nasal demons.
by 1718627440
12/29/2025 at 9:20:05 PM
Because a could be 13 even if x is false, since initialisation of the struct doesn’t define what the initial values of a and b need to be.
Same for b. If x is true, b could be 37 no matter how unlikely that is.
by throwatdem12311
12/29/2025 at 10:37:42 PM
It is not incorrect. The values are undefined, so the compiler is free to do whatever it wants with them, even assign values to them.
by xboxnolifes
12/29/2025 at 8:07:12 PM
It's not incorrect. Where is the flaw?
by tehjoker