5/20/2026 at 8:33:20 AM
Yes there is tons of surprising and weird UB in C, but this article doesn't do a great job of showcasing it. It barely scratches the surface.
Here's a way weirder example:
volatile int x = 5;
printf("%d in hex is 0x%x.\n", x, x);
This is totally fine if x is just an int, but the volatile makes it UB. Why? 5.1.2.4.1 says any volatile access - including just reading it - is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other.
So in common parlance, a "data race" is any concurrent accesses to the same object from different threads, at least one of which is a write. In C, we can have a data race on a single thread and without any writes!
by muvlon
5/20/2026 at 10:56:21 AM
Author here.
> It barely scratches the surface.
I agree. The point of the post is not to enumerate and explain the implications of all 283 uses of the word "undefined" in the standard. Nor enumerate all the things that are undefined by omission.
The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.
And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.
The (one!) exploitable flaw found by Mythos in OpenBSD was an impressive endorsement of the OpenBSD developers, and yet as the post says, I pointed it at the simplest of their code and found a heap of UB.
Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.
FTA:
> The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C and C++ code has UB.
by thomashabets2
5/20/2026 at 5:30:45 PM
> Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.
I presume you're referring to this code:
pid = waitpid(pid, &status, 0);
if (WIFEXITED(status))
    rval = WEXITSTATUS(status);
else
    rval = -1;
The only signal handler find installs is for SIGINFO, and it uses the SA_RESTART flag, so EINTR can be ruled out. The pid argument is definitely valid as you can't reach the above if it wasn't, and there's no other way for the child process to be reaped[1], so no ECHILD.
A check should probably be added in case the situation changes in the future, triggering spooky action at a distance, or were that code to be copy+pasted somewhere where the invariants didn't hold. But I think the current code in its current context is, strictly speaking, correct as-is.
[1] OpenBSD lacks the kernel features for such surprises that might theoretically be possible on Linux.
by wahern
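A minimal sketch of the check discussed above, written as a standalone helper rather than as a patch to `find`. The function name `run_child_and_wait` is hypothetical and the fork-and-exit child exists only to make the example self-contained; the point is that `status` is inspected only after `waitpid()` has reported success.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then reap it.
 * Returns the child's exit status, or -1 if anything goes wrong. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: exit immediately */

    int status = 0;             /* initialized, so it is never read while indeterminate */
    if (waitpid(pid, &status, 0) == -1)
        return -1;              /* status was not filled in; do not inspect it */

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                  /* child was terminated by a signal, etc. */
}
```

With this ordering the code no longer depends on the invariants listed above (no EINTR, no ECHILD) continuing to hold.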
5/20/2026 at 9:48:46 PM
Indeed. That's why I didn't deem it worth reporting.
But in my code, I would have fixed it for the reasons you mention. Sprinkle enough of these around, and some low percentage will in the future have its assumption invalidated.
by thomashabets2
5/21/2026 at 2:20:42 AM
Couldn’t waitpid return EINTR if the (parent) process were stopped and then continued?
EINTR scares the crap out of me because nobody expects it!
by BobbyTables2
5/21/2026 at 9:13:52 AM
No. You only get EINTR when a signal handler fires and you didn't use the SA_RESTART flag with sigaction. If you don't install any signal handlers, or you use SA_RESTART on all handlers, or you've blocked/masked all signals (or at least the ones with handlers), you won't get EINTR.
When writing library code, it's important to consider EINTR because you can't know about signal dispositions. Though, the common practice of looping on EINTR kind of defeats the purpose.
by wahern
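The two approaches mentioned above can be sketched roughly as follows. The names `on_signal`, `install_restarting_handler` and `waitpid_retry` are made up for illustration; `sigaction`, `SA_RESTART`, `waitpid` and `EINTR` are the standard POSIX pieces. The first function is what an application that controls its own handlers might do; the second is the retry loop that library code often uses because it cannot know how the handlers were installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical no-op handler, used only for illustration. */
static void on_signal(int signo)
{
    (void)signo;
}

/* Install on_signal for `signo` with SA_RESTART, so that restartable
 * system calls interrupted by this handler are resumed by the kernel.
 * Returns 0 on success, -1 on error. */
int install_restarting_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Library-style wrapper: retry waitpid() when it fails with EINTR,
 * since the caller's signal dispositions are unknown here. */
pid_t waitpid_retry(pid_t pid, int *status, int options)
{
    pid_t r;
    do {
        r = waitpid(pid, status, options);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

As noted above, blindly looping on EINTR also removes the caller's ability to use a signal to break out of the wait, so the loop is a trade-off and not a universal fix.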
5/20/2026 at 11:07:30 AM
Fair enough!
> And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.
And I 100% agree. UB is way overused by these standards for how dangerous it is, and as a consequence using C (and C++) for anything nontrivial amounts to navigating a minefield.
by muvlon
5/20/2026 at 4:02:43 PM
I think as compilers got smarter, UB changed somewhat in meaning. Originally the compilers didn't perform such complex analysis, and while invoking UB could break your program, it would still do something reasonable.
by webstrand
5/20/2026 at 4:42:02 PM
Yes, but compilers got smart enough for it to be a problem around 30 years ago, and we are still arguing about what to do.
by marcosdumay
5/20/2026 at 5:34:37 PM
You can see the reasoning here: basically, when all those C compiler benchmarks started, vendors moved from what Fran Allen described to anything goes to win SPEC something benchmarks.
"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue....
Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels?
Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."
-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming
by pjmlp
5/20/2026 at 11:16:22 AM
What should the behavior above be defined to do?
by saagarjha
5/20/2026 at 12:31:40 PM
“Implementation defined behaviour”: compiler author chooses, and documents the choice.
A lot of UB should be implementation defined behaviour instead; this would much better match programmers’ intuitions as they reason about their code - you can even see it in the comments of this post: it’s always things like “this hardware supports / doesn’t support unaligned accesses”, it’s never nasal demons.
by tsukikage
5/20/2026 at 3:02:54 PM
I told someone at a conference that UB actually means "implementation-defined, no documentation required". He started to refute me and then stopped.
by tardedmeme
5/20/2026 at 5:05:29 PM
That isn't true: for UB the compiler is allowed to assume the UB can never happen. For example if you dereference a pointer and only after check if it is NULL, the compiler can remove the NULL check, since it is clearly impossible (nevermind that you might be on a microcontroller where NULL is a valid address).
The fallout of this is quite large! If behaviour is implementation defined the compiler has to stick to one consistent behaviour. No such need for UB: you can get different behaviour by changing unrelated code, by changing between debug and release, or just because of what garbage happened to be on the stack.
Since the compiler is allowed to assume the UB doesn't happen it will also sometimes look like the compiler miscompiled your code elsewhere, but what actually happened was some inlining followed by extrapolating "this can never happen".
UB is often surprising: I have seen unaligned loads crash on x86 due to it being UB in C (even though x86 is generally fine with it). But once a newer compiler decided that it was fine to vectorise that code (since it was clearly aligned) the CPU was no longer happy with it.
by VorpalWay
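For the unaligned-load case described above, one commonly used way to stay within defined behaviour is to copy the bytes with `memcpy` instead of dereferencing a misaligned pointer. This is only a sketch; the helper name `load_u32_unaligned` is made up for illustration, and the value is read in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not
 * be suitably aligned for uint32_t. memcpy has no alignment requirement,
 * so this is well defined wherever p points, provided 4 bytes are readable. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Optimizing compilers typically turn the fixed-size `memcpy` into a plain load where the target allows it, but that is a quality-of-implementation matter and not something the standard promises.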
5/20/2026 at 9:52:56 PM
I think parent commenter made a joke. UB can be seen as "implementation defines this to reformat your hard drive. No we don't document it".
That is, the compiler de facto defines what happens when you compile UB code.
So you're not wrong, but I think you missed the sarcastic spin of parent.
by thomashabets2
5/21/2026 at 4:38:48 AM
> That is, the compiler de facto defines what happens when you compile UB code.
That is not what undefined behavior is though, that is unspecified behavior.
The entire point of undefined behavior is to cover the cases where the compiler can't define the semantics of your program either because doing so is genuinely not possible, or is incredibly onerous to deduce, or would require introducing runtime checks whose performance cost is at odds with C and C++'s predominant use cases.
by Maxatar
5/21/2026 at 7:31:06 AM
Sorry, by "de facto defines" I meant that it factually does something, even if that "something" is "segfault the compiler at build time".
That "de facto" did some heavy lifting.
by thomashabets2
5/20/2026 at 4:43:24 PM
Except that UB doesn't mean that. UB means "the developer must never write this".
by marcosdumay
5/20/2026 at 5:25:18 PM
Both are wrong. It means "this standard does not constrain the behaviour of code that does this".
It's entirely legal for implementations to have predictable behaviour, documented or not, for code that is undefined by the standard. In their quest for maxxing benchmark performance they generally choose not to, but there's really nothing in any standard that stops you from making an implementation that prioritises safety.
by munch117
5/20/2026 at 5:51:43 PM
Every implementation so far has predictable behavior in all cases. Sometimes the rules for predicting it are very obscure. But it's all fully defined within the compiler's binary code. And none of them link to nasal portals.
by tardedmeme
5/20/2026 at 6:25:20 PM
How do you propose to predict the behavior of a true race condition with only the binary, faithfully translated by the compiler?
Moreover, this is at best an incredibly pedantic point, not something that changes how programmers need to approach UB. You can't review the source code of a compiler that hasn't been written yet.
by AlotOfReading
5/21/2026 at 6:38:33 AM
I didn't suggest that implementations should entirely eliminate every form of UB. There is plenty of middle ground. For example, you could easily limit the consequences of integer overflow by specifying or partially specifying overflow behaviour, with very little runtime cost.
I'm not suggesting you change how you write code, but with a better implementation the code that you do write - that lives in the real world where mistakes are made - might work better. How is that being pedantic?
An interesting case where compiler writers did something like that is casting via union members, but I'm running out of time, so we can talk about that another day.
by munch117
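The comment above is about what an implementation could specify. For comparison, here is what avoiding signed-overflow UB looks like when done by hand in portable source code. This is a different technique from the implementation-level change being proposed, and the helper name `checked_add_int` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: compute a + b without ever evaluating an
 * overflowing signed addition. On success stores the sum in *out and
 * returns true; on overflow leaves *out untouched and returns false. */
bool checked_add_int(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;           /* a + b would not fit in an int */
    *out = a + b;
    return true;
}
```

The range test happens before the addition, so the expression `a + b` is only evaluated when its result is representable.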
5/21/2026 at 7:14:45 AM
It's fully defined by your CPU's silicon masks and your compiler's binary code that one of several things will happen.
by tardedmeme
5/21/2026 at 7:53:34 AM
Turns out that when you're implementing network applications, the set of things that could happen also depends on what the script kiddie on the other side of the globe feels like this morning.
Some would prefer less excitement than this.
C code should be more predictable and easier to reason about than using a macro assembler. To the extent it is not, the language has failed.
by tsukikage
5/20/2026 at 12:00:06 PM
Print x twice. Not all “side effects” care about order.
Better yet, define an order for parameter evaluation.
by Filligree
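Until the language defines such an order, a programmer who really wants two separate volatile reads can impose the order by hand, since each full declaration ends in a sequence point. A minimal sketch, with the hypothetical helper name `format_two_reads`:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: perform two distinct reads of *p in a fixed,
 * programmer-chosen order, then format them. Each initializer is a full
 * expression, so the first read is sequenced before the second. */
int format_two_reads(char *buf, size_t n, const volatile int *p)
{
    int first = *p;     /* read 1 */
    int second = *p;    /* read 2, sequenced after read 1 */
    return snprintf(buf, n, "%d in hex is 0x%x.", first, (unsigned)second);
}
```

If the object can change between the two reads, `first` and `second` may differ, but the order in which they were obtained is no longer left to the compiler.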
5/21/2026 at 12:08:57 PM
There is an easy way to take control: read the volatile variable only once.
volatile int x = 5;
...
int y = x;
printf("%d in hex is 0x%x.\n", y, y);
by HelloNurse
5/20/2026 at 12:43:58 PM
You're missing the point. Volatile forces two loads of a value that may have changed in the middle. So the value of "x" may depend on the time/order of load.
by poppadom1982
5/20/2026 at 1:17:59 PM
Which is, if I understand correctly, the entire point of volatile. Don't use it if you don't want that behavior.
And in fact, in the example given, if there is something (another thread or whatever) that can change the value of x, then you don't know what either number will be. Well, in that circumstance, without volatile, it may print the same number both times, but you still don't know what the number will be (unless the read gets optimized away entirely).
by AnimalMuppet
5/20/2026 at 1:42:43 PM
If that behavior is the entire point, then I think the bigger point is that the spec should reflect that and not call it undefined.
by chuckadams
5/20/2026 at 2:00:51 PM
I suspect that many undefined behaviors reflect the inability of the standard committee to come to a consensus on the nuances involved. “Punt to the implementers” is a way to allow every tool vendor to select their own expected behavior in those cases.
by voakbasda
5/20/2026 at 3:14:42 PM
You seem to be operating under the assumption "undefined behavior" means "the compiler authors can decide what to do." That's not what it means. It means "any program that causes this behavior to be triggered is not a valid C program, the programmer knows this and did not submit an invalid program, and the programmer explicitly prevented this from happening elsewhere in ways automated analysis cannot detect. Proceed with compilation knowing this branch is impossible."
The spelling for compiler authors getting to choose a behavior is "implementation defined", as the other comment mentions.
by chowells
5/21/2026 at 7:15:40 AM
It means the C standard does not specify what the program does. Other documents may still specify what the program does. And the program definitely still does something, whether specified or not.
by tardedmeme
5/21/2026 at 4:45:33 PM
> And the program definitely still does something, whether specified or not.
No. It most definitely does not mean this. Go read the series this is part of: https://blog.llvm.org/2011/05/what-every-c-programmer-should...
It is absolutely critical that people programming in C understand what real compilers in the real world do.
by chowells
5/20/2026 at 2:36:07 PM
Then it should be "implementation defined" rather than "undefined".
by MarkusQ
5/20/2026 at 1:11:32 PM
Why is that missing the point? Loading it twice, possibly with different values, is the intended behavior. It's only undefined because the C spec doesn't specify the order of the loads (unlike most other languages which have a perfectly well-defined order for side effects in a single expression).
by hmry
5/20/2026 at 2:48:28 PM
What you are describing is implementation defined behavior. Using that is perfectly safe and reasonable. Undefined means this program is malformed.
by rowanG077
5/20/2026 at 5:24:09 PM
No I'm just repeating what the original comment said, which is that it's explicitly UB:
"5.1.2.4.1 says any volatile access - including just reading it - is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other."
If function arguments were sequenced with respect to each other, it wouldn't be a problem.
But actually, maybe the original comment is wrong. Presumably "indeterminately sequenced" and "unsequenced" mean different things, although I don't have a copy of the standard at hand to check.
by hmry
5/20/2026 at 11:57:22 AM
Couldn’t you just define that function arguments are evaluated left to right?
Or just throw an error.
by echoangle
5/20/2026 at 3:50:23 PM
Why? Just for this edge case? It could be faster and/or allow smaller code size to allow this to be undefined.
Undefined is also different from "depends on the compiler", because which behavior is chosen can even depend on the circumstances, whatever code appears before and/or after it.
That said, UB in code, such as this example of ordering of reads of volatile parameters being undefined, does not automatically mean that code that uses it is bad. It may very well be that the function being called doesn't misbehave either way.
by jfoks
5/20/2026 at 5:32:55 PM
That’s the point of the whole article. It’s not worth the speed gain to have a language that nobody can safely use because you can’t really prevent UB when you write it.
> It may very well be that the function being called doesn't misbehave either way.
The function being good or bad has nothing to do with the UB. The UB occurs before the function is called.
by echoangle
5/20/2026 at 12:18:55 PM
I meant reading the uninitialized variable
by saagarjha
5/20/2026 at 2:06:29 PM
There is no uninitialized variable, I explicitly initialized it to 5.
And yes indeed, C could do what Rust does and define the order of evaluation for function arguments.
If the argument expressions are indeed side-effect-less, the compiler can always make use of the "as-if" rule and legally reorder the computation anyway, for example to alleviate register pressure.
by muvlon
5/20/2026 at 11:37:15 AM
HCF
by lll-o-lll
5/20/2026 at 11:52:43 AM
I have good news about what UB allows
by saagarjha
5/20/2026 at 12:42:45 PM
What is that?by JadeNB
5/20/2026 at 3:27:29 PM
A fictitious assembly instruction (and pretty good TV series).
https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(computing...
by FabHK
5/20/2026 at 5:27:58 PM
Halt and Catch Fire
by SAI_Peregrinus
5/20/2026 at 11:40:46 AM
Compilation error
by jeffffff
5/20/2026 at 11:53:28 AM
It’s hard to detect all UB at compile time
by saagarjha
5/20/2026 at 12:17:17 PM
It’s harder depending on the language, which is clearly the point.
by Demiurge
5/20/2026 at 2:02:45 PM
[flagged]
by stellamariesays
5/21/2026 at 2:52:31 AM
> Or at least, no human since the invention of C in 1972 has.
No human without proper tools maybe, but what about seL4? It goes beyond proving the code is UB-free and actually formally verifies the code works as intended. And the code is written in C. (the proofs of course aren't)
The proof is interesting because it goes beyond just proving the C code is correct. For some platforms, they compile the code with an ordinary compiler, and verify that the machine code does what the C code is supposed to do. (that's because just writing correct C code doesn't help you if you trigger a compiler bug)
This works even if the compiler (in this case, GCC) isn't verified - they verify a specific output of the compiler, not that the compiler always generates machine code correctly.
by nextaccountic
5/20/2026 at 1:14:11 PM
> The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.
What are you talking about? UB was coined only in the first C standard, in 1989. Prior to that there was no "If you do this, anything can happen". It was "If you do this, that will happen".
by lelanthran
5/21/2026 at 6:52:32 AM
> UB was coined only in the first C standard, in 1989
Pre 1989, when C did not have a standard, was the behavior unspecified or undefined? That is, of course, a trick question. Because in this context the very definitions of the words come from the standard itself.
Before a language gets a specification, is the de facto specification the five words "you know what I mean"?
The very definition of "UB" in C later became "[…] this document imposes no requirements". Is that not the same thing as "there is no specification (yet)"?
It sounds very zen, but "a non existing specification imposes no requirements".
But I don't think it's meaningful to argue the semantic difference before the (in-context) existence of the words "undefined" vs "unspecified".
> Prior to that there was no "If you do this, anything can happen".
Of course it was. You relied on "common sense".
> It was "If you do this, that will happen".
Haha, of course it wasn't. Before a specification there is neither a definition of "this" nor "that".
Unless you mean ye olde "the compiler implementation is the specification". In which case we'll get dragged into "what even is a language" and "what is the sound of one hand clapping?".
Or, alternatively, it's as true then as it is today. If you go by "GCC x.y.z on platform Z kernel Y, (etc…) is the specification" then there is no UB.
by thomashabets2
5/20/2026 at 2:12:15 PM
More like, "if you do this, what happens depends on your particular combination of hardware, operating system, and compiler. Don't ask us."by professoretc
5/20/2026 at 3:01:14 PM
No, that would be implementation defined.by nickez
5/20/2026 at 5:42:15 PM
The post I was replying to said,
> UB was coined only in the first C standard, in 1989. Prior to that there was no "If you do this, anything can happen".
I.e., the context is, before UB existed as a concept, how would these things be categorized. And I was trying to offer the correction that, before UB existed, it wasn't "all behavior is defined" but rather many behaviors depend on your particular local environment. While that may technically be implementation defined, the current standard requires that implementation defined be documented, and UB-like edge cases were most definitely not documented anywhere consistently in the old days!
by professoretc
5/20/2026 at 5:12:29 PM
No, that's actually UB. The important bit here is "compiler defined" -- UB means the compiler is allowed to assume it never happens while compiling.
Consider, for example, an implementation defined function f() -- which can also diverge/crash horribly, etc.
If I write
if p {
    print("p is true")
} else {
    g()
}
if p {
    f()
}
Then either we:
- print p is true and execute f
- execute g
This is true regardless of whether f immediately crashes the computer, nasal demons, whatever -- that's implementation defined.
UB means f may never happen.
And that means the compiler may optimize this to just:
g()
Notice the difference here -- the print never happens, and g always happens.
You can see why this is concerning when you write code like
if dry_run {
    print("would run rm -rf /")
} else {
    run("rm -rf /")
}
if dry_run {
    // oops: some_debug_string is NULL and will segfault!
    print(some_debug_string);
}
by tekne
5/20/2026 at 6:05:44 PM
I see what you're going for, but I don't see how your example is UB. If `p` is a pointer, and, after your `if (p)` check, `p` is dereferenced unconditionally, then yes, your check for `p == NULL` could be removed, and the code under the `if` would be removed as well. But the example you've constructed is not UB.
by tyg13
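The pointer case described above can be written out in C roughly as follows. Both function names are hypothetical. Whether a particular compiler actually removes the check in the first function depends on the compiler, its version and its flags; the sketch only shows why it is allowed to.

```c
#include <stddef.h>

/* Risky ordering: *p is evaluated before the NULL test. Because a null
 * dereference is UB, a compiler may assume p != NULL from that point on
 * and treat the test below as always false. */
int value_or_default_risky(const int *p, int fallback)
{
    int v = *p;             /* UB if p is NULL */
    if (p == NULL)          /* may be optimized away */
        return fallback;
    return v;
}

/* Safer ordering: test first, dereference only on the non-null path. */
int value_or_default(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}
```

Only `value_or_default` may be called with a null pointer; calling the risky variant with NULL is exactly the undefined case under discussion.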
5/21/2026 at 7:06:54 AM
You misunderstood their example, I think.
It doesn't matter what 'p' is in their example. The point is: if 'f' is undefined behavior (rather than just impl-defined), then the optimizer concludes that the "if p { f() }" can never happen... which means that we're allowed to assume that 'if p { ... } else { ... }' (in the first part of the example) will always take the else branch. The compiler will optimize accordingly and just always call g() unconditionally.
by Quekid5
5/20/2026 at 6:43:27 PM
> if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL
It's fair to blame the programmer for the choice of programming in a language like this, if it was in fact their choice. As you've so eloquently put, choosing those languages is essentially equivalent to choosing UB, so starting a new project with one of them is 100% blameworthy when the UB is inevitably found.
by saghm
5/21/2026 at 7:29:20 AM
Not all projects are green field. But sure, new modules can be written in other languages. And C is, as cross-language barriers go, fairly easy to interface with.
by thomashabets2
5/20/2026 at 10:53:10 AM
Volatile is a type system hack. They should have done a more principled fix, and certainly modern languages should not act as though "C did it" makes it a good idea.
The reason for the hack is that very early C compilers just always spill, so you can write MMIO driver code by setting a pointer to point at the MMIO hardware and it actually works because every time you change x the CPU instruction performs a memory write.
Once C compilers got some basic optimisations that obvious "clever" trick stops working because the compiler can see that we're just modifying x over, and over and over, and so it doesn't spill x from a register and the driver doesn't work properly. C's "volatile" keyword is a hack saying "OK compiler, forget that optimisation" which was presumably a few minutes work to implement, whereas the correct fix, providing MMIO intrinsics in the associated library, was a lot of work.
Why should you want intrinsics here? Intrinsics let you actually spell out what's possible and what isn't. On some targets we can actually do a 1-byte 2-byte and 4-byte write, those are distinct operations and the hardware knows, so e.g. maybe some device expects a 4-byte RGBA write and so if you emit four 1-byte writes that's very confusing and maybe it doesn't work, don't do that. On some targets bit-level writes are available, you can say OK, MMIO write to bit 4 of address 0x1234 and it will write a single bit. If you only have volatile there's no way to know what happens or what it means.
by tialaramex
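To make the width argument above concrete, here is a sketch of what a width-explicit interface could look like. The names `mmio_write8`, `mmio_write32` and `mmio_read32` are hypothetical, and the bodies are built on `volatile` casts because portable C offers nothing else, so this illustrates the shape of the interface and not the stronger guarantee that real compiler-provided intrinsics would give.

```c
#include <stdint.h>

/* Hypothetical width-explicit helpers. The access width is part of the
 * function name, so the call site says which kind of write is intended. */
static inline void mmio_write8(volatile void *addr, uint8_t v)
{
    *(volatile uint8_t *)addr = v;      /* one 8-bit store is requested */
}

static inline void mmio_write32(volatile void *addr, uint32_t v)
{
    *(volatile uint32_t *)addr = v;     /* one 32-bit store is requested */
}

static inline uint32_t mmio_read32(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}
```

In a real driver `addr` would be a device register address for the target; in a hosted test it can simply point at an ordinary variable.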
5/20/2026 at 1:39:12 PM
I agree that marking the read/write as special rather than the variable itself would be nice, although it would also be nice if C/C++ was more consistent in the way things like this are done. Maybe given std::atomic and std::mutex as template/library features, supported by compiler intrinsics, it would be nice to have "volatile" supported in a similar way.
As a nit pick, I don't think this is correct use of "spill". Register spilling refers to when a compiler's code generator runs out of registers and needs to store variables in memory instead. In the MMIO case you are reading/writing via a pointer, so this is unrelated to registers and spilling behavior.
by HarHarVeryFunny
5/20/2026 at 2:22:32 PM
That's fair that "spill" probably isn't quite the right word.by tialaramex
5/20/2026 at 3:34:09 PM
By MMIO semantics do you mean explicit load and store instructions? I’ve never felt that pointer reads or writes were lacking descriptiveness here. I would argue the only surprising thing is that they might be optimized out (which is what volatile prevents).
Volatile on a non pointer value is not for MMIO, though, that’s typically for concurrency like with interrupts.
by MobiusHorizons
5/20/2026 at 6:38:50 PM
> I’ve never felt that pointer reads or writes were lacking descriptiveness here. I would argue the only surprising thing is that they might be optimized out
The C and C++ languages would be very slow by modern standards if you insist that reading or writing via a pointer must result in immediate fetches or stores to memory.
> Volatile on a non pointer value is not for MMIO, though, that’s typically for concurrency like with interrupts.
You're holding it wrong. Perhaps you've been holding it wrong for so long and so confidently that you've distorted the world around you -- indeed on MSVC on x86 or x86-64 that actually happened -- but, you're still holding it wrong.
by tialaramex
5/20/2026 at 7:51:39 PM
> You're holding it wrong. Perhaps you've been holding it wrong for so long and so confidently that you've distorted the world around you -- indeed on MSVC on x86 or x86-64 that actually happened -- but, you're still holding it wrong.
Please explain. How would you make the variable backed by a hardware register region? Is this using some sort of linker trick to change where the value lives in memory?
by MobiusHorizons
5/20/2026 at 10:32:01 PM
You said it was for concurrency. The feature you want for that in C (and most languages suitable for this problem) is atomic memory ordering, not the volatile type qualifier.
Microsoft's platform was x86 only for years, and because Intel's design pays for a lot more memory ordering by default than most, on Microsoft's platforms just "volatile" would kinda work even though it was the wrong thing, so Microsoft explicitly grandfathered this for x86 and x86-64 only, you are guaranteed the Acquire-Release ordering even though you didn't ask for it with your volatile type qualifier.
If you were actually thinking of POSIX signals or something similar then yeah, the POSIX requirements say volatile will work, seems like a bad idea to me, but your compiler and other tools are likely also built for POSIX so they've read the same documentation.
by tialaramex
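A minimal sketch of the atomic approach mentioned above, using C11 `<stdatomic.h>`. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical; the pattern is a release store on the flag paired with an acquire load, which is what orders the plain write to `payload` before any reader that observes the flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;         /* ordinary, non-atomic data */
static atomic_bool ready;   /* static storage: starts out false */

/* Producer: write the data, then publish it with a release store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load that observes `true` also makes the earlier
 * write to payload visible. Returns false if nothing was published yet. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In real use `publish` and `try_consume` would run on different threads; the qualifier `volatile` appears nowhere because the ordering comes from the atomic operations.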
5/20/2026 at 11:22:36 AM
Yeah, it's also cleaner to be able to mark particular reads and writes as having side effects as opposed to having it be a property of the variable.
by rcxdude
5/20/2026 at 3:01:36 PM
The Linux kernel uses READ_ONCE and WRITE_ONCE, which look like actual function calls, which is very sensible.
by tardedmeme
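The idea of marking the access instead of the variable can be sketched with two simplified macros. These are hypothetical stand-ins with a `_SKETCH` suffix, not the kernel's actual definitions, which are more involved; they also rely on `__typeof__`, a GCC/Clang extension, so they are not strictly portable C.

```c
/* Simplified, hypothetical stand-ins: the variable itself is a plain
 * object, and only the marked accesses go through a volatile lvalue. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int shared_flag;     /* note: not declared volatile */

int poll_flag(void)
{
    return READ_ONCE_SKETCH(shared_flag);   /* exactly one marked read */
}

void set_flag(int v)
{
    WRITE_ONCE_SKETCH(shared_flag, v);      /* exactly one marked write */
}
```

Other accesses to `shared_flag` remain ordinary and can still be optimized, which is the advantage over qualifying the variable.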
5/20/2026 at 11:18:20 AM
> The reason for the hack is that very early C compilers just always spill, so you can write MMIO driver code by setting a pointer to point at the MMIO hardware and it actually works because every time you change x the CPU instruction performs a memory write.
Source?
by saagarjha
5/20/2026 at 2:42:22 PM
This is one of those "everyone doing this kind of work knows" that's rather hard to source, but: this is basically the point of volatile. Especially for reads rather than writes, where you may want to read some location that is being written into by a different piece of hardware.
People used to use it for thread synchronization before proper memory barrier primitives (see https://mariadb.org/wp-content/uploads/2017/11/2017-11-Memor... ) were available. It was not entirely reliable for this purpose.
by pjc50
5/20/2026 at 6:54:16 PM
Yeah. I could have sworn that I've read somewhere an anecdote from the Bell Labs era in which this comes up, but I can't find it and might be misremembering. The whole volatile keyword doesn't exist in K&R C as released, there are no "type qualifiers" at all in that language, both volatile and const are introduced in C89.
Duff's famous Device, often misunderstood as some insight about memory copying or something silly, was an MMIO hack, it doesn't look like an MMIO hack to us because it doesn't say volatile, but that's because Duff's compiler did not have that keyword, the reason Duff doesn't change the destination pointer is that it's pointing at hardware and the hardware isn't going anywhere, writing different bytes to the same address is I/O.
by tialaramex
5/21/2026 at 11:35:49 AM
No idea about volatile, but I do remember function prototypes and const came as influence from C++, well CFront.
by pjmlp
5/20/2026 at 12:41:15 PM
Source for what? The volatile keyword is explicitly telling the compiler "don't optimize read/write to this memory location". That's the whole point. Its use for manipulating hardware registers is covered in any intro embedded systems course. I don't know the history of C compilers but it would seem reasonable to assume that compilers started out plainly translating the C to machine code. Optimization would have happened later as the compilers became more mature.
https://www.gnu.org/software/c-intro-and-ref/manual/html_nod...
by skillina
5/21/2026 at 7:35:29 AM
Source for "compilers basically always did volatile since everything was always spilled".
by saagarjha
5/20/2026 at 12:28:59 PM
> In C, we can have a data race on a single thread and without any writes!
You need to distinguish between a UB and a race, and I think that's something that discussions of UB miss. Take any C program and compile it. Then disassemble it. You end up with an Assembly program that doesn't have any UB, because Assembly doesn't have UB.
UB is a property of a source program, not the executable. It means that the spec for the language in which the source is written doesn't assign it any meaning. But the executable that's the result of compiling the program does have a meaning assigned to it by the machine's spec, as machine code doesn't have UB.
A race is a property of the behaviour of a program. So it's true to say that your C program has UB, but the executable won't actually have a race. Of course, a C compiler can compile a program with UB in any way it likes so it's possible it will introduce a race, but if it chooses to compile the program in a way that doesn't introduce another thread, then there won't be a race.
by pron
5/20/2026 at 1:11:41 PM
> because Assembly doesn't have UB
To be pedantic, old hardware like 6502 family chips (Commodore 64, Apple II, etc) had illegal instructions which were often used by programmers, but it was completely up to the chip to do whatever it wanted with those like with UB.
by redox99
5/20/2026 at 1:56:38 PM
> illegal instructions... were often used by programmers
Intentionally, with an expected effect? I'd need a citation for that.
by zahlman
5/20/2026 at 9:09:16 PM
Yes, many of those are perfectly stable. For example, the 6502 has an undocumented instruction commonly known as "LAX" which loads both the A and X registers at the same time in a predictable manner in most addressing modes, in the same time and space it would otherwise take to load either of those registers on their own.
The benefits of being able to do stuff like this when you need to conserve resources are obvious, and common idioms have formed around their use. Check out https://csdb.dk/release/?id=198357
by boomlinde
5/20/2026 at 3:26:03 PM
Some desultory googling turned up:
* https://www.nesdev.org/wiki/CPU_unofficial_opcodes#Games_usi...
* https://hitmen.c02.at/files/docs/c64/NoMoreSecrets-NMOS6510U... (doesn't name any software, but some copy protection schemes were already known to use them)
by chuckadams
5/21/2026 at 8:59:34 AM
Some instructions were very useful and they were simply discovered by programmers who tried out what each instruction did. People did not necessarily have access to documentation those days!
So any instruction or hardware feature would get used, whether it's "officially" documented or not.
by vardump
5/21/2026 at 2:03:22 AM
> You end up with an Assembly program that doesn't have any UB, because Assembly doesn't have UB.
I guess that's true if you think of assembly as a more readable form of machine code, but from a practical sense I'd argue that assembly inherits the undefined behaviors of the architecture it represents and the implementations of that architecture it actually builds for.
IIRC the OG Xbox security was broken partially as a result of undefined behaviors in x86 where the AMD CPUs that were used in early development would crash or throw an error or something when execution reached the end of the memory space but the Intel CPU they switched to instead just rolled over and kept executing from 0.
by wolrah
5/20/2026 at 4:37:07 PM
I specifically said data race, which is a known term of art and a type of language-level UB. It is separate from the races you're thinking about. Just like signed integer overflow or use-after-free, the compiler is allowed to assume data races never happen.
by muvlon
5/20/2026 at 1:53:58 PM
The problem is that in the quest to win benchmark games, compilers started to take advantage of UB for all kinds of possible optimizations, which is almost as deterministic as LLM generated code, across compiler version updates.
by pjmlp
5/20/2026 at 2:35:50 PM
Soooo… Pay attention to updates changelog?
by skydhash
5/20/2026 at 5:13:35 PM
This isn't an answer. UB is not only code dependent, but in many cases value-dependent as well. Changing anything about a program has the potential to cause UB anywhere in the code graph affected. So even the smallest possible change requires you to fully understand that entire graph, as well as the entire compiler history and how it interacts with your program. Remember, UB isn't diagnostic and runtime sanitizers don't catch everything, nor do exhaustive testing and static analysis.
by AlotOfReading
5/20/2026 at 2:36:48 PM
If only those changes were all listed there...
by pjmlp
5/20/2026 at 12:14:35 PM
> In C, we can have a data race on a single thread and without any writes!
Well, sure, that's what volatile means - that the value may be changed by something else. If it's a global variable then the something else might be an interrupt or signal handler, not just another thread. If it's a pointer to something (i.e. read from a specific address) then that could be a hardware device register whose value is changing.
The concept of a volatile variable isn't the problem - any language that is going to support writing interrupt routines and memory mapped I/O needs to have some way of telling the compiler "don't optimize this out" since reading from the same hardware device register twice isn't like reading from the same memory location twice.
I think the problem here is more that not all of the interactions between language features and restrictions have been fully thought out. It's pretty stupid to be able to explicitly tell the language "this value can change at any time", and for it to still consider certain uses of that value as UB since it can change at any time! There should have been a carve out in the "unsequenced side effect" definitions for volatile variables.
by HarHarVeryFunny
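If one accepts the reading that two unsequenced volatile reads in one argument list are UB, the usual way to sidestep the question is to read the volatile object once into a plain local and use the copy. A minimal sketch follows; the helper name `format_hex` and its signature are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: performs exactly one volatile read, then formats the
 * snapshot. Returns whatever snprintf returns. */
int format_hex(volatile int *reg, char *buf, size_t size)
{
    assert(reg != NULL && buf != NULL);
    int snapshot = *reg;   /* the only volatile access, in its own full expression */
    return snprintf(buf, size, "%d in hex is 0x%x.",
                    snapshot, (unsigned int)snapshot);
}
```

This also pins down which value gets printed: both conversions see the same snapshot, even if the underlying register changes between reads.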
5/20/2026 at 12:34:24 PM
> There should have been a carve out in the "unsequenced side effect" definitions for volatile variables.
As noted, there are almost 300 usages of the word undefined in the standard. Believing that it’s possible to correctly define all the necessary carve outs and have the compiler implement them successfully is about as logical as believing UB is humanly avoidable in written code.
by vlovich123
5/20/2026 at 8:39:38 AM
I think the article's point is that you don't actually have to get weird at all to run into UB.
Lots of people mistakenly think that C and C++ are "really flexible" because they let you do "what you want". The truth of the matter is that almost every fancy, powerful thing you think you can do is an absolute minefield of UB.
by simonask
5/20/2026 at 10:29:30 AM
My go-to example of "UB is everywhere" is this one:
  int increment(int x) {
    return x + 1;
  }
Which is UB for certain values of x (namely INT_MAX).
by kzrdude
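One portable way to avoid the UB in `increment` is to test against `INT_MAX` before adding. The sketch below is only one possible shape; the name `checked_increment` and the out-parameter convention are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked variant: stores x + 1 in *out and returns true, or
 * returns false and leaves *out untouched when x + 1 would overflow. */
bool checked_increment(int x, int *out)
{
    assert(out != NULL);
    if (x == INT_MAX)      /* the only value for which x + 1 is undefined */
        return false;
    *out = x + 1;
    return true;
}
```

The check happens before the addition, so the overflowing expression is never evaluated.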
5/20/2026 at 11:03:41 AM
C23 removed the whole stuff about indeterminate value and trap representation. Underflow/overflow being silent or not is implementation defined.
by CodeArtisan
5/20/2026 at 11:19:08 AM
Signed overflow is just undefined.
by saagarjha
5/21/2026 at 6:49:37 AM
TBF that is the same as saying "signed overflow is UB".
by jstimpfle
5/21/2026 at 6:57:03 AM
yes but it is a 'picture' that makes you think about it in a different way.
by kzrdude
5/20/2026 at 6:41:07 PM
I've long said that the value a programming language offers is as much about what it doesn't allow as what it does allow. Efficiency aside, most useful programs could be written in most languages, but there are an infinite number of programs you could write that aren't particularly useful. Ruling out the programs you might accidentally write that resemble the one you intended is a pretty useful feature of a language, and it's a metric that C and C++ rate quite poorly on IMO.
by saghm
5/20/2026 at 10:20:57 AM
I would agree that C is "really flexible", but I would say it's primarily flexible because it lets you cast say from a void pointer to a typed pointer without requiring much boilerplate. It's also flexible because it lets you control memory layout and resource management patterns quite closely.
If you want to be standards correct, yes you have to know the standard well. True. And you can always slip, and learn another gotcha. Also true. But it's still extremely flexible.
by jstimpfle
5/20/2026 at 10:58:42 AM
The problem is that a lot of the flexibility introduced by UB doesn't serve the developer.
Take signed integer overflow, for example. Making it UB might've made sense in the 1970s when PDP-11 owners would've started a fight over having to do an expensive check on every single addition. But it's 2026 now. Everyone settled on two's complement, and with speculative execution the check is basically free anyways. Leaving it UB serves no practical purpose, other than letting the compiler developer skip having to add a check for obscure weird legacy architectures. Literally all it does is serve as a footgun allowing over-eager optimizations to blow up your program.
Although often a source of bugs, C's low-level memory management is indeed a great source of flexibility with lots of useful applications. It's all the other weird little UB things which are the problem. As the article title already states: writing C means you are constantly making use of UB without even realizing it - and that's a problem.
by crote
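When wraparound really is what you want, it can be spelled out today by doing the addition in unsigned arithmetic, where wrapping is defined. A minimal sketch, with two assumptions stated loudly: the name `wrapping_add` is made up, and the final conversion back to `int` is implementation-defined for out-of-range values (GCC and Clang document it as reduction modulo 2^N).

```c
#include <assert.h>
#include <limits.h>

static_assert(sizeof(unsigned int) == sizeof(int),
              "sketch assumes int and unsigned int have the same width");

/* Hypothetical helper: two's complement wrapping addition without signed
 * overflow. */
int wrapping_add(int a, int b)
{
    /* Unsigned addition is defined to wrap modulo 2^N. */
    unsigned int sum = (unsigned int)a + (unsigned int)b;
    /* Implementation-defined when sum > INT_MAX; assumed modular here. */
    return (int)sum;
}
```

On GCC and Clang, compiling with `-fwrapv` gives plain `a + b` the same wrapping behaviour without the casts.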
5/20/2026 at 11:22:11 AM
If we're talking two's complement, it's not undefined, that is right. Having to emit checks though, that is where I beg to differ. A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless. Furthermore, it might be "essentially free" from a branch prediction point, but lo and behold, caches exist. You would pollute both the instruction cache with those instructions _and_ the branch prediction cache. From this it doesn't follow at all that there is no cost.
In the end small things do add up, and if you're adding many little things "because it doesn't cost much nowadays" you will end up with slow software and not have one specific bottleneck to look at. I do agree that having the option for checked operations is nice (see C#), but I have needed this behavior (branching on overflow) exactly once so far.
by ablob
5/20/2026 at 12:34:45 PM
> A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless.
You almost always want to change the behavior to erroring out on overflow. The few cases where overflow really is intended and fine can be handled by explicit opt-out.
And I refuse to buy the argument that "small things add up" in the world where we do string building and parsing every few microseconds. Checked math will have unnoticeable impact compared to all the other things we do, in almost every type of program.
by Xirdus
5/20/2026 at 1:51:46 PM
This string manipulation stuff is very common, and that's why in 2026, an age where science fiction has become a reality, many things are still absurdly slow. Exactly because of such sloppiness, which does accumulate in many cases, and when one least expects it.
by jstimpfle
5/20/2026 at 9:06:51 PM
100% agreed on the sloppiness. But overflow checking is not sloppiness. It's the opposite of sloppiness. Unchecked math is sloppiness, allowing overflows to happen silently and uncontrollably is sloppiness. It just so happens this kind of sloppiness makes code faster, unlike other kinds of sloppiness that make code slower. Not doing necessary safety checks is faster than doing these necessary checks, but it doesn't make these checks any less necessary. Not validating user input also makes code faster, and is also sloppy.
by Xirdus
5/20/2026 at 11:20:14 AM
Signed overflow checks are typically not free, unfortunately. They have a cost of about 5% or thereabouts.
by saagarjha
5/20/2026 at 12:50:50 PM
In hot paths it can be even more. This is why even Rust defines it as wrapping but elides the overflow panic in release builds.
by vlovich123
5/20/2026 at 1:23:06 PM
It is defined as an error. That error’s default handling is wrapping when debug_assertions is off, and panic when it’s on, but since it’s an incorrect program (though not UB) either behavior is acceptable in any mode.
by steveklabnik
5/20/2026 at 1:53:47 PM
If it is defined as an error, but the compiled build will continue to run with the value wrapped around, I would say that's indistinguishable from UB.
by jstimpfle
5/20/2026 at 3:19:54 PM
No. An integer getting deterministically set to an unintended value is a bug. A bug is not the same thing as UB. (Even if it were non-deterministic, it would still not be anything like UB.) It's not the same ballpark, not even the same sport.
by 12_throw_away
5/20/2026 at 9:55:28 PM
What if the wrapped index is used to construct an invalid pointer? It might be possible, not sure. What if the integer is used to read the wrong data from disk, or corrupt data on disk by writing to the wrong location?
by jstimpfle
5/20/2026 at 10:41:48 PM
> What if the wrapped index is used to construct an invalid pointer?
Constructing an invalid pointer in rust is UB, yes, but integer wraparound is not.
> What if the integer is used to read the wrong data to a disk, or corrupt data on disk by writing to the wrong location?
Then it is a very bad bug.
> What if the program controls a nuclear power plant and the integer causes the control system to fail, causing memory errors due to radiation from the meltdown?
Then it is a very very bad bug.
> What if the wrapped integer causes the program to output the true name of god, and the programmer, in their last minutes of existence, looks up to see, overhead, without any fuss, the stars going out?
Ok, you got me, this one is UB.
by 12_throw_away
5/21/2026 at 12:49:17 AM
> Constructing an invalid pointer in rust is UB
no, it is dereferencing, not constructing, an invalid pointer, that is UB. there is even a safe function provided to construct an invalid but non-null pointer: `https://doc.rust-lang.org/stable/std/ptr/fn.dangling.html`
by kobebrookskC3
5/21/2026 at 1:42:53 AM
> What if the wrapped index is used to construct an invalid pointer?
Using that pointer would be UB, but that is UB, not the addition.
> What if the integer is used to read the wrong data from disk, or corrupt data on disk by writing to the wrong location?
That is a bug, but it is not undefined behavior.
by steveklabnik
5/20/2026 at 5:20:33 PM
It's indistinguishable from unspecified behavior, not from undefined behavior. Unspecified behavior has to pick from a finite list of allowed behaviors. Undefined behavior can do anything.by SAI_Peregrinus
5/20/2026 at 9:58:06 PM
A program with corrupted state can essentially do anything. Yes it's still a question of run-time checks the runtime has to protect against it. But the compiler is probably deriving a lot of assumptions from the assumption that there wasn't overflow.
by jstimpfle
5/21/2026 at 1:40:54 AM
“Undefined behavior” is a term of art in programming languages that means something more specific than “the program may do something odd.”
The compiler is not allowed to derive any assumptions from it. It only could if it were UB.
by steveklabnik
5/21/2026 at 5:51:04 AM
But did the rust compiler assume that the integer would not overflow? It did so in Debug mode where runtime checks were added. If it's not the case in Release mode, does that mean semantics are different between Debug and Release?
by jstimpfle
5/21/2026 at 3:53:01 PM
> But did the rust compiler assume that the integer would not overflow?
It did not.
> It did so in Debug mode where runtime checks were added.
It didn't assume in that case either. It did a well defined thing: add checks.
> If it's not the case in Release mode, does that mean semantics are different between Debug and Release?
Strictly speaking, the language doesn't know about "release mode", as that's a Cargo thing. But yes, in practice, the semantics are different based on various things: it could be debug vs release, it could also be flags that control the behavior. But that's still distinct from "undefined behavior" as a concept. The behavior is well defined, with multiple possible options for behaviors.
by steveklabnik
5/21/2026 at 8:24:09 PM
So in Rust, you are actually specifying TWO programs with a single source? Those Rust users are surely too clever for my liking!
You can tune a C compiler as well to have a very specific defined behaviour for integer overflow. You can add -fwrapv or you can add UBSAN.
The user never intended overflow to happen, because if they did, they could have used something like __builtin_mul_overflow() or whatever. Or they are an emotionally unstable user with destructive tendencies. The user also never intended the program to abort with a (nicely formatted) error message, unless they are a very very sad depressed nihilistic user who also never runs their program in Release mode.
To say that overflow would be defined in Rust is at least half a lie. We could agree that cargo has a choice of diagnostic policy though, a policy how to handle what is essentially a state with no defined or useful path forward, or in other words, UB.
Throwing errors might be a wanted property to detect oversights. C ecosystem has UBSAN too! But essentially the same is still true: Basic arithmetic operations are not closed over the numbers 0..2^N. Rust doesn't have a (unique and useful) definition for those operations for a subset of numbers. Even if you claim the operations are defined (say wrapping arithmetic in Release mode), it's not what the programmer wants. Probably the majority of algorithms work over natural numbers or integer numbers. These algorithms don't work when the arithmetic is done on integers modulo 2^N.
So the user has to constrain the set of valid inputs, and do manual sanitization, just like in C.
by jstimpfle
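For reference, this is roughly what opting in to an explicit overflow check looks like with the builtin mentioned above. `__builtin_mul_overflow` is a GCC and Clang extension, not standard C; the wrapper name `checked_mul` and its out-parameter convention are assumptions for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper around the GCC/Clang builtin: stores a * b in *out
 * and returns true, or returns false and leaves *out untouched if the
 * mathematical product does not fit in an int. */
bool checked_mul(int a, int b, int *out)
{
    assert(out != NULL);
    int result;
    if (__builtin_mul_overflow(a, b, &result))
        return false;      /* overflow detected, no UB was executed */
    *out = result;
    return true;
}
```

The caller still has to decide what to do with the `false` case, which is the "constrain the set of valid inputs" work described above.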
5/21/2026 at 2:32:16 PM
The semantics are well-defined in both modes. You can predict exactly what will happen in either case. In C, the semantics are not defined at all, you can't predict what will happen and it's allowed to change between compilations of the same source.
It will probably get omitted, since Undefined Behavior isn't allowed by the C abstract machine, but sadly compilers are allowed to emit code for UB in the source (partly because some UB is only detectable at runtime). Sometimes disabling optimizations will incorrectly allow codegen to run for source lines which have UB, tricking people into thinking that optimizations are breaking their program. Compilers are allowed to do this, since behaviors other than "omit the offending statement" are unfortunately allowed by the standard, so it's not a compiler bug.
by SAI_Peregrinus
5/21/2026 at 7:43:11 PM
UB is a runtime property. As far as you can statically verify some code parts, you can see UB at compile time, but the point of UB is exactly that it is about stuff you can't predict, or that is hard to predict as a compiler.
Now, why you can cook up trivial artificial examples where a compiler will remove some code sections based on statically detected UB, instead of printing an error, you have to ask the compiler authors.
> The semantics are well-defined in both modes.
So they're not the same? So the behaviour is not uniquely defined by the source code alone, but is actually _very_ different based on compile mode? Between two modes whose point was never to have different semantics, but to have the _same_ semantics while being debuggable vs being fast?
> You can predict exactly what will happen in either case. In C, the semantics are not defined at all, you can't predict what will happen and it's allowed to change between compilations of the same source.
You can make the same "predictability" argument for C, you can easily write a compiler that has semantics exactly laid out. Case in point: -fwrapv. Case in point: UBSAN.
by jstimpfle
5/21/2026 at 7:27:00 AM
Yeah on average. On some paths it's almost free
by saagarjha
5/20/2026 at 12:35:07 PM
You can run your code under ASAN and UBSAN nowadays, it will catch many or most issues as they happen.
But that's completely beside the point. UB on signed overflow, or really most of UB, is not unrelated to C flexibility. It is a detail of the spec related to portability and performance. IIRC it is even required to make such trivial optimizations as turning
  for (int i = 0; i < n; i++) func(a[i]);
into
  for (Foo *p = a, *last = a + n; p < last; p++) func(*p);
saving arithmetic and saving a register, on architectures where `int` is smaller than pointers. But there are also options like -fwrapv on GCC for example, allowing you to actually use signed overflow.
by jstimpfle
5/20/2026 at 1:39:39 PM
How is undefined behavior necessary for this transformation?
by Chinjut
5/20/2026 at 2:15:54 PM
IIRC computation of the address is done by computing the offset from the base pointer as a multiplication in (32-bit) int, like p + (i * sizeof (Foo)). The right term might overflow, but due to signed overflow being UB, the compiler is able to assume that it does not, so the transformation to do the arithmetic entirely in (64-bit) pointer space is valid.
by jstimpfle
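To make the two loop shapes concrete, here they are side by side as a small sketch. The function names and the summing body are made up for illustration; the point is only that, for any non-negative `n`, both forms visit the same elements, which is the equivalence the compiler wants to be free to exploit.

```c
#include <assert.h>

/* Index form: i is a (typically 32-bit) int, so each a[i] involves widening
 * i to pointer width before the address computation. */
long sum_indexed(const int *a, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer form: all the arithmetic happens at pointer width. */
long sum_pointer(const int *a, int n)
{
    assert(n >= 0);
    long total = 0;
    for (const int *p = a, *last = a + n; p < last; p++)
        total += *p;
    return total;
}
```

Whether a given compiler actually performs this rewrite, and at which optimization level, is not something this sketch claims; it only shows the source-level equivalent.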
5/20/2026 at 6:31:04 PM
Exactly. You as the programmer know that the loop counter won't overflow, and in general, essentially nobody would actually write it that way. But if you don't assume it can't happen, the possibility for signed overflow is everywhere in address computations.
This is also a major blocker for auto-vectorization. Can't coalesce a load of a[i], a[i+1], a[i+2], a[i+3] into a load of a[i:i+3] if there's a possibility that `i+1`, `i+2` or `i+3` wrapped around (thus causing your "contiguous" load to be non-contiguous). This is a big reason why you shouldn't use `unsigned` for loop counters, especially if they're going to be used as an index into an address calculation.
by tyg13
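The wraparound case is easy to state in code. With an `unsigned` index, `i + 3` wrapping past `UINT_MAX` is defined behaviour, so the compiler has to account for it; with a signed index it may assume it never happens. A small sketch, where the predicate name is made up for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo 2^N by definition. */
static_assert(UINT_MAX + 1u == 0u, "unsigned overflow wraps to zero");

/* True when i, i+1, i+2 and i+3 are four consecutive indices, i.e. none of
 * the additions wrapped around. */
bool four_indices_contiguous(unsigned int i)
{
    return i + 3u >= i;    /* false only when i + 3 wrapped past UINT_MAX */
}
```

For the three largest values of `i` the predicate is false, and that is exactly the case a vectorizer cannot rule out for an unsigned counter.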
5/21/2026 at 1:54:51 PM
But surely the more natural approach than making this undefined behavior would be making the computation of a[i] take place in 64-bit pointer space rather than 32-bit int space? Why does the compiler need the freedom to emit nasal demons?
by Chinjut
5/20/2026 at 2:37:05 PM
*is not related to C flexibility
by jstimpfle
5/20/2026 at 10:38:19 AM
It's not flexible in practice, because knowing the standard isn't optional. If you make the choice to not follow the standard, you're making the choice to write fundamentally broken software. Sometimes with catastrophic consequences.
by simonask
5/20/2026 at 10:57:36 AM
I'm making the choice to pass pointers as void to get low-friction polymorphism. I'm making the choice to control the memory layout of my data structures, including of levels and type of indirection. I'm making the choice to control my own memory allocators and closely control lifetimes, closely control (almost) everything that happens in the system.
That has nothing to do with not following the standard.
by jstimpfle
5/20/2026 at 11:20:56 AM
Be that as it may, you’re not following the standard.
by saagarjha
5/20/2026 at 12:24:15 PM
what is your point?
by jstimpfle
5/20/2026 at 12:37:33 PM
If you don't follow the standard, gcc -O2 can introduce bugs to your code that you never even wrote. Skipping null checks, executing both branches of a conditional, and so on.
by Xirdus
5/20/2026 at 12:39:46 PM
Where did I say I'm not following the standard?
by jstimpfle
5/20/2026 at 12:58:25 PM
I interpreted these words:
> If you want to be standards correct, yes you have to know the standard well.
to mean that being standards-correct is optional. It's not. Every C programmer needs to know every possible UB by heart and never introduce any of it to their code, or else they'll be constantly introducing subtle, hard to debug bugs that contradict the actual code they wrote.
Maybe you meant something different by those words, but then I'm confused what the "if" was supposed to mean.
by Xirdus
5/20/2026 at 2:06:09 PM
Of course it's optional (although I didn't mean to imply that). Even using computers at all is optional. I never said that I don't aim to follow the standard, have a clean compiling program without warnings and without UB, etc. I do strive to achieve all of that.
But it's not entirely black and white, either. In practice I'm fine accepting that some bugs are technically UB but whatever, we've found a bug by whatever manifestation (like NULL dereference most likely leading to segfault in practice). I just fix the bug as a bug, and life goes on.
The standard is not perfect, it does have shortcomings. It can be improved. And it can be interpreted to fix some issues. Let's not hold theory over practicality, and let's expect the compiler writers also strive to do the reasonable thing.
by jstimpfle
5/20/2026 at 8:57:49 PM
In practice, GCC -O2 will happily erase entire swathes of code and turn perfectly logical source into nonsense assembly whenever it gets as much as a sniff of UB anywhere in the code path. Nobody would be talking about UB if GCC wasn't so aggressive in abusing the freedom UB gives.
To paraphrase your earlier comment - you lose low-friction polymorphism (unpredictable compiler output causes a lot of friction). You lose control of memory allocations (because they may have been elided) and lose control of lifetimes (because free can be moved before last use causing a crash, or removed entirely causing a leak). You lose control of (almost) anything that happens in the system. And it has everything to do with not following the standard.
You do retain control of the memory layout of data structures, though.
by Xirdus
5/20/2026 at 10:13:31 PM
Then I'm almost ashamed to admit that I'm not sure I've ever witnessed any surprising form of UB in the wild. For example, I will reliably get segfaults on NULL dereference in practice. Typical manifestations of UB are entirely predictable and obvious. Of course I'm also running most code without most optimizations, most of the time, while developing.
On the other hand, what I've observed with my own eyes is interesting phenomena like performance drops, e.g. memory bandwidth dropping from gigabytes/sec to 300 KB/sec due to false sharing on an ARM SOC for example.
by jstimpfle
5/21/2026 at 1:13:59 PM
There was once a privilege escalation vulnerability in the Linux kernel that only happened when compiled with optimizations. In kernel space, address 0 is just regular memory that can be read from and written to if there's a page mapped to it. But in the C standard, reads and writes through a null pointer are UB.
What GCC did is notice that a pointer is accessed before its null check. Since accessing a null pointer is UB, and GCC assumes UB never happens, it figured out the null check is superfluous. And removed the check and the early return. The pointer read stayed, mind you. The optimized function would unconditionally read from the pointer even if it's null, then unconditionally execute the privileged operation without checking permissions. That allowed obtaining root access from anywhere.
I saw a few other writeups of interesting UB behavior on The Old New Thing blog. I especially like the time travel one: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63... (apologies to people of the future, links to MS devblogs tend to die often).
by Xirdus
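The shape of that bug can be reduced to a few lines. This is an illustrative sketch only, not the kernel code: `struct device`, its `flags` field and all three function names are made up.

```c
#include <assert.h>
#include <stddef.h>

struct device { int flags; };   /* hypothetical type, for illustration only */

/* Hazardous shape: the dereference comes first, so a compiler may infer
 * that dev is non-null and drop the check that follows. Calling this with
 * NULL is undefined behaviour. */
int flags_checked_too_late(struct device *dev)
{
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Safer shape: test the pointer before the first dereference. */
int flags_checked_first(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}

/* When NULL is never a valid argument, say so as a precondition instead. */
int flags_nonnull(struct device *dev)
{
    assert(dev != NULL);
    return dev->flags;
}
```

GCC also has `-fno-delete-null-pointer-checks`, which tells the optimizer not to assume that a dereferenced pointer is non-null; reordering the check as in the second function avoids depending on a flag at all.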
5/21/2026 at 7:34:34 AM
Compilers do not surprise all that often, which is why it is extra surprising when it does happen.
by saagarjha
5/20/2026 at 10:05:29 AM
At which point it feels like some sort of high-level assembly-like language, which is simple enough to compile efficiently and stay crossplatform, with some primitives for calls, jumps, etc. could find a nice niche.
Maybe this already exists, even? A stripped down version of C? A more advanced LLVM IR? I feel like this is a problem that could use a resolution, just maybe not with enough of a scale for anyone to bother, vs. learning C, assembly of given architecture, or one of the new and fancy compiled languages.
by 3form
5/20/2026 at 4:09:39 PM
There's Vale [0] as a structured high-level assembly language, but pretty far from usable right now. I do hope it matures. Basically: All non-control-flow instructions can be directly supported. Control flow is lofted to a higher level and implemented in C-style structured blocks and keywords, which map directly to a subset of the ISA that modifies the program counter. This separation means it's not a proper superset of traditional assembly languages -- you can't paste in arbitrary blocks of existing code -- but a lot of interesting things (for them, implementations of cryptographic primitives) are pretty trivial to port over. And in exchange, you get a well defined Hoare logic that can talk about total correctness, not just [1]'s partial correctness.
by addaon
5/20/2026 at 10:40:02 AM
Well, Zig is aiming to be a "saner C", and mostly succeeding so far. I hope they make it to production.
Rust is a somewhat more thorough attempt to actually course-correct.
by simonask
5/20/2026 at 5:40:32 PM
It is basically what you can have today with Object Pascal or Modula-2, with a revamped syntax for C crowds.
by pjmlp
5/20/2026 at 5:41:41 PM
Yes, there have been quite a few C-inspired Assembly languages, for DSPs for example. TI had one.
by pjmlp
5/20/2026 at 10:13:14 AM
And it makes sense as long as you allow the concept of unsequenced operations at all (admittedly it’s somewhat rare; e.g. in Scheme such things are defined to still occur in sequence, but which specific sequence is unspecified and potentially different each time). The “volatile” annotation marks your variable as being an MMIO register or something of that nature, something that could change at any point for reasons outside of the compiler’s control. Naturally, this means all of the hazards of concurrent modification are potentially there.
That said, your “common parlance” definition of “data race” is not the definition used by the C standard, so your last sentence is at best misleading in a discussion of standard C.
> The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
(Here “conflicting” and “happens before” are defined in the preceding text.)
by mananaysiempre
5/20/2026 at 10:53:31 AM
Your first paragraph makes it sound as if the compiler will actually generate two reads of the value of some register, which might lead to unexpected effects at runtime for certain special registers.
However, this is not at all what UB means in C (or C++). The compiler is free to optimize away the entire block of code where this printf() sequence occurs, by the logic that it would be UB if the program were to ever reach it.
For example, the following program:
int y = rand();
if (y != 8) {
volatile int x;
printf("%d: %d", x, x) ;
} else {
printf("y is 8");
}
Can be optimized to always print "y is 8" by a perfectly standard compliant compiler.
by tsimionescu
5/20/2026 at 11:55:34 AM
> Your first paragraph makes it sound as if the compiler will actually generate two reads of the value of some register, which might lead to unexpected effects at runtime for certain special registers.
I don’t see how. I was trying to explain why it’s reasonable for a volatile read to be a side effect, after which the C rule on unsequenced side effects applies, yielding UB as you say.
by mananaysiempre
5/20/2026 at 11:06:22 AM
"volatile" tells the compiler it is _not_ safe to optimise away any read or write, so it can't just optimise that section away at all.
> An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.
A compliant compiler is only free to optimise away, where it can determine there are no side-effects. But volatile in 5.1.2.3 has:
> Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects.
by shakna
5/20/2026 at 11:19:38 AM
Yes, but undefined behaviour is undefined behaviour, and that behaviour can legally be that the code is not emitted at all, volatile (or any other side effect) or not. (Compilers do reason about undefined behaviour when optimising, so this isn't necessarily a completely theoretical argument. I don't know, though, which rule would 'win' in a compiler's actual logic: 'don't optimise volatile' or 'assume undefined behaviour is impossible and remove code that definitely invokes it'. Nor do I know whether any current compiler would flag this as unconditionally undefined behaviour in the first place.)
by rcxdude
5/20/2026 at 11:25:04 AM
Volatile wins.
GCC calls that out [0] - volatile means things in memory may not be what they appear to be, and that there are asynchronous things happening, so something that may not appear to be possible, may become so, because volatile is a side-effect.
So about the only optimisation allowed to happen, is combining multiple references.
Clang is similar:
> The compiler does not optimize out any accesses to variables declared volatile. The number of volatile reads and writes will be exactly as they appear in the C/C++ code, no more and no less and in the same order.
[0] https://www.gnu.org/software/c-intro-and-ref/manual/html_nod...
by shakna
5/20/2026 at 12:00:34 PM
That's cool and all if you are writing GCC or Clang dialect C, but it doesn't change the fact that it is UB in the C standard.
by poizan42
5/20/2026 at 3:03:16 PM
This is all assuming that the code is not invoking undefined behaviour. If the code is invoking undefined behaviour, GCC and clang are both well within their rights to say 'none of the rest of our documentation applies' (and have historically done so on bug reports).
by rcxdude
5/20/2026 at 11:17:16 AM
Sure it can. That code path has unconditional UB and thus it is not valid.
by saagarjha
5/20/2026 at 11:20:38 AM
Only if there would be no side-effects. Which there are.
by shakna
5/20/2026 at 11:52:12 AM
No, this is irrelevant for making this decision.
by saagarjha
5/20/2026 at 12:01:15 PM
I've mentioned elsewhere the standards, and compilers as well, disagreeing with you here.
But feel free to run against the various compilers through godbolt. [0] They won't optimise the branch away. Access to a volatile, must be preserved, in the order that they exist. No optimisation, UB or otherwise, is allowed to impede that. Because an access is a side-effect.
by shakna
5/20/2026 at 1:58:38 PM
Compilers not doing something is not a demonstration that they are not actually allowed to do that thing.
by zahlman
5/20/2026 at 12:17:18 PM
That they won’t is at most a courtesy to you, but they are not required to do this.
by saagarjha
5/20/2026 at 12:26:42 PM
> Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.
I quoted the C standard, first. Not compiler behaviour.
I showed where it requires the compiler not to optimise this.
How about, instead of one-line throwaway disagreements, you point out where they are permitted to do this, instead?
by shakna
5/20/2026 at 1:56:47 PM
The compiler is required to not optimise out reads/writes through volatile. That's unrelated to code also having UB: you can't sprinkle volatile through arbitrary UB and suddenly have it be defined.
> A compliant compiler is only free to optimise away, where it can determine there are no side-effects
A compliant compiler is also allowed to assume UB cannot occur.
by dwattttt
5/20/2026 at 12:03:23 PM
This looks like a long back and forth that can easily be solved by a minute or two on godbolt...
by nilamo
5/20/2026 at 12:15:48 PM
> that can easily be solved by a minute or two on godbolt...
Unfortunately it's not that simple when it comes to UB. If the snippet in question does in fact exhibit UB then there's no guarantee whatever Godbolt shows will generalize to other programs/versions/compilers/environments/etc.
by aw1621107
5/20/2026 at 12:23:15 PM
That's very funny to me.
A) x is always removed.
B) no, it's never removed if volatile.
But neither person can prove what a compiler will actually do, despite claiming they'll always act a certain way given 5 lines of code.
by nilamo
5/20/2026 at 12:33:11 PM
Also, at behavioural edges what you'll see on Godbolt is compiler bugs. So you learn nothing about what should happen.
All popular modern C++ compilers have known bugs and while I'm sure there are C compilers with no known bugs that will be because nobody tested very hard.
by tialaramex
5/20/2026 at 1:58:42 PM
I have watched a compiler flip between emitting the code I expected (despite it having UB), and emitting unexpected code after a minor update.
What you observe a compiler do when there's UB is not at all something you can rely on.
by dwattttt
5/20/2026 at 3:06:47 PM
No, claim A is 'x may be removed by a conforming C compiler'. Whether any given version of a given compiler actually does so in any given circumstance is a different question (the answer being: probably not, because while this is undefined behaviour, it's not likely something that is going to be flagged as such by a compiler's optimizer. Also, from some testing with GCC and forcing a null pointer dereference, it seems like volatile at least does win in that case with the current version of it on x86: it dutifully emits the null pointer dereference and then the 'ud2' instruction instead of the rest of that execution path).
by rcxdude
5/20/2026 at 12:29:16 PM
I made the weaker claim that x can be removed. This is something I could prove with compiler output but I would have to find a compiler willing to make this optimization which is not something I can guarantee.
by saagarjha
5/20/2026 at 12:18:03 PM
No, compilers will often choose to not optimize on UB.
by saagarjha
5/20/2026 at 11:28:15 AM
When the compiler decides something is UB, i.e. "the result of this code is not defined and could be anything", it selects the most performant version of undefined behavior: doing nothing, by optimizing the code away.
by u8080
5/20/2026 at 11:39:44 AM
The compiler is not free to remove accesses to something marked volatile - it's defined as a side-effect.
Volatile means something else may be acting here. Something else may install anything into the register at any time - and every time you access.
The compiler is required to preserve the order of accesses. In almost every C compiler, today, there are almost no optimisations the moment a volatile is introduced, for this reason.
by shakna
5/20/2026 at 11:55:21 AM
If code has undefined behavior, the entire execution path that leads to that UB has no assigned semantics in the C model. So there are no volatile accesses in this code according to the C abstract machine - the entire execution path is UB, so it can be assumed it doesn't happen at all.
by tsimionescu
5/20/2026 at 12:29:25 PM
> An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine
The execution path has unknown side effects, and so the execution path must be strictly followed. That's uh... The entire point of that section in the C standard. It's why volatile is called out, in the semantic model for the abstract machine.
Otherwise... Why call it out, at all? It must be strictly followed, not lazily, as in other areas of the standard.
by shakna
5/20/2026 at 1:26:52 PM
Previously discussed here: https://news.ycombinator.com/item?id=33770277
UB supersedes volatile, once the compiler hits UB then all bets are off. Compilers can and do optimize out UB branches, which is almost never what you want... yet here we are.
by Aeolos
5/20/2026 at 1:42:24 PM
From that thread: https://news.ycombinator.com/item?id=33770905
>> The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior all bets are off. [...] Yes, UB can "time travel"
> Close, but not quite. This is a common misconception in the reverse direction.
> Abstractly, what UB can do is performing the inverse of the preceding instructions, effectively making the abstract machine run in reverse. However, this is only equivalent to "time-traveling" until you get to the point of the last side effect (where "side effect" here refers to predefined operations in the standard that interact with the external world, such as I/O and volatile accesses), because only everything since that point can be optimized away under the as-if rule without altering the externally visible effects of the program.
> As a concrete, practical example, this means the following: if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior. That is, the UB cannot time-travel to before the flush. What the program can do is to write garbage to the file afterward, or attempt to overwrite what you wrote in the file to revert it to its previous state, but the fflush() must still occur before anything wild happens. If nobody observes the in-between state, then the end result can look like time-travel, but if the system blocks on fflush() and the user terminates the program while it's blocked, there is no opportunity for UB.
by shakna
5/20/2026 at 3:09:23 PM
Sure, but in this case the volatile accesses are part of the undefined behaviour and so they're not outside of the blast radius.
by rcxdude
5/20/2026 at 12:27:54 PM
The print example has no defined order of accesses, function parameters can be evaluated in any order. But further, the entire problem with UB is that it supersedes the regular guarantees that you get (like with volatile) when it's encountered. Yes, gcc and clang do the obvious thing that makes the most sense in this example, but what people are trying to tell you is that they could just not do that and they would still be complying with the standard. For example, you can imagine a more serious example of UB that causes the program to fail to compile completely, and then do you emit the correct number of in-order reads of volatile variables? Obviously not.
by SpaceNugget
5/20/2026 at 12:35:12 PM
Function parameters cannot be evaluated in any order, when one of them is a volatile.
> The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject
And what I am trying to tell people, is the standard has expectations around the volatile keyword, that the compilers took into account when designing how they would work - it isn't just kindness, it's compliance. But no one is actually talking about the quotes from the standard, and just quoting themselves and their own understandings.
by shakna
5/20/2026 at 12:54:53 PM
That quote doesn't have anything to do with parameter evaluation order. There is no order for function parameter evaluation.
And no, there is no exception for undefined behavior. There can't be, otherwise the behavior would be... defined. It's in the name. Again, what do you think the compiler emits when the undefined behavior causes the program to not compile altogether?
by SpaceNugget
5/20/2026 at 1:12:43 PM
Are you sure?
>unsequenced side effects on the same scalar object are UB
>6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other.
Read 5.1.2.4.3:
"If A is not sequenced before or after B, then A and B are unsequenced."
"Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which."
With a footnote saying this:
"9)The executions of unsequenced evaluations can interleave. Indeterminately sequenced evaluations cannot interleave, but can be executed in any order."
I.e., the standard makes a distinction between "unsequenced" and "indeterminately sequenced". And since there is no mention of side effects in "indeterminately sequenced" evaluations being UB, I conclude that your example is not UB.
by rocketrascal
5/20/2026 at 10:29:01 AM
Reading a register from a microcontroller peripheral may well reset it as an example of a possible side-effect here, and that's exactly the kind of thing you use volatile for.
by berti
5/20/2026 at 1:48:22 PM
> Here's a way weirder example:
Well, yes; but when the C standard authors wrote like this, they surely had in mind "the reads could be in either order, therefore the output could display the polled values in either order". Not C++ nasal demons.
And yeah, being able to say "reading is a side effect" is important when for example you interact with certain memory-mapped devices.
by zahlman
5/20/2026 at 3:41:46 PM
I think the C standard doesn't do itself any favors by using "undefined behavior" to signify both "anything can happen, including erasing all your data and setting your data center on fire" and "one of a very small and well-defined set of things will happen, but we cannot commit to which one". The latter is not exactly great, but significantly less dangerous than the former.
by smsm42
5/20/2026 at 9:34:25 AM
Yes, there is a data race there. The value of a volatile can be changed by something outside the current thread. That’s what volatile means and why it exists.
Edit: thread=thread of execution. I’m not making a point about thread safety within a program.
by sethev
5/20/2026 at 10:05:29 AM
Not from the standard’s point of view. The traditional (in some circles) use of volatile for atomic variables was not sanctioned by the C11/C++11 thread model; if you want an atomic, write atomic, not volatile, or be aware of your dependency on a compiler (like MSVC) that explicitly amends the language definition so as to allow cross-thread access to volatile variables.
by mananaysiempre
5/20/2026 at 10:11:00 AM
Thread was a poor choice of word. Outside the control of the program is a better way to put it. Like memory mapped io.
by sethev
5/20/2026 at 3:14:27 PM
It's almost universally better to use inline assembly via a macro to read/write mmio rather than use volatile.
by surajrmal
5/20/2026 at 10:35:35 AM
It can also represent a register where reading it has an effect. Reading a memory mapped register can have side effects. For example, reading the memory mapped data register of a UART will fetch the next byte to be read.
by trissylegs
5/20/2026 at 11:38:49 AM
Was going to say the same thing until I saw this comment. volatile is defined the way I'd expect, plus it's a strange code example.
by frollogaston
5/20/2026 at 10:13:20 AM
Not sure why you're being downvoted. That's completely right. The example is silly. The code is obviously bad, doesn't matter if it's UB or not.
I'm also not convinced (yet) that the example really is UB: I agree reading a volatile is "a side effect" in some sense, and GP cited a paragraph that says just that. But GP doesn't clearly quote that it's a side effect on the object (or how a side effect on an object is defined). Reading an object doesn't mutate it after all.
But whatever language lawyer things, the code is obviously broken, with an obvious fix, so I'm not so interested in what its semantics should be. Here is the fix:
volatile int x;
// ...
int val = x; // volatile read
printf("%x %d\n", val, val);
by jstimpfle
5/20/2026 at 11:10:30 AM
The problem is that the function call as a whole is UB. Having the original example compile to the equivalent of
volatile int x;
int a = x;
int b = x;
printf("%x %d\n", a, b);
is equally valid as
volatile int x;
int a = x;
int b = x;
printf("%x %d\n", b, a);
and neither needs to have the same output as your proposed fix.
C could've specified something like "arguments are evaluated left-to-right" or "if two arguments have the same expression, the expression is [only evaluated once]/[always evaluated twice]". But it didn't, so the developer is left gingerly navigating a minefield every time they use volatile.
by crote
5/20/2026 at 11:17:47 AM
Not only is "arguments are evaluated left-to-right" less easy to formalize than you think, it would also make all C code run slower, because the compiler would no longer be able to interleave computations for more efficient pipelining. The same goes for "expression is [only evaluated once]/[always evaluated twice]".
Of course the developer is navigating a minefield every time they use volatile, that's why it's called "volatile" - an English word otherwise only commonly used in chemistry, where it means "stuff that wants to go boom".
by indigo945
5/20/2026 at 1:43:50 PM
the compiler can still interleave anything it shows is side-effect free; it’s hard to show that something would benefit from being reordered without analyzing it well enough to determine what side effects it has
by remexre
5/20/2026 at 11:45:50 AM
Your argument makes no sense since the developer is expected to perform manual sequencing. Correctly written UB free code cannot be interleaved either.
All you've achieved is that the standard C function call syntax can no longer be used as is.
by imtringued
5/20/2026 at 2:19:58 PM
I understand, that's why I said the code is obviously broken. The problem is not about order of evaluation. It's not about UB arising from unsequenced volatile reads or whatever.
The problem is simply that there are two volatile reads where only one was intended. It doesn't matter if there is UB or not. The code doesn't express the intention either way. All you need to know to understand that is that a volatile might be modified concurrently (a little bit similar to atomics, but not the same semantics).
by jstimpfle
5/20/2026 at 10:52:34 AM
With volatile it could be changed by an interrupt service routine between reads, so it makes sense.
by RobotToaster
5/20/2026 at 6:04:38 PM
Or, it could be hardware that has a "clear flag on read" type behavior.
by nomel
5/20/2026 at 7:15:43 PM
What's weird about it?
If you are using volatile you are reading from a device port mapped to that address.
Since C doesn't mandate in which order function arguments are evaluated, you don't know which argument will be read from port first.
How can that be anything but UB?
by drysine
5/20/2026 at 11:02:44 AM
This has got nothing to do with data races etc. but everything to do with "Sequence Points and Single Update Rule" which is well described in C language specification.
See my comment here - https://news.ycombinator.com/item?id=48205760
by rramadass
5/20/2026 at 11:40:34 AM
Memory mapped IO sends a read request to a peripheral, which is allowed to have side effects in the background and to return two different values for two reads. You can think of it as a synchronous RPC request.
The lack of argument sequencing feels utterly petty however.
by imtringued
5/20/2026 at 1:58:47 PM
[dead]
by netrikare