Everything in C is undefined behavior

5/20/2026 at 8:33:20 AM

Yes there is tons of surprising and weird UB in C, but this article doesn't do a great job of showcasing it. It barely scratches the surface.

Here's a way weirder example:

  volatile int x = 5;
  printf("%d in hex is 0x%x.\n", x, x);

This is totally fine if x is just an int, but the volatile makes it UB. Why? 5.1.2.4.1 says any volatile access - including just reading it - is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other.

So in common parlance, a "data race" is any concurrent accesses to the same object from different threads, at least one of which is a write. In C, we can have a data race on a single thread and without any writes!

by muvlon

5/20/2026 at 10:56:21 AM

Author here.

> It barely scratches the surface.

I agree. The point of the post is not to enumerate and explain the implications of all 283 uses of the word "undefined" in the standard. Nor enumerate all the things that are undefined by omission.

The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.

And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.

The (one!) exploitable flaw found by Mythos in OpenBSD was an impressive endorsement of the OpenBSD developers, and yet as the post says, I pointed it at the simplest of their code and found a heap of UB.

Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.

FTA:

> The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C and C++ code has UB.

by thomashabets2

5/20/2026 at 5:30:45 PM

> Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.

I presume you're referring to this code:

  pid = waitpid(pid, &status, 0);
  if (WIFEXITED(status))
    rval = WEXITSTATUS(status);
  else
    rval = -1;

The only signal handler find installs is for SIGINFO, and it uses the SA_RESTART flag, so EINTR can be ruled out. The pid argument is definitely valid as you can't reach the above if it wasn't, and there's no other way for the child process to be reaped[1], so no ECHILD.

A check should probably be added in case the situation changes in the future, triggering spooky action at a distance, or were that code to be copy+pasted somewhere where the invariants didn't hold. But I think the current code in its current context is, strictly speaking, correct as-is.

[1] OpenBSD lacks the kernel features for such surprises that might theoretically be possible on Linux.
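
A minimal sketch of what such a check might look like. The helper name `child_exit_status` and the fork-a-trivial-child setup are made up for illustration and are not taken from find:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then reap it.
 * Returns the child's exit status, or -1 on any failure. */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                /* child: exit immediately */

    /* Only look at `status` after waitpid() has reported success;
     * on failure it may never have been written. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The only difference from the quoted snippet is that `status` is never read on the error path, so the code no longer depends on the invariants discussed above.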

by wahern

5/20/2026 at 9:48:46 PM

Indeed. That's why I didn't deem it worth reporting.

But in my code, I would have fixed for the reasons you mention. Sprinkle enough of these around, and some low percentage will in the future have its assumption invalidated.

by thomashabets2

5/21/2026 at 2:20:42 AM

Couldn’t waitpid return EINTR if the (parent) process were stopped and then continued?

EINTR scares the crap out of me because nobody expects it!

by BobbyTables2

5/21/2026 at 9:13:52 AM

No. You only get EINTR when a signal handler fires and you didn't use the SA_RESTART flag with sigaction. If you don't install any signal handlers, or you use SA_RESTART on all handlers, or you've blocked/masked all signals (or at least the ones with handlers), you won't get EINTR.

When writing library code, it's important to consider EINTR because you can't know about signal dispositions. Though, the common practice of looping on EINTR kind of defeats the purpose.
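
For reference, the "loop on EINTR" practice usually looks something like the sketch below. The wrapper name `waitpid_retry` is hypothetical:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical wrapper: retry waitpid() for as long as it fails with EINTR.
 * Any other error (for example ECHILD) is returned to the caller as-is. */
pid_t waitpid_retry(pid_t pid, int *status, int options)
{
    pid_t r;

    do {
        r = waitpid(pid, status, options);
    } while (r == -1 && errno == EINTR);

    return r;
}
```

Note that this gives roughly the same effect as SA_RESTART: the caller never sees the interruption, which is the "defeats the purpose" part.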

by wahern

5/20/2026 at 11:07:30 AM

Fair enough!

> And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.

And I 100% agree. UB is way overused by these standards for how dangerous it is, and as a consequence using C (and C++) for anything nontrivial amounts to navigating a minefield.

by muvlon

5/20/2026 at 4:02:43 PM

I think as compilers got smarter, UB changed somewhat in meaning. Originally the compilers didn't perform such complex analysis, and while invoking UB could break your program, it would still do something reasonable.

by webstrand

5/20/2026 at 4:42:02 PM

Yes, but compilers got smart enough for it to be a problem around 30 years ago, and we are still arguing about what to do.

by marcosdumay

5/20/2026 at 5:34:37 PM

You can see the reasoning here: basically, when all those C compiler benchmarks started, vendors moved from what Fran Allen described to "anything goes" to win SPEC-something benchmarks.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming

by pjmlp

5/20/2026 at 11:16:22 AM

What should the behavior above be defined to do?

by saagarjha

5/20/2026 at 12:31:40 PM

“Implementation defined behaviour”: compiler author chooses, and documents the choice.

A lot of UB should be implementation defined behaviour instead; this would much better match programmers’ intuitions as they reason about their code - you can even see it in the comments of this post: it’s always things like “this hardware supports / doesn’t support unaligned accesses”, it’s never nasal demons.

by tsukikage

5/20/2026 at 3:02:54 PM

I told someone at a conference that UB actually means "implementation-defined, no documentation required". He started to refute me and then stopped.

by tardedmeme

5/20/2026 at 5:05:29 PM

That isn't true, for UB the compiler is allowed to assume the UB can never happen. For example if you dereference a pointer and only after check if it is NULL, the compiler can remove the NULL check, since it is clearly impossible (nevermind that you might be on a microcontroller where NULL is a valid address).
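
A small sketch of that pattern, with made-up function names. Whether a given compiler actually removes the check depends on the compiler and optimization level, so treat the comments as "may", not "will":

```c
#include <stddef.h>

/* Dereference first, check afterwards. If p is NULL the load is already UB,
 * so a compiler may assume p != NULL and drop the check below. */
int read_then_check(const int *p)
{
    int v = *p;
    if (p == NULL)      /* may be treated as always false and removed */
        return -1;
    return v;
}

/* Check first, dereference afterwards: the check has to stay. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```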

The fallout of this is quite large! If behaviour is implementation defined the compiler has to stick to one consistent behaviour. No such need for UB, you can get different behaviour by changing unrelated code, by changing between debug and release or just because of what garbage happened to be on the stack.

Since the compiler is allowed to assume the UB doesn't happen it will also sometimes look like the compiler miscompiled your code elsewhere, but what actually happened was some inlining followed by extrapolating "this can never happen".

UB is often surprising: I have seen unaligned loads crash on x86 due to it being UB in C (even though x86 is generally fine with it). But once a newer compiler decided that it was fine to vectorise that code (since it was clearly aligned) the CPU was no longer happy with it.
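
One common way to sidestep the unaligned-load problem is to copy the bytes out with memcpy instead of casting the pointer. A sketch, with a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value (in host byte order) from a byte
 * buffer that may not be 4-byte aligned. Casting p to uint32_t * and
 * dereferencing it would be UB for misaligned p; memcpy has no alignment
 * requirement, and compilers typically turn it into a plain load where the
 * target allows that. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```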

by VorpalWay

5/20/2026 at 9:52:56 PM

I think parent commenter made a joke. UB can be seen as "implementation defines this to reformat your hard drive. No we don't document it".

That is, the compiler de facto defines what happens when you compile UB code.

So you're not wrong, but I think you missed the sarcastic spin of parent.

by thomashabets2

5/21/2026 at 4:38:48 AM

>That is, the compiler de facto defines what happens when you compile UB code.

That is not what undefined behavior is though, that is unspecified behavior.

The entire point of undefined behavior is to cover the cases where the compiler can't define the semantics of your program either because doing so is genuinely not possible, or is incredibly onerous to deduce, or would require introducing runtime checks whose performance cost is at odds with C and C++'s predominant use cases.

by Maxatar

5/21/2026 at 7:31:06 AM

Sorry, by "de facto defines" I meant that it factually does something, even if that "something" is "segfault the compiler at build time".

That "de facto" did some heavy lifting.

by thomashabets2

5/20/2026 at 4:43:24 PM

Except that UB doesn't mean that. UB means "the developer must never write this".

by marcosdumay

5/20/2026 at 5:25:18 PM

Both are wrong. It means "this standard does not constrain the behaviour of code that does this".

It's entirely legal for implementations to have predictable behaviour, documented or not, for code that is undefined by the standard. In their quest for maxxing benchmark performance they generally choose not to, but there's really nothing in any standard that stops you from making an implementation that prioritises safety.

by munch117

5/20/2026 at 5:51:43 PM

Every implementation so far has predictable behavior in all cases. Sometimes the rules for predicting it are very obscure. But it's all fully defined within the compiler's binary code. And none of them link to nasal portals.

by tardedmeme

5/20/2026 at 6:25:20 PM

How do you propose to predict the behavior of a true race condition with only the binary, faithfully translated by the compiler?

Moreover, this is at best an incredibly pedantic point, not something that changes how programmers need to approach UB. You can't review the source code of a compiler that hasn't been written yet.

by AlotOfReading

5/21/2026 at 6:38:33 AM

I didn't suggest that implementations should entirely eliminate every form of UB. There is plenty of middle ground. For example, you could easily limit the consequences of integer overflow by specifying or partially specifying overflow behaviour, with very little runtime cost.
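
Until an implementation does that for you, the same effect can be approximated by hand. A rough sketch with made-up helper names; the wrapping variant relies on the int conversion being implementation-defined rather than undefined (GCC documents it as reduction modulo 2^N):

```c
#include <limits.h>
#include <stdbool.h>

/* Add with an explicit range check; never evaluates an overflowing a + b. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Wrapping add: unsigned arithmetic wraps by definition; converting the
 * result back to int is implementation-defined, not undefined. */
int wrapping_add(int a, int b)
{
    return (int)((unsigned int)a + (unsigned int)b);
}
```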

I'm not suggesting you change how you write code, but with a better implementation the code that you do write - that lives in the real world where mistakes are made - might work better. How is that being pedantic?

An interesting case where compiler writers did something like that is casting via union members, but I'm running out of time, so we can talk about that another day.

by munch117

5/21/2026 at 7:14:45 AM

It's fully defined by your CPU's silicon masks and your compiler's binary code that one of several things will happen.

by tardedmeme

5/21/2026 at 7:53:34 AM

Turns out that when you're implementing network applications, the set of things that could happen also depends on what the script kiddie on the other side of the globe feels like this morning.

Some would prefer less excitement than this.

C code should be more predictable and easier to reason about than using a macro assembler. To the extent it is not, the language has failed.

by tsukikage

5/20/2026 at 12:00:06 PM

Print x twice. Not all “side effects” care about order.

Better yet, define an order for parameter evaluation.

by Filligree

5/21/2026 at 12:08:57 PM

There is an easy way to take control: read the volatile variable only once.

  volatile int x = 5;
  ...
  int y=x;
  printf("%d in hex is 0x%x.\n", y, y);

by HelloNurse

5/20/2026 at 12:43:58 PM

You're missing the point. Volatile forces two loads of a value that may have changed in the middle. So the value of "x" may depend on the time/order of load.

by poppadom1982

5/20/2026 at 1:17:59 PM

Which is, if I understand correctly, the entire point of volatile. Don't use it if you don't want that behavior.

And in fact, in the example given, if there is something (another thread or whatever) that can change the value of x, then you don't know what either number will be. Well, in that circumstance, without volatile, it may print the same number both times, but you still don't know what the number will be (unless the read gets optimized away entirely).

by AnimalMuppet

5/20/2026 at 1:42:43 PM

If that behavior is the entire point, then I think the bigger point is that the spec should reflect that and not call it undefined.

by chuckadams

5/20/2026 at 2:00:51 PM

I suspect that many undefined behaviors reflect the inability of the standard committee to come to a consensus on the nuances involved. “Punt to the implementers” is a way to allow every tool vendor to select their own expected behavior in those cases.

by voakbasda

5/20/2026 at 3:14:42 PM

You seem to be operating under the assumption "undefined behavior" means "the compiler authors can decide what to do." That's not what it means. It means "any program that causes this behavior to be triggered is not a valid C program, the programmer knows this and did not submit an invalid program, and the programmer explicitly prevented this from happening elsewhere in ways automated analysis cannot detect. Proceed with compilation knowing this branch is impossible."

The spelling for compiler authors getting to choose a behavior is "implementation defined", as the other comment mentions.

by chowells

5/21/2026 at 7:15:40 AM

It means the C standard does not specify what the program does. Other documents may still specify what the program does. And the program definitely still does something, whether specified or not.

by tardedmeme

5/21/2026 at 4:45:33 PM

> And the program definitely still does something, whether specified or not.

No. It most definitely does not mean this. Go read the series this is part of: https://blog.llvm.org/2011/05/what-every-c-programmer-should...

It is absolutely critical that people programming in C understand what real compilers in the real world do.

by chowells

5/20/2026 at 2:36:07 PM

Then it should be "implementation defined" rather than "undefined".

by MarkusQ

5/20/2026 at 1:11:32 PM

Why is that missing the point? Loading it twice, possibly with different values, is the intended behavior. It's only undefined because the C spec doesn't specify the order of the loads (unlike most other languages which have a perfectly well-defined order for side effects in a single expression).

by hmry

5/20/2026 at 2:48:28 PM

What you are describing is implementation defined behavior. Using that is perfectly safe and reasonable. Undefined means this program is malformed.

by rowanG077

5/20/2026 at 5:24:09 PM

No I'm just repeating what the original comment said, which is that it's explicitly UB:

"5.1.2.4.1 says any volatile access - including just reading it - is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other."

If function arguments were sequenced with respect to each other, it wouldn't be a problem.

But actually, maybe the original comment is wrong. Presumably "indeterminately sequenced" and "unsequenced" mean different things, although I don't have a copy of the standard at hand to check.

by hmry

5/20/2026 at 11:57:22 AM

Couldn’t you just define that function arguments are evaluated left to right?

Or just throw an error.

by echoangle

5/20/2026 at 3:50:23 PM

Why? Just for this edge case? It could be faster and/or allow smaller code size to allow this to be undefined.

Undefined is also different from "depends on the compiler", because which behavior is chosen can even depend on the circumstances, whatever code appears before and/or after it.

That said, UB in code, such as this example of ordering of reads of volatile parameters being undefined, does not automatically mean that code that uses it is bad. It may very well be that the function being called doesn't misbehave either way.

by jfoks

5/20/2026 at 5:32:55 PM

That’s the point of the whole article. It’s not worth the speed gain to have a language that nobody can safely use because you can’t really prevent UB when you write it.

> It may very well be that the function being called doesn't misbehave either way.

The function being good or bad has nothing to do with the UB. The UB occurs before the function is called.

by echoangle

5/20/2026 at 12:18:55 PM

I meant reading the uninitialized variable

by saagarjha

5/20/2026 at 2:06:29 PM

There is no uninitialized variable, I explicitly initialized it to 5.

And yes indeed, C could do what Rust does and define the order of evaluation for function arguments.

If the argument expressions are indeed side-effect-less, the compiler can always make use of the "as-if" rule and legally reorder the computation anyway, for example to alleviate register pressure.
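
In today's C, the usual way to get a defined order is to evaluate the arguments into locals first. A sketch with hypothetical names (`next_id`, `make_pair`); here the arguments are function calls with side effects, so the order in the first version is unspecified rather than UB:

```c
struct id_pair { int first; int second; };

static int next_id_counter;

/* Side-effecting helper: each call returns the next id. */
static int next_id(void) { return ++next_id_counter; }

static struct id_pair make_pair(int first, int second)
{
    struct id_pair p = { first, second };
    return p;
}

/* Which call to next_id() runs first is up to the compiler. */
struct id_pair unordered_ids(void)
{
    return make_pair(next_id(), next_id());
}

/* Separate full expressions: a is always assigned before b. */
struct id_pair ordered_ids(void)
{
    int a = next_id();
    int b = next_id();
    return make_pair(a, b);
}
```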

by muvlon

5/20/2026 at 11:37:15 AM

HCF

by lll-o-lll

5/20/2026 at 11:52:43 AM

I have good news about what UB allows

by saagarjha

5/20/2026 at 12:42:45 PM

What is that?

by JadeNB

5/20/2026 at 3:27:29 PM

A fictitious assembly instruction (and pretty good TV series).

https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(computing...

by FabHK

5/20/2026 at 5:27:58 PM

Halt and Catch Fire

by SAI_Peregrinus

5/20/2026 at 11:40:46 AM

Compilation error

by jeffffff

5/20/2026 at 11:53:28 AM

It’s hard to detect all UB at compile time

by saagarjha

5/20/2026 at 12:17:17 PM

It’s harder depending on the language, which is clearly the point.

by Demiurge

5/20/2026 at 2:02:45 PM

[flagged]

by stellamariesays

5/21/2026 at 2:52:31 AM

> Or at least, no human since the invention of C in 1972 has.

No human without proper tools maybe, but what about seL4? It goes beyond proving the code is UB-free and actually formally verifies the code works as intended. And the code is written in C. (the proofs of course aren't)

The proof is interesting because it goes beyond just proving the C code is correct. For some platforms, they compile the code with an ordinary compiler, and verify that the machine code does what the C code is supposed to do. (that's because just writing correct C code doesn't help you if you trigger a compiler bug)

This works even if the compiler (in this case, GCC) isn't verified - they verify a specific output of the compiler, not that the compiler always generates machine code correctly.

by nextaccountic

5/20/2026 at 1:14:11 PM

> The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.

What are you talking about? UB was coined only in the first C standard, in 1989. Prior to that there was no "If you do this, anything can happen". It was "If you do this, that will happen".

by lelanthran

5/21/2026 at 6:52:32 AM

> UB was coined only in the first C standard, in 1989

Pre 1989, when C did not have a standard, was the behavior unspecified or undefined? That is, of course, a trick question. Because in this context the very definitions of the words come from the standard itself.

Before a language gets a specification, is the de facto specification the five words "you know what I mean"?

The very definition of "UB" in C later became "[…] this document imposes no requirements". Is that not the same thing as "there is no specification (yet)"?

It sounds very zen, but "a non existing specification imposes no requirements".

But I don't think it's meaningful to argue the semantic difference before the (in-context) existence of the words "undefined" vs "unspecified".

> Prior to that there was no "If you do this, anything can happen".

Of course it was. You relied on "common sense".

> It was "If you do this, that will happen".

Haha, of course it wasn't. Before a specification there is neither a definition of "this" nor "that".

Unless you mean ye olde "the compiler implementation is the specification". In which case we'll get dragged into "what even is a language" and "what is the sound of one hand clapping?".

Or, alternatively, it's as true then as it is today. If you go by "GCC x.y.z on platform Z kernel Y, (etc…) is the specification" then there is no UB.

by thomashabets2

5/20/2026 at 2:12:15 PM

More like, "if you do this, what happens depends on your particular combination of hardware, operating system, and compiler. Don't ask us."

by professoretc

5/20/2026 at 3:01:14 PM

No, that would be implementation defined.

by nickez

5/20/2026 at 5:42:15 PM

The post I was replying to said,

> UB was coined only in the first C standard, in 1989. Prior to that there was no "If you do this, anything can happen".

I.e., the context is, before UB existed as a concept, how would these things be categorized. And I was trying to offer the correction that, before UB existed, it wasn't "all behavior is defined" but rather many behaviors depend on your particular local environment. While that may technically be implementation defined, the current standard requires that implementation defined be documented, and UB-like edge cases were most definitely not documented anywhere consistently in the old days!

by professoretc

5/20/2026 at 5:12:29 PM

No, that's actually UB. The important bit here is "compiler defined" -- UB means the compiler is allowed to assume it never happens while compiling.

Consider, for example, an implementation defined function f() -- which can also diverge/crash horribly, etc.

If I write

    if p {
      print("p is true")
    } else {
      g()
    }

    if p {
      f()
    }

Then either we:

- print "p is true" and execute f, or
- execute g

This is true regardless of if f immediately crashes the computer, nasal demons, whatever -- that's implementation defined.

UB means f may never happen.

And that means the compiler may optimize this to just:

    g()

Notice the difference here -- the print never happens!, and g always happens.

You can see why this is concerning when you write code like

    if dry_run {
      print("would run rm -rf /")
    } else {
      run("rm -rf /")
    }

    if dry_run {
      // oops: some_debug_string is NULL and will segfault!
      print(some_debug_string);
    }

by tekne

5/20/2026 at 6:05:44 PM

I see what you're going for, but I don't see how your example is UB. If `p` is a pointer, and, after your `if (p)` check, `p` is dereferenced unconditionally, then yes, your check for `p == NULL` could be removed, and the code under the `if` would be removed as well. But the example you've constructed is not UB.

by tyg13

5/21/2026 at 7:06:54 AM

You misunderstood their example, I think.

It doesn't matter what 'p' is in their example. The point is: if 'f' is undefined behavior (rather than just impl-defined), then the optimizer concludes that the "if p { f() }" can never happen... which means that we're allowed to assume that 'if p { ... } else { ... }' (in the first part of the example) will always take the else branch. The compiler will optimize accordingly and just always call g() unconditionally.
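
A rough C rendering of that argument, using the GCC/Clang builtin `__builtin_unreachable()` as a stand-in for "f is UB". The names `g`, `g_calls` and `example` are made up, and what an optimizer really does with this depends on the compiler and flags:

```c
#include <stdio.h>

static int g_calls;                 /* counts calls to g(), for illustration */

static void g(void) { g_calls++; }

void example(int p)
{
    if (p)
        puts("p is true");
    else
        g();

    if (p)
        __builtin_unreachable();    /* reaching this is UB (GCC/Clang builtin) */
}
```

Calling `example(0)` is well defined and calls `g()` once. Calling it with a nonzero argument is UB, so an optimizing compiler is allowed to assume `p == 0` throughout and reduce the whole function to a call to `g()`.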

by Quekid5

5/20/2026 at 6:43:27 PM

> if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL

It's fair to blame the programmer for the choice of programming in a language like this, if it was in fact their choice. As you've so eloquently put, choosing those languages is essentially equivalent to choosing UB, so starting a new project with one of them is 100% blameworthy when the UB is inevitably found.

by saghm

5/21/2026 at 7:29:20 AM

Not all projects are green field. But sure, new modules can be written in other languages. And C is, as cross-language barriers go, fairly easy to interface with.

by thomashabets2

5/20/2026 at 10:53:10 AM

Volatile is a type system hack. They should have done a more principled fix, and certainly modern languages should not act as though "C did it" makes it a good idea.

The reason for the hack is that very early C compilers just always spill, so you can write MMIO driver code by setting a pointer to point at the MMIO hardware and it actually works because every time you change x the CPU instruction performs a memory write.

Once C compilers got some basic optimisations that obvious "clever" trick stops working because the compiler can see that we're just modifying x over, and over and over, and so it doesn't spill x from a register and the driver doesn't work properly. C's "volatile" keyword is a hack saying "OK compiler, forget that optimisation" which was presumably a few minutes work to implement, whereas the correct fix, providing MMIO intrinsics in the associated library, was a lot of work.

Why should you want intrinsics here? Intrinsics let you actually spell out what's possible and what isn't. On some targets we can actually do a 1-byte, 2-byte, or 4-byte write; those are distinct operations and the hardware knows, so e.g. maybe some device expects a 4-byte RGBA write and so if you emit four 1-byte writes that's very confusing and maybe it doesn't work, don't do that. On some targets bit-level writes are available, you can say OK, MMIO write to bit 4 of address 0x1234 and it will write a single bit. If you only have volatile there's no way to know what happens or what it means.
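
To make the idea concrete, here is a sketch of what width-specific accessors could look like as an API. The names are hypothetical, and in plain C they still have to be built on volatile casts, so this only shows the interface, not a real intrinsic with hardware-level guarantees:

```c
#include <stdint.h>

/* Hypothetical accessors: the access width is part of the function name
 * instead of being implied by the type of some variable. */
static inline void mmio_write8(volatile void *addr, uint8_t value)
{
    *(volatile uint8_t *)addr = value;
}

static inline void mmio_write32(volatile void *addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t mmio_read32(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}
```

With an interface like this, an RGBA register write would be spelled `mmio_write32(reg, rgba)` rather than relying on how the compiler chooses to lower an assignment.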

by tialaramex

5/20/2026 at 1:39:12 PM

I agree that marking the read/write as special rather than the variable itself would be nice, although it would also be nice if C/C++ was more consistent in the way things like this are done. Maybe given std::atomic and std::mutex as template/library features, supported by compiler intrinsics, it would be nice to have "volatile" supported in a similar way.

As a nit pick, I don't think this is correct use of "spill". Register spilling refers to when a compiler's code generator runs out of registers and needs to store variables in memory instead. In the MMIO case you are reading/writing via a pointer, so this is unrelated to registers and spilling behavior.

by HarHarVeryFunny

5/20/2026 at 2:22:32 PM

That's fair that "spill" probably isn't quite the right word.

by tialaramex

5/20/2026 at 3:34:09 PM

By MMIO semantics do you mean explicit load and store instructions? I’ve never felt that pointer reads or writes were lacking descriptiveness here. I would argue the only surprising thing is that they might be optimized out (which is what volatile prevents).

Volatile on a non pointer value is not for MMIO, though, that’s typically for concurrency like with interrupts.

by MobiusHorizons

5/20/2026 at 6:38:50 PM

> I’ve never felt that pointer reads or writes were lacking descriptiveness here. I would argue the only surprising thing is that they might be optimized out

The C and C++ languages would be very slow by modern standards if you insist that reading or writing via a pointer must result in immediate fetches or stores to memory.

> Volatile on a non pointer value is not for MMIO, though, that’s typically for concurrency like with interrupts.

You're holding it wrong. Perhaps you've been holding it wrong for so long and so confidently that you've distorted the world around you -- indeed on MSVC on x86 or x86-64 that actually happened -- but, you're still holding it wrong.

by tialaramex

5/20/2026 at 7:51:39 PM

> You're holding it wrong. Perhaps you've been holding it wrong for so long and so confidently that you've distorted the world around you -- indeed on MSVC on x86 or x86-64 that actually happened -- but, you're still holding it wrong.

Please explain. How would you make the variable backed by a hardware register region? Is this using some sort of linker trick to change where the value lives in memory?

by MobiusHorizons

5/20/2026 at 10:32:01 PM

You said it was for concurrency. The feature you want for that in C (and most languages suitable for this problem) is atomic memory ordering, not the volatile type qualifier.

Microsoft's platform was x86 only for years, and because Intel's design pays for a lot more memory ordering by default than most, on Microsoft's platforms just "volatile" would kinda work even though it was the wrong thing, so Microsoft explicitly grandfathered this for x86 and x86-64 only, you are guaranteed the Acquire-Release ordering even though you didn't ask for it with your volatile type qualifier.

If you were actually thinking of POSIX signals or something similar then yeah, the POSIX requirements say volatile will work, seems like a bad idea to me, but your compiler and other tools are likely also built for POSIX so they've read the same documentation.
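
For the concurrency case, a minimal C11 sketch of asking for the ordering explicitly instead of leaning on volatile. The names `payload`, `ready`, `publish` and `try_consume` are made up for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* plain data, published via the flag below */
static atomic_bool ready;    /* static storage, so it starts out false */

/* Producer: write the data, then set the flag with release ordering. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load that observes `true` also makes the earlier
 * write to payload visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```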

by tialaramex

5/20/2026 at 11:22:36 AM

Yeah, it's also cleaner to be able to mark particular reads and writes as having side effects as opposed to having it be a property of the variable.

by rcxdude

5/20/2026 at 3:01:36 PM

The Linux kernel uses READ_ONCE and WRITE_ONCE which look like actual function calls which is very sensible.
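
A heavily simplified sketch of the idea. These are not the kernel's actual definitions, which are more involved; the `_SKETCH` names are made up, and `__typeof__` is a GNU extension:

```c
/* Mark the individual access as volatile instead of the variable itself. */
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;     /* an ordinary, non-volatile variable */

void set_flag(int v)
{
    WRITE_ONCE_SKETCH(shared_flag, v);
}

int get_flag(void)
{
    return READ_ONCE_SKETCH(shared_flag);
}
```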

by tardedmeme

5/20/2026 at 11:18:20 AM

> The reason for the hack is that very early C compilers just always spill, so you can write MMIO driver code by setting a pointer to point at the MMIO hardware and it actually works because every time you change x the CPU instruction performs a memory write.

Source?

by saagarjha

5/20/2026 at 2:42:22 PM

This is one of those "everyone doing this kind of work knows" that's rather hard to source, but: this is basically the point of volatile. Especially for reads rather than writes, where you may want to read some location that is being written into by a different piece of hardware.

People used to use it for thread synchronization before proper memory barrier primitives (see https://mariadb.org/wp-content/uploads/2017/11/2017-11-Memor... ) were available. It was not entirely reliable for this purpose.

by pjc50

5/20/2026 at 6:54:16 PM

Yeah. I could have sworn that I've read somewhere an anecdote from the Bell Labs era in which this comes up, but I can't find it and might be misremembering. The whole volatile keyword doesn't exist in K&R C as released, there are no "type qualifiers" at all in that language, both volatile and const are introduced in C89.

Duff's famous Device, often misunderstood as some insight about memory copying or something silly, was an MMIO hack. It doesn't look like an MMIO hack to us because it doesn't say volatile, but that's because Duff's compiler did not have that keyword. The reason Duff doesn't change the destination pointer is that it's pointing at hardware and the hardware isn't going anywhere; writing different bytes to the same address is I/O.
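
For anyone who hasn't seen it, a modernized rendering of the device. The function name and the volatile qualifier are additions for illustration, and it assumes count > 0:

```c
/* Write `count` shorts from `from` to the single output location `to`.
 * Note that `to` is never incremented: every store goes to the same address.
 * The switch jumps into the middle of the unrolled loop to handle the
 * remainder. Assumes count > 0. */
void send_to_port(volatile short *to, const short *from, int count)
{
    int n = (count + 7) / 8;

    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
```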

by tialaramex

5/21/2026 at 11:35:49 AM

No idea about volatile, but I do remember function prototypes and const came as influence from C++, well CFront.

by pjmlp

5/20/2026 at 12:41:15 PM

Source for what? The volatile keyword is explicitly telling the compiler "don't optimize read/write to this memory location". That's the whole point. Its use for manipulating hardware registers is covered in any intro embedded systems course. I don't know the history of C compilers but it would seem reasonable to assume that compilers started out plainly translating the C to machine code. Optimization would have happened later as the compilers became more mature.

https://www.gnu.org/software/c-intro-and-ref/manual/html_nod...

by skillina

5/21/2026 at 7:35:29 AM

Source for "compilers basically always did volatile since everything was always spilled".

by saagarjha

5/20/2026 at 12:28:59 PM

> In C, we can have a data race on a single thread and without any writes!

You need to distinguish between a UB and a race, and I think that's something that discussions of UB miss. Take any C program and compile it. Then disassemble it. You end up with an Assembly program that doesn't have any UB, because Assembly doesn't have UB.

UB is a property of a source program, not the executable. It means that the spec for the language in which the source is written doesn't assign it any meaning. But the executable that's the result of compiling the program does have a meaning assigned to it by the machine's spec, as machine code doesn't have UB.

A race is a property of the behaviour of a program. So it's true to say that your C program has UB, but the executable won't actually have a race. Of course, a C compiler can compile a program with UB in any way it likes so it's possible it will introduce a race, but if it chooses to compile the program in a way that doesn't introduce another thread, then there won't be a race.

by pron

5/20/2026 at 1:11:41 PM

> because Assembly doesn't have UB

To be pedantic, old hardware like 6502 family chips (Commodore 64, Apple II, etc) had illegal instructions which were often used by programmers, but it was completely up to the chip to do whatever it wanted with those like with UB.

by redox99

5/20/2026 at 1:56:38 PM

> illegal instructions... were often used by programmers

Intentionally, with an expected effect? I'd need a citation for that.

by zahlman

5/20/2026 at 9:09:16 PM

Yes, many of those are perfectly stable. For example, the 6502 has an undocumented instruction commonly known as "LAX" which loads both the A and X registers at the same time in a predictable manner in most addressing modes, in the same time and space it would otherwise take to load either of those registers on their own.

The benefits of being able to do stuff like this when you need to conserve resources are obvious, and common idioms have formed around their use. Check out https://csdb.dk/release/?id=198357

by boomlinde

5/20/2026 at 3:26:03 PM

Some desultory googling turned up:

* https://www.nesdev.org/wiki/CPU_unofficial_opcodes#Games_usi...

* https://hitmen.c02.at/files/docs/c64/NoMoreSecrets-NMOS6510U... (doesn't name any software, but some copy protection schemes were already known to use them)

by chuckadams

5/21/2026 at 8:59:34 AM

Some instructions were very useful and they were simply discovered by programmers who tried out what each instruction did. People did not necessarily have access to documentation those days!

So any instruction or hardware feature would get used, whether it's "officially" documented or not.

by vardump

5/21/2026 at 2:03:22 AM

> You end up with an Assembly program that doesn't have any UB, because Assembly doesn't have UB.

I guess that's true if you think of assembly as a more readable form of machine code, but from a practical sense I'd argue that assembly inherits the undefined behaviors of the architecture it represents and the implementations of that architecture it actually builds for.

IIRC the OG Xbox security was broken partially as a result of undefined behaviors in x86 where the AMD CPUs that were used in early development would crash or throw an error or something when execution reached the end of the memory space but the Intel CPU they switched to instead just rolled over and kept executing from 0.

by wolrah

5/20/2026 at 4:37:07 PM

I specifically said data race, which is a known term of art and a type of language-level UB. It is separate from the races you're thinking about. Just like signed integer overflow or use-after-free, the compiler is allowed to assume data races never happen.

by muvlon

5/20/2026 at 1:53:58 PM

The problem is that in the quest to win benchmark games, compilers started to take advantage of UB for all kinds of possible optimizations, which is almost as deterministic as LLM generated code, across compiler version updates.

by pjmlp

5/20/2026 at 2:35:50 PM

Soooo… Pay attention to updates changelog?

by skydhash

5/20/2026 at 5:13:35 PM

This isn't an answer. UB is not only code dependent, but in many cases value-dependent as well. Changing anything about a program has the potential to cause UB anywhere in the code graph affected. So even the smallest possible change requires you to fully understand that entire graph, as well as the entire compiler history and how it interacts with your program. Remember, UB isn't diagnostic and runtime sanitizers don't catch everything, nor does exhaustive testing and static analysis.

by AlotOfReading

5/20/2026 at 2:36:48 PM

If only those changes were all listed there...

by pjmlp

5/20/2026 at 12:14:35 PM

> In C, we can have a data race on a single thread and without any writes!

Well, sure, that's what volatile means - that the value may be changed by something else. If it's a global variable then the something else might be an interrupt or signal handler, not just another thread. If it's a pointer to something (i.e. read from a specific address) then that could be a hardware device register whose value is changing.

The concept of a volatile variable isn't the problem - any language that is going to support writing interrupt routines and memory mapped I/O needs to have some way of telling the compiler "don't optimize this out" since reading from the same hardware device register twice isn't like reading from the same memory location twice.

I think the problem here is more that not all of the interactions between language features and restrictions have been fully thought out. It's pretty stupid to be able to explicitly tell the language "this value can change at any time", and for it to still consider certain uses of that value as UB since it can change at any time! There should have been a carve out in the "unsequenced side effect" definitions for volatile variables.
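To make the memory-mapped I/O case concrete, here is a minimal sketch of polling a status register. The register layout, the `STATUS_READY` bit and the function name are all hypothetical; the only point is that the pointee is `volatile`, so each loop iteration has to perform a fresh load instead of reusing the first value.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_READY 0x1u   /* hypothetical "ready" bit */

/* Polls the register up to max_polls times and reports whether the ready
   bit was ever seen. One volatile read per iteration, one per statement. */
bool wait_until_ready(volatile uint32_t *status, unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & STATUS_READY) {
            return true;
        }
    }
    return false;
}
```

Without the `volatile`, a compiler would be entitled to read `*status` once and hoist the test out of the loop.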

by HarHarVeryFunny

5/20/2026 at 12:34:24 PM

> There should have been a carve out in the "unsequenced side effect" definitions for volatile variables.

As noted, there’s almost 300 usages of the word undefined in the standard. Believing that it’s possible to correctly define all the carve outs necessary correctly and have the compiler implement the carve outs successfully is about as logical as believing UB is humanly avoidable in written code.

by vlovich123

5/20/2026 at 8:39:38 AM

I think the article's point is that you don't actually have to get weird at all to run into UB.

Lots of people mistakenly think that C and C++ are "really flexible" because they let you do "what you want". The truth of the matter is that almost every fancy, powerful thing you think you can do is an absolute minefield of UB.

by simonask

5/20/2026 at 10:29:30 AM

My go-to example of "UB is everywhere" is this one:

    int increment(int x) {
        return x + 1;
    }

Which is UB for certain values of x.
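Concretely, the problem value is `INT_MAX`. A minimal sketch of a version that refuses to overflow might look like this (the name `checked_increment` and the bool-plus-out-parameter shape are just one possible design, not anything standard):

```c
#include <limits.h>
#include <stdbool.h>

/* Stores x + 1 in *out and returns true when the result is representable.
   Returns false and leaves *out untouched when x == INT_MAX. */
bool checked_increment(int x, int *out) {
    if (x == INT_MAX) {
        return false;   /* x + 1 would be signed overflow */
    }
    *out = x + 1;
    return true;
}
```

The check has to happen before the addition; testing `x + 1 < x` afterwards is itself relying on the overflow you were trying to avoid.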

by kzrdude

5/20/2026 at 11:03:41 AM

C23 removed the whole stuff about indeterminate value and trap representation. Underflow/overflow being silent or not is implementation defined.

by CodeArtisan

5/20/2026 at 11:19:08 AM

Signed overflow is just undefined.

by saagarjha

5/21/2026 at 6:49:37 AM

TBF that is the same as saying "signed overflow is UB".

by jstimpfle

5/21/2026 at 6:57:03 AM

yes but it is a 'picture' that makes you think about it in a different way.

by kzrdude

5/20/2026 at 6:41:07 PM

I've long said that the value a programming language offers is as much about what it doesn't allow as what it does allow. Efficiency aside, most useful programs could be written in most languages, but there are an infinite number of programs you could write that aren't particularly useful. Ruling out the programs you might accidentally write that resemble the one you intended is a pretty useful feature of a language, and it's a metric that C and C++ rate quite poorly on IMO.

by saghm

5/20/2026 at 10:20:57 AM

I would agree that C is "really flexible", but I would say it's primarily flexible because it lets you cast say from a void pointer to a typed pointer without requiring much boilerplate. It's also flexible because it lets you control memory layout and resource management patterns quite closely.

If you want to be standards correct, yes you have to know the standard well. True. And you can always slip, and learn another gotcha. Also true. But it's still extremely flexible.

by jstimpfle

5/20/2026 at 10:58:42 AM

The problem is that a lot of the flexibility introduced by UB doesn't serve the developer.

Take signed integer overflow, for example. Making it UB might've made sense in the 1970s when PDP-1 owners would've started a fight over having to do an expensive check on every single addition. But it's 2026 now. Everyone settled on two's complement, and with speculative execution the check is basically free anyways. Leaving it UB serves no practical purpose, other than letting the compiler developer skip having to add a check for obscure weird legacy architectures. Literally all it does is serve as a footgun allowing over-eager optimizations to blow up your program.

Although often a source of bugs, C's low-level memory management is indeed a great source of flexibility with lots of useful applications. It's all the other weird little UB things which are the problem. As the article title already states: writing C means you are constantly making use of UB without even realizing it - and that's a problem.

by crote

5/20/2026 at 11:22:11 AM

If we're talking two's complement it's not undefined that is right. Having to emit checks though, that is where I beg to differ. A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless. Furthermore, it might be "essentially free" from a branch prediction point, but lo and behold caches exist. You would pollute both the instruction cache with those instructions _and_ the branch prediction cache. From this it doesn't follow at all, that there is no cost.

In the end small things do add up, and if you're adding many little things "because it doesn't cost much nowadays" you will end up with slow software and not have one specific bottleneck to look at. I do agree that having the option for checked operations is nice (see C#), but I have needed this behavior (branching on overflow) exactly once so far.

by ablob

5/20/2026 at 12:34:45 PM

> A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless.

You almost always want to change the behavior to erroring out on overflow. The few cases where overflow really is intended and fine can be handled by explicit opt-out.

And I refuse to buy the argument that "small things add up" in the world where we do string building and parsing every few microseconds. Checked math will have unnoticeable impact compared to all the other things we do, in almost every type of program.

by Xirdus

5/20/2026 at 1:51:46 PM

This string manipulation stuff is very common, and that's why in 2026, an age where science fiction has become a reality, many things are still absurdly slow. Exactly because of such sloppiness, which does accumulate in many cases, and when one least expected it.

by jstimpfle

5/20/2026 at 9:06:51 PM

100% agreed on the sloppiness. But overflow checking is not sloppiness. It's the opposite of sloppiness. Unchecked math is sloppiness, allowing overflows to happen silently and uncontrollably is sloppiness. It just so happens this kind of sloppiness makes code faster, unlike other kinds of sloppiness that make code slower. Not doing necessary safety checks is faster than doing these necessary checks, but it doesn't make these checks any less necessary. Not validating user input also makes code faster, and is also sloppy.

by Xirdus

5/20/2026 at 11:20:14 AM

Signed overflow checks are typically not free, unfortunately; they have a cost of about 5% or thereabouts.

by saagarjha

5/20/2026 at 12:50:50 PM

In hot paths it can be even more. This is why even Rust defines it as wrapping but elides the overflow panic in release builds.

by vlovich123

5/20/2026 at 1:23:06 PM

It is defined as an error. That error’s default handling is wrapping when debug_assertions is off, and panic when it’s on, but since it’s an incorrect program (though not UB) either behavior is acceptable in any mode.

by steveklabnik

5/20/2026 at 1:53:47 PM

If it is defined as an error, but the compiled build will continue to run with the value wrapped around, I would say that's indistinguishable from UB.

by jstimpfle

5/20/2026 at 3:19:54 PM

No. An integer getting deterministically set to an unintended value is a bug. A bug is not the same thing as UB. (Even if it were non-deterministic, it would still not be anything like UB.) It's not the same ballpark, not even the same sport.

by 12_throw_away

5/20/2026 at 9:55:28 PM

What if the wrapped index is used to construct an invalid pointer? It might be possible, not sure. What if the integer is used to read the wrong data from disk, or corrupt data on disk by writing to the wrong location?

by jstimpfle

5/20/2026 at 10:41:48 PM

> What if the wrapped index is used to construct an invalid pointer?

Constructing an invalid pointer in rust is UB, yes, but integer wraparound is not.

> What if the integer is used to read the wrong data to a disk, or corrupt data on disk by writing to the wrong location?

Then it is a very bad bug.

> What if the program controls a nuclear power plant and the integer causes the control system to fail, causing memory errors due to radiation from the meltdown?

Then it is a very very bad bug.

> What if the wrapped integer causes the program to output the true name of god, and the programmer, in their last minutes of existence, looks up to see, overhead, without any fuss, the stars going out?

Ok, you got me, this one is UB.

by 12_throw_away

5/21/2026 at 12:49:17 AM

> Constructing an invalid pointer in rust is UB

no, it is dereferencing, not constructing, an invalid pointer, that is UB. there is even a safe function provided to construct an invalid but non-null pointer: `https://doc.rust-lang.org/stable/std/ptr/fn.dangling.html`

by kobebrookskC3

5/21/2026 at 1:42:53 AM

> What if the wrapped index is used to construct an invalid pointer?

Using that pointer would be UB, but that is UB, not the addition.

> What if the integer is used to read the wrong data from disk, or corrupt data on disk by writing to the wrong location?

That is a bug, but it is not undefined behavior.

by steveklabnik

5/20/2026 at 5:20:33 PM

It's indistinguishable from unspecified behavior, not from undefined behavior. Unspecified behavior has to pick from a finite list of allowed behaviors. Undefined behavior can do anything.

by SAI_Peregrinus

5/20/2026 at 9:58:06 PM

A program with corrupted state can essentially do anything. Yes it's still a question of run-time checks the runtime has to protect against it. But the compiler is probably deriving a lot of assumptions from the assumption that there wasn't overflow.

by jstimpfle

5/21/2026 at 1:40:54 AM

“Undefined behavior” is a term of art in programming languages that means something more specific than “the program may do something odd.”

The compiler is not allowed to derive any assumptions from it. It only could if it were UB.

by steveklabnik

5/21/2026 at 5:51:04 AM

But did the rust compiler assume that the integer would not overflow? It did so in Debug mode where runtime checks were added. If it's not the case in Release mode, does that mean semantics are different between Debug and Release?

by jstimpfle

5/21/2026 at 3:53:01 PM

> But did the rust compiler assume that the integer would not overflow?

It did not.

> It did so in Debug mode where runtime checks were added.

It didn't assume in that case either. It did a well defined thing: add checks.

> If it's not the case in Release mode, does that mean semantics are different between Debug and Release?

Strictly speaking, the language doesn't know about "release mode", as that's a Cargo thing. But yes, in practice, the semantics are different based on various things: it could be debug vs release, it could also be flags that control the behavior. But that's still distinct from "undefined behavior" as a concept. The behavior is well defined, with multiple possible options for behaviors.

by steveklabnik

5/21/2026 at 8:24:09 PM

So in Rust, you are actually specifying TWO programs with a single source? Those Rust users are surely too clever for my liking!

You can tune a C compiler as well to have a very specific defined behaviour for integer overflow. You can add -fwrapv or you can add UBSAN.

The user never intended overflow to happen, because if they did, they could have used something like __builtin_mul_overflow() or whatever. Or they are an emotionally unstable user with destructive tendencies. The user also never intended the program to abort with a (nicely formatted) error message, unless they are a very very sad depressed nihilistic user who also never runs their program in Release mode.

To say that overflow would be defined in Rust is at least half a lie. We could agree that cargo has a choice of diagnostic policy though, a policy how to handle what is essentially a state with no defined or useful path forward, or in other words, UB.

Throwing errors might be a wanted property to detect oversights. C ecosystem has UBSAN too! But essentially the same is still true: Basic arithmetic operations are not closed over the numbers 0..2^N. Rust doesn't have a (unique and useful) definition for those operations for a subset of numbers. Even if you claim the operations are defined (say wrapping arithmetic in Release mode), it's not what the programmer wants. Probably the majority of algorithms work over natural numbers or integer numbers. These algorithms don't work when the arithmetic is on integers modulo 2^N.

So the user has to constrain the set of valid inputs, and do manual sanitization, just like in C.

by jstimpfle

5/21/2026 at 2:32:16 PM

The semantics are well-defined in both modes. You can predict exactly what will happen in either case. In C, the semantics are not defined at all, you can't predict what will happen and it's allowed to change between compilations of the same source.

It will probably get omitted, since Undefined Behavior isn't allowed by the C abstract machine, but sadly compilers are allowed to emit code for UB in the source (partly because some UB is only detectable at runtime). Sometimes disabling optimizations will incorrectly allow codegen to run for source lines which have UB, tricking people into thinking that optimizations are breaking their program. Compilers are allowed to do this, since behaviors other than "omit the offending statement" are unfortunately allowed by the standard, so it's not a compiler bug.

by SAI_Peregrinus

5/21/2026 at 7:43:11 PM

UB is a runtime property. As far as you can statically verify some code parts, you can see UB at compile time, but the point of UB is exactly that it is about stuff you can't predict, or that is hard to predict as a compiler.

Now why you can cook up trivial artificial examples where a compiler will remove some code sections based on statically detected UB, instead of printing an error, you have to ask the compiler authors.

> The semantics are well-defined in both modes.

So they're not the same? So the behaviour is not uniquely defined by the source code alone, but is actually _very_ different based on compile mode? Between two modes whose point was never to have different semantics, but to have the _same_ semantics while being debuggable vs being fast?

> You can predict exactly what will happen in either case. In C, the semantics are not defined at all, you can't predict what will happen and it's allowed to change between compilations of the same source.

You can make the same "predictability" argument for C, you can easily write a compiler that has semantics exactly laid out. Case in point: -fwrapv. Case in point: UBSAN.

by jstimpfle

5/21/2026 at 7:27:00 AM

Yeah on average. On some paths it's almost free

by saagarjha

5/20/2026 at 12:35:07 PM

You can run your code under ASAN and UBSAN nowadays, it will catch many or most issues as they happen.

But that's completely beside the point. UB on signed overflow, or really most of UB, is not unrelated to C flexibility. It is a detail of the spec related to portability and performance. IIRC it is even required to make such trivial optimizations as turning

    for (int i = 0; i < n; i++) func(a[i]);

into

    for (Foo *p = a, *last = a + n; p < last; p++) func(*p);

saving arithmetic and saving a register, on architectures where `int` is smaller than pointers. But there are also options like -fwrapv on GCC for example, allowing you to actually use signed overflow.

by jstimpfle

5/20/2026 at 1:39:39 PM

How is undefined behavior necessary for this transformation?

by Chinjut

5/20/2026 at 2:15:54 PM

IIRC computation of the address is done by computing offset from base pointer as a multiplication in (32-bit) int, like p + (i * sizeof (Foo)). The right term might overflow, but due to signed overflow being UB, the compiler is able to assume that it does not, so the transformation to do the arithmetic entirely in (64-bit) pointer space is valid.
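As a sketch of the two shapes, written out by hand (the function names are made up, and both assume `n >= 0` and that `a` points to at least `n` elements):

```c
/* Index form: i is an int, typically 32-bit, and has to be widened to
   pointer width for every a[i]. */
long sum_indexed(const int *a, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Pointer form: the loop variable is already pointer-sized, so no
   per-iteration widening or scaling is needed. */
long sum_pointer(const int *a, int n) {
    long total = 0;
    const int *last = a + n;
    for (const int *p = a; p < last; p++) {
        total += *p;
    }
    return total;
}
```

The two are only interchangeable if `i` never wraps, which is the assumption the compiler gets for free from signed overflow being undefined.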

by jstimpfle

5/20/2026 at 6:31:04 PM

Exactly. You as the programmer know that the loop counter won't overflow, and in general, essentially nobody would actually write it that way. But if you don't assume it can't happen, the possibility for signed overflow is everywhere in address computations.

This is also a major blocker for auto-vectorization. Can't coalesce a load of a[i], a[i+1], a[i+2], a[i+3] into a load of a[i:i+3] if there's a possibility that `i+1`, `i+2` or `i+3` wrapped around (thus causing your "contiguous" load to be non-contiguous). This is a big reason why you shouldn't use `unsigned` for loop counters, especially if they're going to be used as an index into an address calculation.
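A tiny sketch of why the unsigned case is awkward. Unsigned wraparound is fully defined, so the compiler has to honor it; the helper names here are hypothetical:

```c
#include <limits.h>

/* Unsigned arithmetic wraps modulo UINT_MAX + 1, so the index "after"
   UINT_MAX is 0. */
unsigned next_index(unsigned i) {
    return i + 1u;
}

/* Nonzero when i, i+1, i+2, i+3 are four consecutive values with no wrap,
   i.e. when a[i..i+3] really is one contiguous run. */
int four_consecutive(unsigned i) {
    return i <= UINT_MAX - 3u;
}
```

With a signed counter the compiler may simply assume the equivalent of `four_consecutive` always holds; with an unsigned one it has to prove it or give up on the wide load.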

by tyg13

5/21/2026 at 1:54:51 PM

But surely the more natural approach than making this undefined behavior would be making the computation of a[i] take place in 64-bit pointer space rather than 32-bit int space? Why does the compiler need the freedom to emit nasal demons?

by Chinjut

5/20/2026 at 2:37:05 PM

*is not related to C flexibility

by jstimpfle

5/20/2026 at 10:38:19 AM

It's not flexible in practice, because knowing the standard isn't optional. If you make the choice to not follow the standard, you're making the choice to write fundamentally broken software. Sometimes with catastrophic consequences.

by simonask

5/20/2026 at 10:57:36 AM

I'm making the choice to pass pointers as void to get low-friction polymorphism. I'm making the choice to control the memory layout of my data structures, including of levels and type of indirection. I'm making the choice to control my own memory allocators and closely control lifetimes, closely control (almost) everything that happens in the system.

That has nothing to do with not following the standard.

by jstimpfle

5/20/2026 at 11:20:56 AM

Be that as it may, you’re not following the standard.

by saagarjha

5/20/2026 at 12:24:15 PM

what is your point?

by jstimpfle

5/20/2026 at 12:37:33 PM

If you don't follow the standard, gcc -O2 can introduce bugs to your code that you never even wrote. Skipping null checks, executing both branches of a conditional, and so on.

by Xirdus

5/20/2026 at 12:39:46 PM

Where did I say I'm not following the standard?

by jstimpfle

5/20/2026 at 12:58:25 PM

I interpreted these words:

> If you want to be standards correct, yes you have to know the standard well.

to mean that being standards-correct is optional. It's not. Every C programmer needs to know every possible UB by heart and never introduce any of it to their code, or else they'll be constantly introducing subtle, hard to debug bugs that contradict the actual code they wrote.

Maybe you meant something different by those words, but then I'm confused what the "if" was supposed to mean.

by Xirdus

5/20/2026 at 2:06:09 PM

Of course it's optional (although I didn't mean to imply that). Even using computers at all is optional. I never said that I don't aim to follow the standard, have a clean compiling program without warnings and without UB, etc. I do strive to achieve all of that.

But it's not entirely black and white, either. In practice I'm fine accepting that some bugs are technically UB but whatever, we've found a bug by whatever manifestation (like NULL dereference most likely leading to segfault in practice). I just fix the bug as a bug, and life goes on.

The standard is not perfect, it does have shortcomings. It can be improved. And it can be interpreted to fix some issues. Let's not hold theory over practicality, and let's expect the compiler writers also strive to do the reasonable thing.

by jstimpfle

5/20/2026 at 8:57:49 PM

In practice, GCC -O2 will happily erase entire swathes of code and turn perfectly logical source into nonsense assembly whenever it gets as much as a sniff of UB anywhere in the code path. Nobody would be talking about UB if GCC wasn't so aggressive in abusing the freedom UB gives.

To paraphrase your earlier comment - you lose low-friction polymorphism (unpredictable compiler output causes a lot of friction). You lose control of memory allocations (because they may have been elided) and lose control of lifetimes (because free can be moved before last use causing a crash, or removed entirely causing a leak). You lose control of (almost) anything that happens in the system. And it has everything to do with not following the standard.

You do retain control of the memory layout of data structures, though.

by Xirdus

5/20/2026 at 10:13:31 PM

Then I'm almost ashamed to admit that I'm not sure I've ever witnessed any surprising form of UB in the wild. For example, I will reliably get segfaults on NULL dereference in practice. Typical manifestations of UB are entirely predictable and obvious. Of course I'm also running most code without most optimizations, most of the time, while developing.

On the other hand, what I've observed with my own eyes is interesting phenomena like performance drops, e.g. memory bandwidth dropping from gigabytes/sec to 300 KB/sec due to false sharing on an ARM SOC for example.

by jstimpfle

5/21/2026 at 1:13:59 PM

There was once a privilege escalation vulnerability in Linux kernel that only happened when compiled with optimizations. In kernel space, address 0 is just regular memory that can be read from and written to if there's a page mapped to it. But in C standard, reads and writes to null pointer are UB.

There was some function that read from a passed pointer unconditionally whether it's null or not. It made sense in context. Then it checked if the pointer is null - if it is do early return, if it's not do privileged operation. The pointer was null iff the user didn't have permissions to do the operation.

What GCC did is notice that a pointer is accessed before its null check. Since accessing a null pointer is UB, and GCC assumes UB never happens, it figured out the null check is superfluous. And removed the check and the early return. The pointer read stayed, mind you. The optimized function would unconditionally read from the pointer even if it's null, then unconditionally execute the privileged operation without checking permissions. That allowed obtaining root access from anywhere.
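Stripped down to a sketch, the pattern looks roughly like this. The struct and function names are invented for illustration and are not the kernel's actual code:

```c
#include <stddef.h>

struct device { int flags; };   /* hypothetical type */

/* Risky ordering: the load through dev comes first, so an optimizer may
   conclude dev cannot be NULL and drop the check below. Shown only for
   contrast; calling it with NULL is undefined behavior. */
int get_flags_risky(struct device *dev) {
    int flags = dev->flags;
    if (dev == NULL) {
        return -1;
    }
    return flags;
}

/* Safe ordering: test the pointer before touching what it points to. */
int get_flags_checked(struct device *dev) {
    if (dev == NULL) {
        return -1;
    }
    return dev->flags;
}
```

GCC also has a `-fno-delete-null-pointer-checks` option that tells it not to make this inference, for code where address 0 can be legitimately mapped.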

I saw a few other writeups of interesting UB behavior on The Old New Thing blog. I especially like the time travel one: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63... (apologies to people of the future, links to MS devblogs tend to die often).

by Xirdus

5/21/2026 at 7:34:34 AM

Compilers do not surprise all that often, which is why it is extra surprising when it does happen.

by saagarjha

5/20/2026 at 10:05:29 AM

At which point it feels like some sort of high-level assembly-like language, which is simple enough to compile efficiently and stay crossplatform, with some primitives for calls, jumps, etc. could find a nice niche.

Maybe this already exists, even? A stripped down version of C? A more advanced LLVM IR? I feel like this is a problem that could use a resolution, just maybe not with enough of a scale for anyone to bother, vs. learning C, assembly of given architecture, or one of the new and fancy compiled languages.

by 3form

5/20/2026 at 4:09:39 PM

There's Vale [0] as a structured high-level assembly language, but pretty far from usable right now. I do hope it matures. Basically: All non-control-flow instructions can be directly supported. Control flow is lofted to a higher level and implemented in C-style structured blocks and keywords, which map directly to a subset of the ISA that modifies the program counter. This separation means it's not a proper superset of traditional assembly languages -- you can't paste in arbitrary blocks of existing code -- but a lot of interesting things (for them, implementations of cryptographic primitives) are pretty trivial to port over. And in exchange, you get a well defined Hoare logic that can talk about total correctness, not just [1]'s partial correctness.

[0] https://github.com/project-everest/vale

[1] https://nickbenton.name/coqasm.pdf

by addaon

5/20/2026 at 10:40:02 AM

Well, Zig is aiming to be a "saner C", and mostly succeeding so far. I hope they make it to production.

Rust is a somewhat more thorough attempt to actually course-correct.

by simonask

5/20/2026 at 5:40:32 PM

It is basically what you can have today with Object Pascal or Modula-2, with a revamped syntax for C crowds.

by pjmlp

5/20/2026 at 5:41:41 PM

Yes, there have been quite a few C inspired Assembly languages for DSPs for example, TI had one.

by pjmlp

5/20/2026 at 10:13:14 AM

And it makes sense as long as you allow the concept of unsequenced operations at all (admittedly it’s somewhat rare; e.g. in Scheme such things are defined to still occur in sequence, but which specific sequence is unspecified and potentially different each time). The “volatile” annotation marks your variable as being an MMIO register or something of that nature, something that could change at any point for reasons outside of the compiler’s control. Naturally, this means all of the hazards of concurrent modification are potentially there.

That said, your “common parlance” definition of “data race” is not the definition used by the C standard, so your last sentence is at best misleading in a discussion of standard C.

> The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

(Here “conflicting” and “happens before” are defined in the preceding text.)

by mananaysiempre

5/20/2026 at 10:53:31 AM

Your first paragraph makes it sound as if the compiler will actually generate two reads of the value of some register, which might lead to unexpected effects at runtime for certain special registers.

However, this is not at all what UB means in C (or C++). The compiler is free to optimize away the entire block of code where this printf() sequence occurs, by the logic that it would be UB if the program were to ever reach it.

For example, the following program:

  int y = rand();
  if (y != 8) {
    volatile int x;
    printf("%d: %d", x, x) ;
  } else {
    printf("y is 8");
  }

Can be optimized to always print "y is 8" by a perfectly standard compliant compiler.

by tsimionescu

5/20/2026 at 11:55:34 AM

> Your first paragraph makes it sound as if the compiler will actually generate two reads of the value of some register, which might lead to unexpected effects at runtime for certain special registers.

I don’t see how. I was trying to explain why it’s reasonable for a volatile read to be a side effect, after which the C rule on unsequenced side effects applies, yielding UB as you say.

by mananaysiempre

5/20/2026 at 11:06:22 AM

"volatile" tells the compiler it is _not_ safe to optimise away any read or write, so it can't just optimise that section away at all.

> An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.

A compliant compiler is only free to optimise away, where it can determine there are no side-effects. But volatile in 5.1.2.3 has:

> Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects.

by shakna

5/20/2026 at 11:19:38 AM

Yes, but undefined behaviour is undefined behaviour, and that behaviour can legally be that the code is not emitted at all, volatile (or any other side effect) or not. (And compilers do reason about undefined behaviour when optimising, so this isn't necessarily a completely theoretical argument, though I don't know which would 'win' in a compiler's actual logic, 'don't optimise volatile' or 'do assume undefined behaviour is impossible and remove code that definitely invokes it', or whether there's any current compiler that would flag this as unconditionally undefined behaviour in the first place.)

by rcxdude

5/20/2026 at 11:25:04 AM

Volatile wins.

GCC calls that out [0] - volatile means things in memory may not be what they appear to be, and that there are asynchronous things happening, so something that may not appear to be possible, may become so, because volatile is a side-effect.

So about the only optimisation allowed to happen, is combining multiple references.

Clang is similar:

> The compiler does not optimize out any accesses to variables declared volatile. The number of volatile reads and writes will be exactly as they appear in the C/C++ code, no more and no less and in the same order.

[0] https://www.gnu.org/software/c-intro-and-ref/manual/html_nod...

by shakna

5/20/2026 at 12:00:34 PM

That's cool and all if you are writing GCC or Clang dialect C, but it doesn't change the fact that it is UB in the C standard.

by poizan42

5/20/2026 at 3:03:16 PM

This is all assuming that the code is not invoking undefined behaviour. If the code is invoking undefined behaviour, GCC and clang are both well within their rights to say 'none of the rest of our documentation applies' (and have historically done so on bug reports).

by rcxdude

5/20/2026 at 11:17:16 AM

Sure it can. That code path has unconditional UB and thus it is not valid.

by saagarjha

5/20/2026 at 11:20:38 AM

Only if there would be no side-effects. Which there are.

by shakna

5/20/2026 at 11:52:12 AM

No this is irrelevant for making this decision

by saagarjha

5/20/2026 at 12:01:15 PM

I've mentioned elsewhere the standards, and compilers as well, disagreeing with you here.

But feel free to run against the various compilers through godbolt. [0] They won't optimise the branch away. Access to a volatile, must be preserved, in the order that they exist. No optimisation, UB or otherwise, is allowed to impede that. Because an access is a side-effect.

[0] https://godbolt.org/z/85cGhq3Ta

by shakna

5/20/2026 at 1:58:38 PM

Compilers not doing something is not a demonstration that they are not actually allowed to do that thing.

by zahlman

5/20/2026 at 12:17:18 PM

That they won’t is at most a courtesy to you, but they are not required to do this.

by saagarjha

5/20/2026 at 12:26:42 PM

> Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.

I quoted the C standard, first. Not compiler behaviour.

I showed where it requires the compiler not to optimise this.

How about, instead of one-line throwaway disagreements, you point out where they are permitted to do this, instead?

by shakna

5/20/2026 at 1:56:47 PM

The compiler is required to not optimise out reads/writes through volatile. That's unrelated to code also having UB: you can't sprinkle volatile through arbitrary UB and suddenly have it be defined.

> A compliant compiler is only free to optimise away, where it can determine there are no side-effects

A compliant compiler is also allowed to assume UB cannot occur.

by dwattttt

5/20/2026 at 12:03:23 PM

This looks like a long back and forth that can easily be solved by a minute or two on godbolt...

by nilamo

5/20/2026 at 12:15:48 PM

> that can easily be solved by a minute or two on godbolt...

Unfortunately it's not that simple when it comes to UB. If the snippet in question does in fact exhibit UB then there's no guarantee whatever Godbolt shows will generalize to other programs/versions/compilers/environments/etc.

by aw1621107

5/20/2026 at 12:23:15 PM

That's very funny to me.

A) x is always removed.

B) no, it's never removed if volatile.

But neither person can prove what a compiler will actually do, despite claiming they'll always act a certain way given 5 lines of code.

by nilamo

5/20/2026 at 12:33:11 PM

Also, at behavioural edges what you'll see on Godbolt is compiler bugs. So you learn nothing about what should happen.

All popular modern C++ compilers have known bugs and while I'm sure there are C compilers with no known bugs that will be because nobody tested very hard.

by tialaramex

5/20/2026 at 1:58:42 PM

I have watched a compiler flip between emitting the code I expected (despite it having UB), and emitting unexpected code after a minor update.

What you observe a compiler do when there's UB is not at all something you can rely on.

by dwattttt

5/20/2026 at 3:06:47 PM

No, claim A is 'x may be removed by a conforming C compiler'. Whether any given version of a given compiler actually does so in any given circumstance is a different question (the answer being: probably not, because while this is undefined behaviour, it's not likely something that is going to be flagged as such by a compiler's optimizer. Also, from some testing with GCC and forcing a null pointer dereference, it seems like volatile at least does win in that case with the current version of it on x86, and it dutifully emits the null pointer dereference and then the 'ud2' instruction instead of the rest of that execution path).

by rcxdude

5/20/2026 at 12:29:16 PM

I made the weaker claim that x can be removed. This is something I could prove with compiler output but I would have to find a compiler willing to make this optimization which is not something I can guarantee.

by saagarjha

5/20/2026 at 12:18:03 PM

No, compilers will often choose to not optimize on UB.

by saagarjha

5/20/2026 at 11:28:15 AM

When the compiler decides something is UB, aka "the result of this code is not defined and could be anything", it selects the most performant version of undefined behavior: doing nothing, by optimizing the code away.

by u8080

5/20/2026 at 11:39:44 AM

The compiler is not free to remove accesses to something marked volatile - it's defined as a side-effect.

Volatile means something else may be acting here. Something else may install anything into the register at any time - and every time you access.

The compiler is required to preserve the order of accesses. In almost every C compiler, today, there are almost no optimisations the moment a volatile is introduced, for this reason.

by shakna

5/20/2026 at 11:55:21 AM

If code has undefined behavior, the entire execution path that leads to that UB has no assigned semantics in the C model. So there are no volatile accesses in this code according to the C abstract machine - the entire execution path is UB, so it can be assumed it doesn't happen at all.

by tsimionescu

5/20/2026 at 12:29:25 PM

The execution path has unknown side effects, and so the execution path must be strictly followed. That's uh... The entire point of that section in the C standard. It's why volatile is called out, in the semantic model for the abstract machine.

Otherwise... Why call it out, at all? It must be strictly followed, not lazily, as in other areas of the standard.

by shakna

5/20/2026 at 1:26:52 PM

Previously discussed here: https://news.ycombinator.com/item?id=33770277

UB supersedes volatile, once the compiler hits UB then all bets are off. Compilers can and do optimize out UB branches, which is almost never what you want... yet here we are.

by Aeolos

5/20/2026 at 1:42:24 PM

From that thread: https://news.ycombinator.com/item?id=33770905

>> The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior all bets are off. [...] Yes, UB can "time travel"

> Close, but not quite. This is a common misconception in the reverse direction.

> Abstractly, what UB can do is performing the inverse of the preceding instructions, effectively making the abstract machine run in reverse. However, this is only equivalent to "time-traveling" until you get to the point of the last side effect (where "side effect" here refers to predefined operations in the standard that interact with the external world, such as I/O and volatile accesses), because only everything since that point can be optimized away under the as-if rule without altering the externally visible effects of the program.

> As a concrete, practical example, this means the following: if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior. That is, the UB cannot time-travel to before the flush. What the program can do is to write garbage to the file afterward, or attempt to overwrite what you wrote in the file to revert it to its previous state, but the fflush() must still occur before anything wild happens. If nobody observes the in-between state, then the end result can look like time-travel, but if the system blocks on fflush() and the user terminates the program while it's blocked, there is no opportunity for UB.

by shakna

5/20/2026 at 3:09:23 PM

Sure, but in this case the volatile accesses are part of the undefined behaviour and so they're not outside of the blast radius.

by rcxdude

5/20/2026 at 12:27:54 PM

The print example has no defined order of accesses: function parameters can be evaluated in any order. But further, the entire problem with UB is that it supersedes the regular guarantees that you get (like with volatile) when it's encountered. Yes, gcc and clang do the obvious thing that makes the most sense in this example, but what people are trying to tell you is that they could just not do that and they would still be complying with the standard. For example, you can imagine a more serious example of UB that causes the program to fail to compile completely, and then do you emit the correct number of in-order reads of volatile variables? Obviously not.

by SpaceNugget

5/20/2026 at 12:35:12 PM

Function parameters cannot be evaluated in any order, when one of them is a volatile.

> The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject

And what I am trying to tell people is that the standard has expectations around the volatile keyword that the compilers took into account when designing how they would work - it isn't just kindness, it's compliance. But no one is actually talking about the quotes from the standard; they are just quoting themselves and their own understandings.

by shakna

5/20/2026 at 12:54:53 PM

That quote doesn't have anything to do with parameter evaluation order. There is no order for function parameter evaluation.

And no, there is no exception for undefined behavior. There can't be, otherwise the behavior would be... defined. It's in the name. Again, what do you think the compiler emits when the undefined behavior causes the program to not compile altogether?

by SpaceNugget

5/20/2026 at 1:12:43 PM

Are you sure?

>unsequenced side effects on the same scalar object are UB

>6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other.

Read 5.1.2.4.3:

"If A is not sequenced before or after B, then A and B are unsequenced."

"Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which."

With a footnote saying this:

"9)The executions of unsequenced evaluations can interleave. Indeterminately sequenced evaluations cannot interleave, but can be executed in any order."

I.e., the standard makes a distinction between "unsequenced" and "indeterminately sequenced". And with no mention of side effects in "indeterminately sequenced" evaluations being UB, I conclude that your example is not UB.
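
To make the distinction concrete, here is a small sketch of my own (not text from the standard; `next_id` and `sum_two_ids` are made-up names). It only illustrates the part I am sure of: two function calls in one expression are indeterminately sequenced with respect to each other, so they run in some unspecified order but never interleave.

```c
static int counter = 0;

/* Each call modifies `counter`. The body of one call runs to completion
   before or after the other call; the two never interleave. */
static int next_id(void) {
    return ++counter;
}

/* The two calls are indeterminately sequenced: which one runs first is
   unspecified, but the behavior is defined. Starting from a fresh counter,
   the sum is 1 + 2 in either order. */
int sum_two_ids(void) {
    counter = 0;
    return next_id() + next_id();
}

/* By contrast, an expression such as `counter++ + counter++` has two
   unsequenced side effects on the same scalar, which is UB. It is
   deliberately left out of the compiled code. */
```

Whether the arguments of the original printf call fall into the first bucket or the second is exactly the point in dispute; the sketch does not settle that.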

by rocketrascal

5/20/2026 at 10:29:01 AM

Reading a register from a microcontroller peripheral may well reset it as an example of a possible side-effect here, and that's exactly the kind of thing you use volatile for.
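
A toy model of such a register, as a sketch only: `fake_status` and `read_status` are hypothetical stand-ins, and a plain variable plays the role of the peripheral (real MMIO would go through a volatile pointer to a device-specific address).

```c
#include <stdint.h>

/* Simulated peripheral: a status register whose flags clear when read. */
static uint32_t fake_status;

/* Stand-in for a volatile MMIO read with a read-to-clear side effect. */
static uint32_t read_status(void) {
    uint32_t value = fake_status;
    fake_status = 0;              /* the read itself clears the flags */
    return value;
}

/* Returns 1 if the first read saw the flags and a second read saw them
   already cleared, i.e. the two reads are not interchangeable. */
int second_read_differs(void) {
    fake_status = 0x5u;
    uint32_t snapshot = read_status();  /* the read we actually wanted */
    uint32_t again = read_status();     /* an accidental extra read */
    return snapshot == 0x5u && again == 0u;
}
```

With hardware like this, reading once into a local and reusing the local is the only way to print the same value twice.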

by berti

5/20/2026 at 1:48:22 PM

> Here's a way weirder example:

Well, yes; but when the C standard authors wrote like this, they surely had in mind "the reads could be in either order, therefore the output could display the polled values in either order". Not C++ nasal demons.

And yeah, being able to say "reading is a side effect" is important when for example you interact with certain memory-mapped devices.

by zahlman

5/20/2026 at 3:41:46 PM

I think the C standard doesn't do itself any favors by using "undefined behavior" to signify both "anything can happen, including erasing all your data and setting your data center on fire" and "one of a very small and well-defined set of things would happen, but we cannot commit to which one". The latter is not exactly great, but significantly less dangerous than the former.

by smsm42

5/20/2026 at 9:34:25 AM

Yes, there is a data race there. The value of a volatile can be changed by something outside the current thread. That’s what volatile means and why it exists.

Edit: thread=thread of execution. I’m not making a point about thread safety within a program.

by sethev

5/20/2026 at 10:05:29 AM

Not from the standard’s point of view. The traditional (in some circles) use of volatile for atomic variables was not sanctioned by the C11/C++11 thread model; if you want an atomic, write atomic, not volatile, or be aware of your dependency on a compiler (like MSVC) that explicitly amends the language definition so as to allow cross-thread access to volatile variables.
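
A minimal sketch of the "write atomic" option, assuming a C11 compiler that provides `<stdatomic.h>`; the counter and function names are made up for illustration.

```c
#include <stdatomic.h>

/* Shared between threads: declared atomic, not volatile.
   Static storage gives it a valid zero initial state. */
static atomic_int hits;

/* Safe to call concurrently from several threads; returns the new count. */
int record_hit(void) {
    return atomic_fetch_add(&hits, 1) + 1;
}

/* Reads the current count with a sequentially consistent atomic load. */
int current_hits(void) {
    return atomic_load(&hits);
}
```

Accesses through `hits` are atomic operations, so concurrent calls do not constitute a data race in the standard's sense, which a `volatile int` would not guarantee.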

by mananaysiempre

5/20/2026 at 10:11:00 AM

Thread was a poor choice of word. Outside the control of the program is a better way to put it. Like memory mapped io.

by sethev

5/20/2026 at 3:14:27 PM

It's almost universally better to use inline assembly via a macro to read/write mmio rather than use volatile.

by surajrmal

5/20/2026 at 10:35:35 AM

It can also represent a register where the read itself has an effect. Reading a memory-mapped register can have side effects; for example, reading a memory-mapped UART data register fetches the next byte to be read.

by trissylegs

5/20/2026 at 11:38:49 AM

Was going to say the same thing until I saw this comment. volatile is defined the way I'd expect, plus it's a strange code example.

by frollogaston

5/20/2026 at 10:13:20 AM

Not sure why you're being downvoted. That's completely right. The example is silly. The code is obviously bad, doesn't matter if it's UB or not.

I'm also not convinced (yet) that the example really is UB: I agree reading a volatile is "a side effect" in some sense, and GP cited a paragraph that says just that. But GP doesn't clearly quote that it's a side effect on the object (or how a side effect on an object is defined). Reading an object doesn't mutate it after all.

But whatever language lawyer things, the code is obviously broken, with an obvious fix, so I'm not so interested in what its semantics should be. Here is the fix:

    volatile int x;
    // ...
    int val = x;  // volatile read
    printf("%x %d\n", val, val);

by jstimpfle

5/20/2026 at 11:10:30 AM

The problem is that the function call as a whole is UB. Having the original example compile to the equivalent of

  volatile int x;
  int a = x;
  int b = x;
  printf("%x %d\n", a, b);

is equally valid as

  volatile int x;
  int a = x;
  int b = x;
  printf("%x %d\n", b, a);

, and neither needs to have the same output as your proposed fix.

C could've specified something like "arguments are evaluated left-to-right" or "if two arguments have the same expression, the expression is [only evaluated once]/[always evaluated twice]". But it didn't, so the developer is left gingerly navigating a minefield every time they use volatile.

by crote

5/20/2026 at 11:17:47 AM

Not only is "arguments are evaluated left-to-right" less easy to formalize than you think, it would also make all C code run slower, because the compiler would no longer be able to interleave computations for more efficient pipelining. The same goes for "expression is [only evaluated once]/[always evaluated twice]".

Of course the developer is navigating a minefield every time they use volatile, that's why it's called "volatile" - an English word otherwise only commonly used in chemistry, where it means "stuff that wants to go boom".

by indigo945

5/20/2026 at 1:43:50 PM

the compiler can still interleave anything it shows is side-effect free; it’s hard to show that something would benefit from being reordered without analyzing it well enough to determine what side effects it has

by remexre

5/20/2026 at 11:45:50 AM

Your argument makes no sense since the developer is expected to perform manual sequencing. Correctly written UB free code cannot be interleaved either.

All you've achieved is that the standard C function call syntax can no longer be used as is.

by imtringued

5/20/2026 at 2:19:58 PM

I understand, that's why I said the code is obviously broken. The problem is not about order of evaluation. It's not about UB arising from unsequenced volatile reads or whatever.

The problem is simply that there are two volatile reads where only one was intended. It doesn't matter if there is UB or not. The code doesn't express the intention either way. All you need to know to understand that is that a volatile might be modified concurrently (a little bit similar to, but not the same semantics as, atomics).

by jstimpfle

5/20/2026 at 10:52:34 AM

With volatile it could be changed by an interrupt service routine between reads, so it makes sense.

by RobotToaster

5/20/2026 at 6:04:38 PM

Or, it could be hardware that has a "clear flag on read" type behavior.

by nomel

5/20/2026 at 7:15:43 PM

What's weird about it?

If you are using volatile you are reading from a device port mapped to that address.

Since C doesn't mandate in which order function arguments are evaluated, you don't know which argument will be read from port first.

How can that be anything but UB?

by drysine

5/20/2026 at 11:02:44 AM

This has got nothing to do with data races etc. but everything to do with "Sequence Points and Single Update Rule" which is well described in C language specification.

See my comment here - https://news.ycombinator.com/item?id=48205760

by rramadass

5/20/2026 at 11:40:34 AM

Memory mapped IO sends a read request to a peripheral which is allowed to have side effects in the background and return two different values upon a read. You can think of it as a synchronous RPC request.

The lack of argument sequencing feels utterly petty however.

by imtringued

5/20/2026 at 1:58:47 PM

[dead]

by netrikare

5/20/2026 at 8:22:32 AM

The UB in unaligned pointers is even worse: an unaligned pointer in itself is UB, not only an access through it. So even implicitly converting a void*v to an int*i (like 'i=v' in C or 'f(v)' when f() accepts an int*) is UB if the resulting pointer is not aligned for int.

It is important to understand that this is a C level problem: if you have UB in your C program, then your C program is broken, i.e., it is formally invalid and wrong, because it is against the C language spec. UB is not on the HW, it has nothing to do with crashes or faults. That cast from void* to int* most likely corresponds to no code on the HW at all -- types are in C only, not on the HW, so a cast is a reinterpretation at C level -- and no HW will crash on that cast (because there is not even code for it). You may think that an integer value in a register must be fine, right? No, because it's not about pointers actually being integers in registers on your HW, but your C program is broken by definition if the cast pointer is unaligned.
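
The usual way to stay inside the spec is to never create the misaligned int pointer at all and to copy the bytes instead. A minimal sketch (`load_i32` and `roundtrip_unaligned` are names invented for this illustration):

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from any byte address without ever forming
   a misaligned int32_t pointer. */
int32_t load_i32(const void *src) {
    int32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Round trip through a deliberately offset position in a byte buffer. */
int32_t roundtrip_unaligned(int32_t input) {
    unsigned char buffer[sizeof(int32_t) + 1];
    memcpy(buffer + 1, &input, sizeof input);
    return load_i32(buffer + 1);
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a plain load where the target allows it, but that is an implementation detail, not something the standard promises.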

by beeforpork

5/20/2026 at 9:06:32 AM

Author here.

> an unaligned pointer in itself is UB

Yup. Per the "Actually, it was UB even before that" section in the post.

> UB is not on the HW, it has nothing to do with crashes or faults

Yeah. I tried to convey this too, but I'm also addressing the people who say "but it's demonstrably fine", by giving examples. Because it's not.

by thomashabets2

5/20/2026 at 8:50:34 AM

Which is totally fine and expected for any decent programmer. Casting pointers is clearly here be dragons territory.

by account42

5/20/2026 at 8:58:40 AM

Many, many programmers come to C (and C++) with a lower-level understanding that actually gets in the way here. They understand that all types "are" just bytes and that all pointers "are" just register-sized integer addresses, because that's how the hardware works and has worked for decades.

It's perfectly reasonable to expect any load through `int*` to just load 4 bytes from memory, done and done. They get surprised that it is far from the whole story, and the result is UB.

Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead. But no.

by simonask

5/20/2026 at 9:17:36 AM

> They understand that all types "are" just bytes and that all pointers "are" just register-sized integer addresses, because that's how the hardware works and has worked for decades.

I'd clarify this with "They understand that all values are just bytes".

> Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead.

It's partly the standard's fault here - rather than saying "We don't know how vendors will implement this, so we shall leave it as implementation-defined", they say "We don't know how vendors will implement this, so we will leave it as undefined".

A clear majority of the UB problems with C could be fixed if the standards committee slowly moved all UB into IB. It's not that there isn't any progress (Signed twos-complement is coming, after all), it's that there is (I believe) much pushback from compiler authors (who dominate the standards) who don't want to make UB into IB.

by lelanthran

5/20/2026 at 7:16:43 PM

> A clear majority of the UB problems with C could be fixed if the standards committee slowly moved all UB into IB

There is no such thing as getting rid of "all UB."

What behavior is the implementation supposed to prescribe for a write to an unpredictable garbage address you read from the network? It could overwrite your code. It could overwrite any value anywhere. It could overlap with anything. Prescribing defined behavior for absolutely everything would require defining a precise, unoptimizable 1-to-1 mapping to assembly code and disallowing any multithreading.

by mike_hock

5/21/2026 at 8:53:10 AM

> What behavior is the implementation supposed to prescribe for a write to an unpredictable garbage address you read from the network?

"The compiler is not allowed to elide a write to a garbage address".

Wasn't that easy?

by lelanthran

5/20/2026 at 9:59:05 AM

>It's partly the standard's fault here - rather than saying "We don't know how vendors will implement this, so we shall leave it as implementation-defined", they say "We don't know how vendors will implement this, so we will leave it as undefined".

I'd agree to a point. I still think it's unreasonable for compiler writers to get all lawyery about precise terminology. After all "implementation defined" could still be subject to the same lawyeriness (we implemented it, ergo we define it).

To me this is an issue of culture. We need to push back against the view that UB means anything can happen, therefore the compiler can do anything.

by benj111

5/20/2026 at 10:27:23 AM

But it's genuinely useful. In all seriousness, are you sure you aren't perhaps just using the wrong language? At this point UB and leveraging it for optimization are core parts of the most performant C implementations.

That said, I think there are many cases where compilers could make a better effort to link UB they're optimizing against to UB that appears in the code as originally authored and emit a diagnostic or even error out. But at least we've got ubsan and friends so it seems like things are within reason if not optimal.

by fc417fc802

5/20/2026 at 12:34:24 PM

> At this point UB and leveraging it for optimization are core parts of the most performant C implementations.

I am skeptical that NULL-pointer checks being removed contribute anything more than a rounding error in performance gains in any non-trivial program.

by lelanthran

5/20/2026 at 3:50:05 PM

I got a measurable improvement from eliminating a null pointer check within the last week. Billions of devices have arm little cores, and the extra branch predictor pressure and frontend bandwidth from those instructions can be significant.

A standard way to eliminate those is to invoke undefined behavior if some condition is not met:

    if (a == NULL) {
      __builtin_unreachable();
    }

Which then allows elimination of the null check in later code, possibly after inlining some function.

by charleslmunger

5/20/2026 at 10:42:27 AM

>are you sure you aren't perhaps just using the wrong language

Well I think there is a tension here. C is the language for microcontrollers and the language for high performance.

In ye olden days both groups' interests were aligned, because speed in C was about working with the machine. Now that UB has been hijacked for speed, on the microcontroller I'm working on, where I know an int will overflow and rely on that, the overflow is UB and so may be optimised out, so I then have to think about what the compiler may do.

I wouldn't say C is the wrong language. I would say there are wrong compilers though.
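
For the overflow case specifically, there are ways to get the wraparound or the check without leaning on UB. A sketch with made-up helper names, assuming nothing beyond standard C:

```c
#include <limits.h>
#include <stdbool.h>

/* Wraparound is defined for unsigned types: the result is reduced
   modulo UINT_MAX + 1. */
unsigned wrapping_add(unsigned a, unsigned b) {
    return a + b;
}

/* For signed int, test before adding so the overflow never happens.
   Returns false and leaves *out untouched if a + b would not fit. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

GCC and Clang also accept `-fwrapv`, which makes signed overflow wrap, but that is a dialect choice rather than standard C.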

by benj111

5/20/2026 at 11:01:10 AM

This series was a good explanation for me of why treating UB this way is genuinely useful: https://blog.llvm.org/2011/05/what-every-c-programmer-should...

Being able to assume certain things don't happen is powerful when you're writing optimisations, not doing that would have a real performance cost

by circuit10

5/20/2026 at 12:52:34 PM

> Being able to assume certain things don't happen is powerful when you're writing optimisations, not doing that would have a real performance cost

A few of those are significant performance gains, the majority are not.

Emitting the instruction for a NULL pointer dereference is effectively no more costly than not emitting that instruction.

It's the code removal that's killing me.

by lelanthran

5/20/2026 at 12:54:34 PM

What if the compiler is able to use that to determine that a whole code path is dead, and then significantly improve the surrounding function because of that?

Compilers optimise in multiple passes and removing things earlier can expose optimisation opportunities later that can affect other parts of the code too

by circuit10

5/20/2026 at 12:56:40 PM

> What if the compiler is able to use that to determine that a whole code path is dead,

Then it should warn "unreachable code".

> and then significantly improve the surrounding function because of that?

It's not simply the removal that is the problem, it's that the code is silently removed.

by lelanthran

5/20/2026 at 10:45:38 PM

I don’t mean this in a rude way but you should really read the posts I linked, it’s interesting and part 3 especially answers these questions

Direct link to part 3 (but read the others first for context if you can): https://blog.llvm.org/2011/05/what-every-c-programmer-should...

You don’t want warnings for every piece of code in a library you’re not using or sanity check you added that isn’t supposed to be hit

And you can’t warn when you’re optimising based on undefined behaviour because you can’t know when it will happen, that is equivalent to the halting problem

If you warned whenever undefined behaviour could be happening then e.g. every single pointer dereference would say “warning: compiling assuming pointer is not null or unaligned”

by circuit10

5/21/2026 at 8:58:40 AM

> I don’t mean this in a rude way but you should really read the posts I linked, it’s interesting and part 3 especially answers these questions

I have read the entire series. Both in the past and more recently.

> If you warned whenever undefined behaviour could be happening then e.g. every single pointer dereference would say “warning: compiling assuming pointer is not null or unaligned”

I am not proposing that at all, I am proposing (which the series you link to does not preclude) that eliding code on the basis of UB can be determined by the compiler. If the compiler can determine that some code block needs to be elided, then the reason for that elision has already been determined, in which case it can issue a warning when "reason" == "pointer already used" when eliding the null check.
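
The pattern in question, sketched with made-up names (`struct packet` and both functions are hypothetical). In the first function the dereference comes before the check, so a compiler may assume the pointer is non-null and drop the check; in the second the check guards the dereference.

```c
#include <stddef.h>

struct packet {
    int length;
};

/* Dereference first, check later: once p->length has been evaluated, the
   compiler is allowed to assume p is not NULL and treat the check as dead. */
int length_checked_too_late(const struct packet *p) {
    int n = p->length;
    if (p == NULL) {
        return -1;
    }
    return n;
}

/* Check first: the null test guards the dereference and has to stay. */
int length_checked_first(const struct packet *p) {
    if (p == NULL) {
        return -1;
    }
    return p->length;
}
```

The first function is the case where the reason for the elision ("pointer already used") is known at the point where the check is removed.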

by lelanthran

5/20/2026 at 2:36:10 PM

Right. But to take the first example, the value of uninitialised memory.

It's undefined so it doesn't have to be zeroed therefore increasing efficiency.

But it's also UB so if you do know that memory contains something, you can't take advantage of that because it's UB. Having it UB is fine. It's the compilers assuming UB can't happen and optimising it away.

by benj111

5/20/2026 at 11:23:49 AM

Turning undefined behavior into implementation defined behavior is rarely a fix, though.

by saagarjha

5/20/2026 at 11:39:04 AM

It's a fix that removes the most pointy part of UB.

"Going past the end of the array results in addressing arbitrary values" I can live with. "Going past the end of an array results in anything happening" is a hard sell.

by lelanthran

5/20/2026 at 3:37:43 PM

Is that really a meaningful distinction?

Once you are addressing arbitrary values you are firmly in the realm of "anything happening" in practice, but you've now given up optimization opportunities. As has been repeatedly demonstrated over the years, once memory safety breaks it is practically impossible to make any guarantees about program behavior.

by charleslmunger

5/20/2026 at 4:54:00 PM

Yes, it's a meaningful distinction. No you are not into "anything happening" in practice.

Your compiler emitting a load operation and it failing isn't "anything". The failure being handled by code that the compiler authors can't predict doesn't make it "anything".

And if you lose optimization opportunities because of this it's because your optimization is broken. By the way, if you lose optimization opportunities because of this, that means both codes are meaningfully different and you knew it all the time.

by marcosdumay

5/21/2026 at 7:31:11 AM

Compilers elide loads all the time; this is one of the more basic optimizations a compiler can do. We just mostly think those are "good" optimizations.

by saagarjha

5/21/2026 at 7:52:00 AM

I mean... You can turn a one byte out of bounds write into code execution.

https://daniel.haxx.se/blog/2016/10/14/a-single-byte-write-o...

And if you get code execution, then you by definition have "anything".

by charleslmunger

5/20/2026 at 11:51:39 AM

I think it’s a really easy sell, actually: if you go past the end of the array far enough you end up accessing the stack which includes parts of the program like “where does this function return to” or “what is the index used to perform this access” or “there is no page mapped there”. None of these are arbitrary values.

by saagarjha

5/20/2026 at 12:55:00 PM

The "anything can happen" means that the compiler can simply silently refuse to emit the code that does the access.

Documenting that the instructions to access will always be eliminated makes it easier to predict what will happen.

by lelanthran

5/21/2026 at 7:33:16 AM

Yes, but usually you don't want this. You think you do, but you don't: you can't always eliminate these, and often eliminating the extra accesses is not the most efficient thing to do either. Sometimes it's faster to have the loads and not check, sometimes you can check and skip that path, etc.

by saagarjha

5/20/2026 at 1:14:06 PM

Can you unravel this further (for those of us who don’t know compilers)? I’ve always assumed access past the end of an array can’t always be detected in C, so I don’t see how those instructions could be eliminated.

For example, a dynamically linked library that takes in a pointer, and then writes to the 10 ints after it—whether or not this behavior is defined is determined after that library is compiled, right?

by bee_rider

5/20/2026 at 1:45:18 PM

I think the disconnect here is that you're operating on the assumptions built by using common architectures that have solved these problems in implementation specific forms, and you're used to those solutions.

But just because those forms are common, doesn't mean the behavior is actually defined.

Ex - I might be using a vendor specific compiler for custom embedded devices where dynamic linking isn't available at all, and which might have complicated storage mechanisms that look nothing like standard memory pages.

by horsawlarway

5/20/2026 at 2:09:29 PM

I’m not sure there’s a disconnect at all (note that I’m not saagarjha, they and lelanthran seem to be pushing back on each other’s opinions; I’m just asking a clarifying question).

by bee_rider

5/20/2026 at 3:40:24 PM

Yes, and I'm saying your clarifying question hints at a misunderstanding.

You're already deep into the bowels of implementation specific behavior by the time we talk about dynamic linking. The C standard doesn't have anything to say about it at all.

My read on the above conversation is basically a discussion about asking/requiring vendors to properly document their implementation, as opposed to leaving it undocumented (the default - given my experience with hardware manufacturers...).

I don't think the real takeaway is that "instructions should be eliminated in case [blah blah blah]" it's that "Something is going to happen, please tell me what that is on your system, instead of leaving it as UB" (Basically - make UB in the standard implementation defined behavior from the vendor).

My read is that this won't happen because it's genuinely incredibly difficult to do, and this isn't a space overflowing with capital to allocate to the problem. But I do think there's merit to the idea of pushing vendors to provide coverage in this space AT SOME POINT.

by horsawlarway

5/20/2026 at 3:09:01 PM

> I’ve always assumed access past the end of an array can’t always be detected in C, so I don’t see how those instructions could be eliminated.

"Can't always be detected" is jut a different way of saying "Can sometimes be detected".

Upon detection, I'd rather that the compiler still emit the instructions, not elide the code altogether.

by lelanthran

5/20/2026 at 5:14:45 PM

Now the behavior of your compiler/runtime stack is dependent on the sophistication of your compiler or runtime relative to the particular code at issue + the specific information available statically or dynamically in the instance.

That does not seem like an improvement if your goal is predictable, consistent behavior.

by twoodfin

5/21/2026 at 9:06:04 AM

> Now the behavior of your compiler/runtime stack is dependent on the sophistication of your compiler or runtime relative to the particular code at issue + the specific information available statically or dynamically in the instance.

> That does not seem like an improvement if your goal is predictable, consistent behavior.

Getting unpredictable outcomes only on some items is better than getting unpredictable outcomes on every item. This is what I meant when I said moving UB to IB might not fix everything, but it is certainly a solid start!

by lelanthran

5/20/2026 at 1:38:10 PM

Are you talking about creating a pointer (more than one item) past an array, or dereferencing that pointer? Both are currently UB.

For the former, I kinda get it. It may need to be there for cases like with segmented address space where p+10 could actually be a value less than p, for the eventually generated assembly. Maybe it should be fine to create such a pointer, but have it be "indeterminate value" or whatever, if you try to compare that pointer to anything? I don't know enough about compiler internals to say one way or the other.

Dereferencing, though, can only be UB. There may not be a "value" behind that address. There may be a motor that's been I/O mapped, or a self destruct button.

by thomashabets2

5/20/2026 at 3:11:14 PM

I'm not saying that the result of the dereference be known, I'm saying that the instructions to do the dereference be always emitted.

Right now, if a dereference results in UB, the compiler may omit it entirely.

by lelanthran

5/20/2026 at 4:49:03 PM

I think I would defer to someone more of a language lawyer than we, but I'm not sure what you're describing can be expressed in the C abstract machine. If a pointer is invalid, not pointing to an object, then I'm not sure it means anything to "read from there".

I know what you mean, but I'm just not sure you're describing something that fits what C "is". We program C to the abstract machine specified in the standard (5.1.2), and the compiler's job is to translate that into something with identical behavior on particular hardware. Piercing the layers down to actual hardware or assembly isn't really done.

Even "volatile" just says (basically) "touching this object has side effects". It implies no double-loading, speculative store, etc, but doesn't say "don't emit assembly instructions to load this unless the program logic path takes the route where the C program does load it".

The standard is not using ancient language when it refers to "objects with static storage duration" instead of "heap" or ".data segment". It is the true class of objects in the abstract machine.

by thomashabets2

5/20/2026 at 3:41:51 PM

Wouldn't that make a compiler that emitted bounds checks violate the standard, since it would not be emitting the actual memory operations if you deref out of bounds?

by charleslmunger

5/20/2026 at 4:55:59 PM

No, because it's UB so there is no standard.

by marcosdumay

5/21/2026 at 7:47:05 AM

Isn't the proposal from the parent comment to define the behavior?

by charleslmunger

5/20/2026 at 11:05:52 AM

> Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead. But no.

Not if those 4 bytes span a cacheline boundary, that will most likely result in 1/2 throughput compared to loading values inside a single cacheline. And if it causes cache-misses it takes up twice the L2 or L3 bandwidth.

Even worse, if the int spans two pages, it will need two TLB lookups. If it's a hot variable and the only thing you use from those pages, it even uses up an additional TLB entry, that could otherwise be used for better perf elsewhere, etc.

And if you're on embedded (and many C programs are), Cortex-M CPUs either can't handle unaligned accesses (M0, M0+) or take 2-3 times as long (split the load into 2x2 byte or 1x2 + 2x1 byte)

by da-alex

5/20/2026 at 1:16:51 PM

I don’t think any of that is justification for making unaligned access UB. It’s reason to avoid it or discourage it in certain scenarios, but it’s infinitesimally rare that loading 8 bytes instead of 4 is even measurable, and that includes embedded.

by simonask

5/20/2026 at 12:49:27 PM

> that all pointers "are" just register-sized integer addresses

And crucially until DR#260 https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm this was a reasonable guess as to what the pointers are. Probably not a wise guess because it's not how your C compiler worked even then, but a reasonable guess if you didn't think too hard about this.

One way I like to think about this is that all C's types are just the machine integers wearing crap Halloween costumes. Groucho glasses for bool, maybe a Lincoln hat for char, float and double can be bright orange make-up and a long tie. But the pointers are different, because unlike the other types those have provenance.

5 == 5, 'Z' == 'Z', true == true, 1.5f == 1.5f, but whether two pointers are equivalent does not depend solely on their bit pattern in C.

by tialaramex

5/20/2026 at 12:34:41 PM

I'm not sure that's right. For instance, the Pentium 4 spec explicitly says unaligned int32 loads take longer. And x86/x64 is very gentle in that regard, other archs would whip you. So an unaligned int access is rightfully treated differently. It should be IB.

Just creating the pointer, though, should not be UB, even though it apparently is. It should not even be IB.

by mafuy

5/20/2026 at 1:09:19 PM

Also, it’s been way more than a decade since Pentium 4 was remotely relevant.

by simonask

5/20/2026 at 3:03:41 PM

> Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead.

PCs yes, but there are many other things C is compiled to for which this is not true.

by TazeTSchnitzel

5/20/2026 at 3:11:50 PM

C isn't a programming language. It's not even portable assembly. It's a vague suggestion of a program that might or might not be feasible to run on a target computer and the compiler and other diagnostic tools are under no obligation whatsoever to help you find out what, if anything, is wrong with your program. It's user hostile and should be relegated to the bad old days.

by titzer

5/20/2026 at 9:12:42 AM

Except ARM32. ARM64 doesn't guarantee it to be valid in all cases either.

by pjc50

5/20/2026 at 3:50:30 PM

Yes but casting pointers is virtually required in any non-trivial C program, and frankly even a lot of the trivial ones, because there's no other way to do type erasure or generics. Well, there kind of are now, and there's always been macros, but void * has historically been the predominant way this is done at runtime.

by array_key_first

5/20/2026 at 12:23:34 PM

>an unaligned pointer in itself is UB, not only an access to it.

Can someone point to where the standard states this?

by 201984

5/20/2026 at 11:46:19 PM

I think this is 6.3.2.3.7 in C99 about casting between pointer types:

> If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

However, unless I’m missing something, producing such a pointer from an integer is apparently not insta-UB? 6.3.2.3.5:

> An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation

And later on 6.5.3.2.4:

> If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

Which implies that the invalid pointer must have been obtained without being already undefined, right?

by adrian17

5/20/2026 at 10:56:25 AM

Does that mean that if I have a struct with #pragma pack(push, 1) I can't use pointers to any members that don't happen to be aligned?

by stilley2

5/20/2026 at 11:22:26 AM

This is a non-standard extension, so your compiler may provide stronger guarantees.

by saagarjha

5/20/2026 at 5:28:01 PM

In practice, both GCC and Clang consider pointers to these unaligned members to be UB and will flag them in UBSAN.

by AlotOfReading

5/20/2026 at 11:56:14 AM

The problem with C UB is that originally it meant the compiler has the freedom to map your code to the hardware in spite of machine instructions differing slightly between one another. The same C program may exhibit different behaviour depending on which architecture it is running on.

This type of UB is fine and nobody really complains about hardware differences leading to bugs.

However, over time aggressive readings of UB evolved C into an implicit "Design by Contract" language where the constraints have become invisible. This creates a similar problem to RAII, where the implicit destructor calls are invisible.

When you dereference a pointer in C, the compiler adds an implicit non-nullable constraint to the function signature. When you pass in a possibly nullable pointer into the function, rather than seeing an error that there is no check or assertion, the compiler silently propagates the non-nullable constraint onto the pointer. When the compiler has proven the constraints to be invalid, it marks the function as unreachable. Calls to unreachable functions make the calling function unreachable as well.
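A minimal sketch of the pattern described above, with made-up function names. In the first function the dereference comes before the check, so a compiler is allowed to assume `p` is non-null and drop the check; whether a given compiler actually does so depends on version and optimization level. The second function is the ordering that keeps the check meaningful.

```c
#include <stddef.h>

/* Risky ordering: *p is evaluated first, so the compiler may treat
   "p != NULL" as a known fact and remove the test that follows.
   Calling this with NULL is undefined behavior. */
int first_or_default_risky(const int *p) {
    int v = *p;
    if (p == NULL) {
        return -1; /* may be treated as unreachable */
    }
    return v;
}

/* Check before use: the null test cannot be justified away,
   because nothing has dereferenced p yet. */
int first_or_default(const int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

Only `first_or_default` is safe to call with a null pointer.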

by imtringued

5/20/2026 at 1:32:02 PM

> The problem with C UBI is that originally it meant the compiler has the freedom to map your code to the hardware inspite of machine instructions differing slightly between one another. The same C program may express different behaviour depending on which architecture it is running on.

You're conflating undefined behavior with implementation-defined behavior. If it was only to do with what we think of as normal variance between processors, then it would be easy to make it implementation-defined behavior instead.

The differentiating factor of undefined behavior is that there are no constraints on program behavior at that point, and it was introduced to handle cases where processor or compiler behavior cannot be meaningfully constrained. One key class is of course hardware traps: in the presence of compiler optimizations, it is effectively impossible to make any guarantees about program state at the time of a trap (Java tried, and most people agreed they failed); but even without optimizations, there are processors that cannot deliver a trap at a precise point of execution and thus will continue to execute instructions after a trapping instruction.

by jcranmer

5/20/2026 at 8:42:31 AM

But that seems obvious. You can't load an integer from an unaligned address.

It's not only C-level is it. There's no (guarantee across architectures for) machine code for that either.

by tovej

5/20/2026 at 8:54:50 AM

> You can't load an integer from an unaligned address.

You can, and the results are machine specific, clearly defined and well-documented. Ancient ARM raises an exception, modern ARM and x86 can do it with a performance penalty. It's only the C or C++ layer that is allowed to translate the code into arbitrary garbage, not the CPU.

by codeflo

5/20/2026 at 11:24:54 AM

There’s usually not a performance penalty on modern hardware

by saagarjha

5/20/2026 at 2:41:53 PM

There's typically only a performance penalty if the unaligned load spans a cache line on modern hardware.

by orlp

5/20/2026 at 9:02:16 AM

Sure you can. In many architectures it works just fine. Works perfectly in x86_64, for example. It's just a little slower.

by matheusmoreira

5/20/2026 at 9:34:09 AM

In many architectures does not mean you can. The standard is supposed to cover all architectures.

by tovej

5/20/2026 at 10:29:11 AM

If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead. Load multiple integers and shift and mask away the irrelevant bits, done. This is exactly what modern architectures already do in hardware. Works, it's just a little slower.

This is exactly what the compilers do if you use a packed structure to access unaligned data. Works everywhere, as expected. Compilers have always known what to do, they just weren't doing it. C standard says no.
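A rough sketch of the packed-structure trick next to the standard `memcpy` route, assuming GCC or Clang. The struct and function names are invented for illustration, and `may_alias` is added here (my assumption, not something the comment mentions) so that reading a byte buffer through the struct does not run into strict aliasing.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: "packed" drops the alignment requirement to 1,
   "may_alias" lets this type alias other objects, like char does. */
struct unaligned_u32 {
    uint32_t value;
} __attribute__((packed, may_alias));

/* Unaligned load through the packed struct (non-standard). */
uint32_t load_u32_packed(const void *p) {
    return ((const struct unaligned_u32 *)p)->value;
}

/* Unaligned load through memcpy (standard C). */
uint32_t load_u32_memcpy(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Both functions read the same four bytes in native byte order, so they should agree on any address inside a valid buffer.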

The fact is the standard is garbage and the first thing every C programmer should learn is that they can and should ignore it. There is never any reason to wonder what the standard is supposed to do. The only thing that matters is what compilers actually do.

by matheusmoreira

5/20/2026 at 6:29:23 PM

> If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead.

Wouldn't the compiler have to assume that every pointer access might be unaligned and do the slow "piece by piece" access every time? It can hardly guess the runtime value of a pointer during compilation.

by josefx

5/21/2026 at 6:18:33 AM

It should be able to make a lot of inferences. For example, taking the address of some value allocated by the compiler itself results in an aligned pointer unless the programmer overrides it. Compiler should be able to trace it from there. Pointers from malloc are also aligned.

If compiler is not doing it for some reason, __builtin_assume_aligned can be used to explicitly mark a pointer as aligned.
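For reference, a small sketch of how `__builtin_assume_aligned` (a GCC/Clang builtin) is typically used. The function name and the 16-byte figure are just assumptions for the example. Note that the builtin is a promise, not a check: passing a pointer that is not actually 16-byte aligned is undefined behavior.

```c
#include <stddef.h>

/* Sums n ints from a buffer the caller promises is 16-byte aligned.
   The builtin returns the same pointer, annotated with that promise,
   which the optimizer may use when vectorizing the loop. */
long sum_aligned16(const int *data, size_t n) {
    const int *p = __builtin_assume_aligned(data, 16);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}
```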

by matheusmoreira

5/20/2026 at 11:07:15 AM

But if it's a pointer, the compiler doesn't know the alignment at compile time. Should the compiler insert an alignment check of every pointer access?

by da-alex

5/20/2026 at 11:23:12 AM

Compilers could add support for an unaligned attribute that we can apply to pointers. I'd prefer that to wrapping everything in a packed structure which is quite unsightly.

Would have been better if correct behavior was the default while pointer alignment requirements were opt in, just like vector stuff. Nothing we can do about it now.

I would hope the compiler is smart enough to figure out which accesses are aligned and unaligned on its own.

by matheusmoreira

5/20/2026 at 10:56:42 AM

The pointer might be something you forced. The compiler needs to do the right thing but if you set the pointer to an unaligned address because you have information on the hardware you can get this undefined situation with nothing the compiler can do about it.

by bluGill

5/20/2026 at 11:04:54 AM

Any reason the hardware pointer can't be accessed via the packed structure?

https://news.ycombinator.com/item?id=48205371

by matheusmoreira

5/20/2026 at 11:26:27 AM

The same reason you probably aren’t adding manual alignment fixes to your code?

by saagarjha

5/20/2026 at 11:44:07 AM

No reason at all, then. Because I am manually dealing with alignment in my code.

Wrote a lisp, its bytes type supports reading and writing integers at arbitrary locations within the buffer. Test suite exercises aligned and unaligned memory access for every C integer type. Also wrote my own mem* functions, dealing with alignment in those was certainly a fun exercise. It wasn't necessary, I just wanted the performance benefits.

by matheusmoreira

5/20/2026 at 11:19:47 AM

However, you certainly can do that. The point of unaligned is that the hardware can't load it from a single memory location in one access. It needs two accesses. And in that time, the value of one of the two addresses that the hardware has to load can change.

I would hope you're not so stupid as to design hardware that relies on this, but the fact is it certainly is possible for someone to do that. And if you do that, there is nothing that the compiler or the standard can do. It can't be done correctly

by bluGill

5/20/2026 at 11:33:55 AM

Yeah, the unaligned accesses aren't going to be atomic unless the hardware supports it.

> And in that time, the value of one of the two addresses that the hardware has to load can change.

You mean volatile addresses that could spontaneously change in the middle of the reads? Like memory mapped I/O addresses?

I would expect these to have stricter access requirements than arbitrary general purpose memory locations.

> I would hope you're not so stupid as to design hardware that relies on this

You and me both.

> And if you do that, there is nothing that the compiler or the standard can do. It can't be done correctly

Anything that does that is broken and terrible anyway. It really shouldn't contaminate language design. It's the sort of thing that compilers should be adding attributes for, rather than constraining the language to the point nothing works correctly and making us use attributes on everything to restore some sane baseline behavior.

by matheusmoreira

5/20/2026 at 11:54:52 AM

> Anything that does that is broken and terrible anyway

Which is why it is undefined behaviour. The optimizer writers have told me consistently that if they can assume you're not doing this thing that's stupid anyway, they can make my code faster. And since I'm not doing that stupid thing anyway, I want my code to be faster.

by bluGill

5/20/2026 at 12:03:24 PM

Unaligned memory access isn't really stupid though. Not in the general case. Not to the point where it should give the compiler free rein to crash things or introduce security holes. It should just introduce a performance regression instead, which is a tractable problem. Just measure it and fix it by making things aligned.

Compilers can add some custom attributes that encode whatever semantics the badly designed hardware requires. This lets it freely break incorrect code in the small sections that are actually handling those special variables, while allowing the rest of the language to make sense.

by matheusmoreira

5/20/2026 at 7:20:57 PM

> If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead.

LMAO what?!

The compiler should pessimize each and every memory access everywhere with an alignment check on the pointer and a branch, or forego the efficient memory access method of the platform entirely and just do bytewise loads only?!

by mike_hock

5/21/2026 at 5:37:54 AM

Unaligned access. Not every access. Compiler should be able to analyze code, determine alignment invariants and optimize everything it can. If not, __builtin_assume_aligned could help whenever it needs to be made explicit. Alignment should have been part of the type itself to begin with but there's no fixing that now.

by matheusmoreira

5/20/2026 at 11:19:49 AM

That's why we write C instead of assembly, isn't it?

You could also mandate that a compiler for architectures without unaligned access either has to prove that the access is going to be aligned or insert a wrapper to turn the unaligned access into two aligned ones.
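A byte-wise variant of that wrapper idea, as a sketch: instead of two aligned word loads it uses four single-byte loads, which are always aligned, and assembles the value with shifts. The name `load_le32` and the little-endian layout are choices made for this example.

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value from any address.
   Each p[i] is a one-byte access, so alignment never matters,
   and the result does not depend on the host byte order. */
uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

On targets that allow unaligned loads, GCC and Clang commonly recognize this pattern and emit a single load, though that is an optimization and not a guarantee.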

Just pretending the issue doesn't exist at all and making it the programmer's problem by leaving it as UB in the spec is a choice.

by crote

5/20/2026 at 8:52:51 AM

Unless your code targets some exotic architecture, like idk x86.

by mbel

5/20/2026 at 10:24:05 AM

Not really. Wait until the compiler starts vectorizing your code and using instructions requiring alignment (like the ones with A or NT in the mnemonic).

by cataphract

5/20/2026 at 11:26:54 AM

Usually the compiler will probably not generate those

by saagarjha

5/20/2026 at 5:08:31 PM

> Usually...probably...

you're betting against the compiler ever improving.

by bigfishrunning

5/20/2026 at 6:32:32 PM

This would be a regression

by saagarjha

5/20/2026 at 8:34:03 PM

Why? Automatic vectorization is pretty bad and has been for years, but wouldn't it be nice if the compiler could unroll loops and use SIMD instructions to make your code faster while also being correct?

by bigfishrunning

5/21/2026 at 7:28:20 AM

If you are using SIMD instructions on x86 you probably want the unaligned ones

by saagarjha

5/20/2026 at 8:54:46 AM

You missed the point: the pointer existing as a value of that type at all is UB, even if you never try to access anything through it and no corresponding machine code is ever emitted.

by pjc50

5/20/2026 at 9:40:18 AM

Yes? I agree with that. I don't really see the issue there. The computer will allocate data in aligned addresses, so you would have to be doing something weird to begin with to access unaligned pointers. And aligned access is always better anyway. I guess packed structs are a thing if you're really byte golfing. Maybe compressed network data would also make sense.

But then I would assume you are aware of unaligned pointers, and have a sane way to parse that data, rather than read individual parts of it from a raw pointer.

I am curious, what would be a legitimate reason for an unaligned pointer to int?

by tovej

5/20/2026 at 1:58:21 PM

String search algorithms would be one example, where a 64-bit register can be used as a “vector” containing 8x1 bytes.

by simonask

5/20/2026 at 3:00:29 PM

Where is the part about unaligned pointers?

by jstimpfle

5/20/2026 at 6:37:20 PM

Strings typically consist of UTF-8 bytes, and any old `char*` pair has no alignment guarantees.

by simonask

5/20/2026 at 10:01:23 PM

That's true, and that's why your typical string vector code has a prelude and a postlude to do the incomplete chunks at the ends. Between the ends, it's processing larger self-aligned chunks.
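Roughly what that prelude/body/postlude shape looks like, as a sketch with a made-up `find_byte` function. It assumes 8-byte chunks and uses `memcpy` for the chunk load to stay clear of aliasing questions; the zero-byte test is the well-known SWAR bit trick.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of the first byte equal to c, or n if none. */
size_t find_byte(const unsigned char *s, size_t n, unsigned char c) {
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    size_t i = 0;

    /* Prelude: single bytes until s + i is 8-byte aligned. */
    while (i < n && ((uintptr_t)(s + i) % sizeof(uint64_t)) != 0) {
        if (s[i] == c) {
            return i;
        }
        i++;
    }

    /* Body: one self-aligned 8-byte chunk per iteration. */
    while (n - i >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word);
        uint64_t x = word ^ (ones * c);   /* matching bytes become 0 */
        if ((x - ones) & ~x & highs) {    /* true if any byte of x is 0 */
            break;
        }
        i += sizeof(uint64_t);
    }

    /* Postlude: remaining bytes, which also pinpoints a match
       detected inside the last chunk examined above. */
    for (; i < n; i++) {
        if (s[i] == c) {
            return i;
        }
    }
    return n;
}
```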

by jstimpfle

5/21/2026 at 8:26:51 AM

If you're aware of that technique, why were you asking about use cases for unaligned loads?

by simonask

5/21/2026 at 8:43:45 AM

I'm saying that string search algorithms are _not_ a legitimate use case for unaligned loads.

by jstimpfle

5/21/2026 at 4:44:03 PM

You didn’t really say that, but feel free to share any reasons you might have to think so.

I don’t see any reason why it wouldn’t be perfectly fine on recent hardware, where unaligned loads are just as fast, and the cache pressure is identical for a linear search algorithm.

by simonask

5/21/2026 at 7:37:25 PM

I asked where is the part about unaligned pointers in your string processing example. Saying that you want to load multiple bytes at a time does not imply at all that you have to do unaligned loads.

Doing unaligned loads using SSE or AVX might have been possible on Intel architectures for a long time, but it is still a little bit slower afaik. But anyway when you get into sub-architecture specific details like that, you've essentially left C-land, and you're essentially doing assembler level programming.

by jstimpfle

5/20/2026 at 7:36:32 AM

The 5 stages of learning about UB in C:

-Denial: "I know what signed overflow does on my machine."

-Anger: "This compiler is trash! why doesn't it just do what I say!?"

-Bargaining: "I'm submitting this proposal to wg14 to fix C..."

-Depression: "Can you rely on C code for anything?"

-Acceptance: "Just don't write UB."

by quelsolaar

5/20/2026 at 9:11:22 AM

What stage is the "just make the compiler define the undefined" stage?

Unaligned access? Packed structs. Compiler will magically generate the correct code, as if it had always known how to do it right all along! Because it has, in fact, always known how to do it right. It just didn't.

Strict aliasing? Union type punning. Literally documented to work in any compiler that matters, despite the holy C standard never saying so. Alternatively, just disable it straight up: -fno-strict-aliasing. Enjoy reinterpreting memory as you see fit. You might hit some sharp edges here and there but they sure as hell aren't gonna be coming from the compiler.
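A quick sketch of the union approach next to the `memcpy` equivalent, assuming `float` is 32 bits wide (checked with a static assertion). The function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "example assumes 32-bit float");

/* Reinterpret the bytes of a float by writing one union member
   and reading the other. */
uint32_t float_bits_union(float f) {
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = f;
    return pun.u;
}

/* Same result via memcpy, which is fine under strict aliasing. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both functions should return the same bit pattern for any input.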

Overflow? Just make it defined: -fwrapv. Replace +, -, * with __builtin_*_overflow while you're at it, and you even get explicit error checking for free. Nice functional interface. Generates efficient code too.

The "acceptance" stage is really "nobody sane actually cares about the C standard". The standard is garbage, only the compilers matter. And it turns out that compilers have plenty of extremely useful functions that let you side step most if not all of this. People just don't use this because they want to write "portable" "standard" C. The real acceptance is to break out of that mindset.

Somehow I built an entire lisp interpreter in freestanding C that actually managed to pass UBSan just by following the above logic. I was actually surprised at first: I expected it to crash and burn, but it didn't. So if I can do it, then anyone can do it too.

by matheusmoreira

5/20/2026 at 12:17:00 PM

A lot of the central UB cannot be defined, because defining it relies on detection. In order to have well-defined behaviour (by the standard or the compiler), the implementation first needs to detect that the behaviour is triggered, and this is often very tricky or expensive. It's easy to define that a program should halt if it writes outside an array, but detecting that it does can be both slow and hard to implement. There are implementations that do, but they are rarely used outside of debugging.

A better way to think about UB is as a contract between developer and implementation, so that the implementations can more easily reason about the code. How would you optimize:

(x * 2) / 2

An optimizer can optimize this out for a signed integer, because it doesn't have to consider overflow, but with an unsigned integer it cannot. UB is a big reason why C is the most power efficient high level language.

by quelsolaar

5/20/2026 at 12:28:13 PM

> How would you optimize: (x * 2) / 2

I'd do the math myself and just write x.

I don't even use * for multiplication anymore, I use __builtin_mul_overflow and then check the result. Anyone who doesn't is gonna hit the overflow case one day, and they'll be lucky if their program isn't exploited because of it. I've been making an effort to use all the overflow checking builtins by default in most if not all cases. I've also been making Claude audit every single bare arithmetic operation in my projects. He's caught quite a few security issues already, and overflow checking dealt with them all.

This particular contract between developer and implementation is totally worthless and doing more harm than good. It encompasses regular everyday normal things like multiplication and addition. All things that our brains literally rely on in order to reason about the code. Can't even add numbers without the compiler screwing it up.

Programmers need to deal with overflow at all times. Can't calculate an offset without dealing with overflow. Can't calculate a size without dealing with overflow. It's simply everywhere in systems programming, which is what C was designed to do. The consequence of ignoring this is usually that your program gets mercilessly exploited.

All this for some efficiency gains. The cost/benefit analysis is way off here. Things should be correct, first and foremost. Then the compiler should give us the necessary sharp tools to make it fast, if needed. It shouldn't be making it fast at the cost of turning the entire language into a memetic vulnerability machine.

by matheusmoreira

5/21/2026 at 11:44:35 AM

The thing with (x * 2) / 2 is that for all practical purposes you might even have written something else, so the expression cannot be replaced by x directly.

What happens is that after a few common expression eliminations, peephole optimisations, code inlining, and possibly other optimisation passes, the remaining AST will be (x * 2) / 2, and then the magic happens.

by pjmlp

5/20/2026 at 1:14:06 PM

What you want from C isn't C. I'd advise you to use another language.

by quelsolaar

5/20/2026 at 1:18:52 PM

No. I like C. I've learned about a dozen languages by now. I always end up coming back to C. I've just accepted it.

There is no reason whatsoever that C can't be improved. Compiler attributes and builtins are already doing quite a lot of heavy lifting. Recent addition: counted_by, an attribute that allows compilers to properly track the size of memory referenced by pointers. All C programmers should be making liberal use of this stuff.
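
A hedged sketch of `counted_by` on a flexible array member. It assumes a compiler that exposes `__has_attribute` and falls back to a no-op otherwise; the `COUNTED_BY` macro, `struct int_buf` and `int_buf_new` are names invented for this example. As far as I understand, the attribute mainly helps bounds sanitizers and fortified builds; it does not add checks by itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Use counted_by only where the compiler knows it; otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

/* Hypothetical length-prefixed buffer; `len` is the element count of `data`. */
struct int_buf {
    size_t len;
    int data[] COUNTED_BY(len);
};

static struct int_buf *int_buf_new(size_t len)
{
    /* Reject lengths whose byte count would not fit in size_t. */
    if (len > (SIZE_MAX - sizeof(struct int_buf)) / sizeof(int))
        return NULL;
    struct int_buf *b = malloc(sizeof *b + len * sizeof b->data[0]);
    if (b)
        b->len = len;   /* set the count before any access to data[] */
    return b;
}
```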

by matheusmoreira

5/20/2026 at 4:47:41 PM

> Strict aliasing? Union type punning. Literally documented to work in any compiler that matters, despite the holy C standard never saying so.

It does say so, actually, since C99 TC3 (DR 283).
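
For reference, a minimal sketch of the union punning being discussed. It assumes `float` is 32 bits wide (checked at compile time below), and it is C only: C++ has different rules for unions. The function name is made up.

```c
#include <stdint.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Reinterprets the bytes of a float as a 32-bit unsigned integer. */
static uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;   /* reads a member other than the one last written */
}
```

On an IEEE 754 platform `float_bits(1.0f)` is `0x3F800000`; the standard does not promise that representation, only that the bytes are reinterpreted.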

by Georgelemental

5/20/2026 at 9:18:03 AM

> Unaligned access? Packed structs.

Packed structs are dangerous. You can do unaligned accesses through a packed type, but once you take the address of your misaligned int field, then you are back in UB territory. Very annoying in C++ when you try to pass a misaligned field through what happens to be generic code that takes a const reference, as it will trigger a compiler warning. Unary operator+ is your friend.

by gpderetta

5/20/2026 at 9:58:33 AM

> but once you take the address of your misaligned int field

Gotta work with the structure directly by taking the address of the packed structure itself.

  struct uu64 {
      u64 value;
  } __attribute__((packed));

  struct uu64 unaligned;
  struct uu64 *address = &unaligned;

  address->value; // this works

  u64 *broken = &address->value; // this doesn't

Taking the address of the field inside the structure essentially casts away the alignment information that was explicitly added to stop the compiler from screwing things up. So it should not be done.

Mercifully, both gcc and clang emit address-of-packed-member warnings if it's done. So the packed structures are effectively turning silently broken nonsense code into sensible warnings. Major win.
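
A self-contained sketch of the same pattern, next to the portable `memcpy` alternative. It assumes GCC or Clang for the attributes; `may_alias` is my addition so that reading a byte buffer through the struct does not run into strict aliasing, and the function names are invented.

```c
#include <stdint.h>
#include <string.h>

/* packed: alignment 1. may_alias: may overlay memory of any other type. */
struct uu64 {
    uint64_t value;
} __attribute__((packed, may_alias));

/* Reads a 64-bit value from an arbitrarily aligned address through the
 * packed struct, so the compiler knows it cannot assume 8-byte alignment. */
static uint64_t load_u64_packed(const void *p)
{
    const struct uu64 *u = p;
    return u->value;
}

/* Portable alternative: memcpy has no alignment requirement, and compilers
 * typically turn a fixed-size copy like this into a single load. */
static uint64_t load_u64_memcpy(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```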

by matheusmoreira

5/20/2026 at 9:22:25 AM

> What stage is the "just make the compiler define the undefined" stage?

It can be left as implementation defined, which means that the compiler can't simply do arbitrary things, it needs to document what it would do.

Take, for example, signed-integer overflow: currently a compiler can simply refuse to emit the code in one spot while emitting it in another spot in the same compilation unit! Making it IB means that the compiler vendor will be forced to define what happens when a signed integer overflows, rather than just saying, as they do now, "you cannot do that, and if you do we can ignore it, correct it, replace it or simply travel back in time and corrupt your program".
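
A sketch of the kind of code this bites, with invented function names. The first check only works if the overflow has already happened, so an optimizer is entitled to treat it as always false and drop it; the second asks the same question without ever overflowing.

```c
#include <limits.h>
#include <stdbool.h>

/* Fragile: for x == INT_MAX the addition itself is UB, so the compiler
 * may assume x + 1 < x is never true and remove the branch. */
static bool next_overflows_fragile(int x)
{
    return x + 1 < x;
}

/* Well defined for every x: no arithmetic that can overflow. */
static bool next_overflows(int x)
{
    return x == INT_MAX;
}
```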

> Somehow I built an entire lisp interpreter in freestanding C that actually managed to pass UBSan just by following the above logic. I was actually surprised at first: I expected it to crash and burn, but it didn't. So if I can do it, then anyone can do it too.

Same here; I built a few non-trivial things that passed the first attempt at tooling (valgrind, UBsan with tests, fuzzing, etc) with no UB issues found.

by lelanthran

5/20/2026 at 10:14:46 AM

Completely agree. It can, and I think it's extremely annoying that it wasn't.

So we have the next best thing: builtins and flags. So long as those cover all the undefined behavior there is, we can live with it. Compiler gets to be "conformant" and we get to do useful things without the compiler folding the code into itself and inside out.

by matheusmoreira

5/20/2026 at 7:38:42 PM

One of these days you should write a blogpost about this.

by MrBuddyCasino

5/20/2026 at 5:50:14 PM

Lost in the submission attempt to WG14.

by pjmlp

5/20/2026 at 1:51:31 PM

> People just don't use this because they want to write "portable" "standard" C

Something that bothers me is the Venn diagram of people that think abstraction is slow and error prone and people that only write portable C.

How many C implementations do you actually need to compile against? I don't think I've seen more than 3 outside Unix software from the 90s. Using non portable extensions is in fact totally doable for your application and you should probably do it, and just duplicate/triplicate code where you have to. It's not that hard to write and not hard to read.

by duped

5/20/2026 at 5:54:04 PM

Back when I still wrote C at work, it meant Aix xlC, HP-UX aCC, Solaris Forte, Red-Hat Linux GCC, Windows MSVC and C++ Builder.

Nowadays most are indeed clang and GCC forks, or MSVC.

by pjmlp

5/21/2026 at 2:08:39 AM

That's what I mean, I've seen enough autoconf "checking for <feature that totally works in every compiler you care about>" noise to know it's mostly pointless in this day and age.

by duped

5/20/2026 at 9:08:59 AM

Author here.

> -Acceptance: "Just dont write UB."

The point of my article is that this is not possible. This cannot be our end state, as long as humans are the ones writing the code. No human can avoid writing UB in C/C++.

by thomashabets2

5/20/2026 at 10:23:36 AM

It's honestly not that difficult to be rigorous. The things you mentioned in the blog post are pretty obvious forms of degenerate practices once you get used to seeing them. The best way to make your argument would be to bring up pointer overflow being UB.

What's great about undefined behavior is that the C language doesn't require you to care. You can play fast and loose as much as you want. You can even use implicit types and yolo your app, writing C that more closely resembles JavaScript, just like how traditional K&R C devs did back in the day under an ILP32 model. Then you add the rigor later if you care about it.

For most stuff, like an experiment, we obviously don't care, but when I do, I can usually one shot a file without any UB (which I check by reading the assembly output after building it with UBSAN) except there's just one thing that I usually can't eliminate, which is the compiler generating code that checks for pointer overflow. Because that's just such a ridiculous concept on modern machines which have a 56 bit address space. Maybe it mattered when coding for platforms like i8086. I've seen almost no code that cares about this. I have to sometimes, in my C library. It's important that functions like memchr() for example don't say `for (char *p = data, *e = data + size; p<e; ...` and instead say `for (size_t i = 0; i < size; ++i) ...data[i]...`. But these are just the skills you get with mastery, which is what makes it fun.

Oh speaking of which, another fun thing everyone misses is the pitfalls of vectorization. You have to venture off into UB land in order to get better performance. But readahead can get you into trouble if you're trying to scan something like a string that's at the end of a memory page, where the subsequent page isn't mapped. My other favorite thing is designing code in such a way that the stack frame of any given function never exceeds 4096 bytes, and using alloca in a bounded way that pokes pages if it must be exceeded.

If you want to have a fun time experiencing why the tricky UB rules are the way they are, try writing your own malloc() function that uses shorts and having it be on the stack, so you can have dynamic memory in a signal handler.
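
To make the memchr() remark concrete, a minimal sketch of the index-based form. `scan_byte` is a made-up name and this is not taken from any real libc. The point, as I read it, is that the loop never forms an end pointer `data + n`, which matters because memchr() may be handed an `n` larger than the object as long as the byte is found first.

```c
#include <stddef.h>

/* memchr-style scan that only ever forms pointers to bytes it has
 * actually reached, instead of computing data + n up front. */
static void *scan_byte(const void *data, int c, size_t n)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < n; ++i) {
        if (p[i] == (unsigned char)c)
            return (void *)(p + i);
    }
    return NULL;
}
```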

by jart

5/20/2026 at 12:05:51 PM

> It's honestly not that difficult to be rigorous.

Ok, let's try it. I pointed GPT 5.5 at the smallest part of cosmopolitan as I could find in two seconds, net/finger. 299 lines.

describesyn.c:66: q + 13 constructs a pointer that can point well beyond the array plus one element.

C23 6.5.6p9:

> If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined

Now… you may be trolling, but I do feel like this disproves your assertion. Not you, not me, not Theo de Raadt, can avoid UB.
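
For readers wondering what the non-UB version of such a check looks like, a sketch with a hypothetical helper. I have not reproduced the cosmopolitan code; this only shows the general shape: compare the remaining length against what you need, instead of forming `q + 13` and comparing pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if at least `need` bytes remain in [cur, end).
 * Requires cur <= end, both pointing into (or one past) the same array,
 * which makes the subtraction well defined. */
static bool has_room(const char *cur, const char *end, size_t need)
{
    return (size_t)(end - cur) >= need;
}
```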

> the compiler generating code that checks for pointer overflow.

Do you need to check for that specifically? What pointer are you constructing that is not either pointing at a valid object correctly aligned (not UB), or exactly one past the element of an array?

Do you mean for the latter, in case you have an array that ends on the maximum expressible pointer address?

I'm a bit unclear on what you mean by "pointer overflow". From mentioning 56 bit address spaces I'm guessing you mean like the pointer wrapped, not what I pointed to in cosmopolitan, above?

Ok, to be clear that it's not just that one type, if you forgive that one:

net/http/base32.c:64: read sc[0] even if sl=0. I assume this is never called with sl=0, so could be fine.

net/http/ssh.c:355: pointer address underflow? Should that be `e - lp`?

net/http/ssh.c:209/229: double destroy of key. can this code path have non-null members, meaning double free? Looks like it, since line 207 does the parsing and checks that parse worked.

net/http/ssh.c:123: uses memset, which assumes that it sets member variable pointers to NULL (per my post, depending on that means depending on UB), and later these pointers are given to free(), so that's UB.

I won't look deeper into net/http, but presenting just the possibly incorrect remaining comments from jippity:

  - ssh.c:211 and parsecidr.c:44: length-taking APIs use unbounded strstr() / strchr(), so explicit n with non-NUL-terminated input can read beyond the buffer.

  - tokenbucket.c:77 and tokenbucket.c:92: x >> (32 - c) is UB for c == 0 and for out-of-range c.

  - isacceptablehost.c:68: long numeric host labels can overflow signed int b before the function eventually rejects/accepts the host.
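
On the shift item: assuming the expression is one half of a rotate (I have not checked what tokenbucket.c actually does with it), the usual way to keep every shift count in the range 0..31 is to mask it. `rotl32` is a made-up name.

```c
#include <stdint.h>

/* Rotate left by c bits. Both shift counts are masked to 0..31, so
 * c == 0 and c >= 32 never produce an out-of-range shift. */
static uint32_t rotl32(uint32_t x, unsigned c)
{
    c &= 31;
    return (x << c) | (x >> ((32 - c) & 31));
}
```

With `c == 0` both shifts are by zero and the result is just `x`, which is the case the unmasked `x >> (32 - c)` gets wrong.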

by thomashabets2

5/20/2026 at 12:37:14 PM

> For most stuff, like an experiment, we obviously don't care, but when I do, I can usually one shot a file without any UB (which I check by reading the assembly output after building it with UBSAN)

Does this depend on the project, or part of a project? I'm wondering how far that scales, I don't know how labor intensive it is -- maybe you can just look at the output and see that nothing funny is happening?

by andai

5/20/2026 at 11:41:28 AM

"Just don't write UB" sounds like still part of the bargaining stage at best

by frollogaston

5/20/2026 at 7:28:44 PM

Just work on embedded devices like I did lol. It's so nice to write software targeting a specific cpu.

by superxpro12

5/20/2026 at 9:21:15 AM

In C, acceptance is "I will write UB and it will eventually lead to something bad happening"

by im3w1l

5/20/2026 at 7:56:38 AM

> -Acceptance: "Just dont write UB."

Just switch to a saner language.

And before I get attacked for being a Rust shill, I meant Java :P

The bar is so low it's floating near the center of the Earth.

by Ygg2

5/20/2026 at 8:30:01 AM

> And before I get attacked for being a Rust shill, I meant Java :P

If all you want is C but less insane then the obvious answer here is Zig.

by dns_snek

5/20/2026 at 9:00:49 AM

Zig is cool, but it is not even close to being ready for prime-time. It will be pre-1.0 for a while, and major breaking changes are still happening.

by simonask

5/20/2026 at 9:22:31 AM

Sure, maybe don't bet your entire company on mountains of Zig code just yet, but aside from the breaking changes it's been perfectly usable and suitable for every project I've ever wanted to work on.

by dns_snek

5/20/2026 at 5:55:46 PM

Object Pascal, with 40 years of experience, no need to wait for 1.0.

by pjmlp

5/20/2026 at 10:07:50 AM

If someone is switching from C because it's too easy to trigger undefined behavior, picking one of the few other not memory safe languages is missing the point.

by AgentME

5/20/2026 at 9:08:35 AM

If all somebody wants is a programming language saner than C/C++ on these matters, there are plenty of options off the shelf to pick from.

If all somebody wants is a turnkey replacement for the C/C++ ecosystem, then there is nothing like that in the world that I’m aware of.

by psychoslave

5/20/2026 at 7:59:44 AM

> Just switch to a saner language.

And where's the fun in that?

by p2detar

5/20/2026 at 9:16:35 AM

That’s a matter of taste. Being reminded at every move that what you express depends on some technical detail is great when you love technical details and have the leisure time to pay attention to them. It is going to be hell, compared to sound defaults, for someone who wants to focus on delivering higher-order features and functionality that will most likely work just fine.

Undefined behaviour means "we couldn’t settle on a best default trade-off with fine-tuning as a given option, so we left everyone in the unknown".

by psychoslave

5/20/2026 at 8:12:43 AM

[flagged]

by xeyownt

5/20/2026 at 8:40:08 AM

Okay, so Java compiles to machine code now?

Because the last time I looked it appeared to need some godawful slow bytecode interpreter that took up thousands of kilobytes of RAM.

by ErroneousBosh

5/20/2026 at 8:52:00 AM

If you don't like JIT/JVM there's GraalVM Native Image.

https://www.graalvm.org/latest/reference-manual/native-image...

In the past you could use e.g. Excelsior JET.

by elch

5/20/2026 at 11:10:01 AM

Great, can you fit it into 768 bytes of flash and 64 bytes of RAM?

by ErroneousBosh

5/20/2026 at 11:38:02 AM

It isn't 1970 anymore. You can get 32-bit ARM MCUs with tens of kilobytes of flash and multiple kilobytes of RAM for less than 10 cents.

We've long since reached a point where chips are cheap enough to be disposable. They are included in paper transit tickets and price tags. There is basically no market left where your volume is small enough that custom application-specific ICs aren't an option, but your volume is large enough that the cost of a few additional kilobytes of memory isn't massively outweighed by the developer time saved.

Want several megabytes of RAM and flash to run Java? That's the price of a cup of coffee!

by crote

5/20/2026 at 12:13:15 PM

> It isn't 1970 anymore. You can get 32-bit ARM MCUs with tens of kilobytes of flash and multiple kilobytes of RAM for less than 10 cents.

Do they run at single-digit nA current draw?

by ErroneousBosh

5/20/2026 at 2:46:17 PM

You can always find a deep niche where no high-level technology is suitable.

I don't think you will program such a device in C, rather in assembly, right? When you have memory for something like 500 instructions, it is easier to go directly to assembler. Anyway, with such hardware as a target you don't need portability: this code is 100% hardware-dependent, and that is perfectly OK.

BTW, which uC do you have in mind when you talk about single-digit nA draw (in running state? in deep sleep?), because old 8-bit architectures are typically designed for older process nodes and are not as energy efficient as newer ones, and draw in sleep doesn't depend much on RAM or flash size or architecture, it is more design philosophy.

Anyway, a PIC16LF (20nA in deep sleep), an 8051 clone (50nA in deep sleep), an STM8 (~0.30 uA in halt) or the ATtinys (100nA in deep sleep), which fit the "768 bytes of flash and 64 bytes of RAM" description, are comparable with an EFM32 ARM32-M0+ (20nA in deep sleep). The same goes for uA/MHz, but the ARM32-M0+ will do much more work per MHz, so it will be more efficient in the end (it finishes the work faster and goes back to sleep).

by blacklion

5/20/2026 at 2:35:25 PM

> Because the last time I looked it appeared to need some godawful slow bytecode interpreter that took up thousands of kilobytes of RAM.

Did you last look at Java 1.2 in 1998? Since then there has been a compiler which produces very efficient profile-guided-optimized code and does tricks like devirtualization that are not possible for a static compiler supporting multiple compilation units (like C++).

Really, there was a time in history when HotSpot-compiled JVM bytecode was faster than everything that gcc could produce for comparable tasks. Yes, now this gap is reversed again, as both gcc and clang became much more clever, but the gap is still not very wide.

by blacklion

5/20/2026 at 5:56:48 PM

For 26 years already, it is a matter of choosing the right JDK.

by pjmlp

5/20/2026 at 8:56:16 AM

Java has been jitted for .. decades?

by pjc50

5/20/2026 at 10:18:06 AM

You know what JIT means, right? It means that it is not compiled from the start and indeed runs on a bytecode interpreter until the JIT compiler kicks in.

by Hendrikto

5/20/2026 at 10:38:17 AM

The java JIT has produced sufficiently fast code for all but the most demanding of HPC applications for going on 20 years. I realize keeping up with new developments can be difficult but the out of date java performance memes are entirely ridiculous by now.

Meanwhile half the world appears to run on cpython of all things.

by fc417fc802

5/20/2026 at 2:34:02 PM

My life for a browser that doesn't jitter and tear when scrolling or a terminal emulator that can actually process data near the speed my hardware can handle.

by agentultra

5/20/2026 at 3:08:18 PM

Yes, the JIT compiler compiles code. Yes, the results are good. That does not change the fact that the JVM still has and uses a bytecode interpreter, which the comment I replied to disputed.

by Hendrikto

5/20/2026 at 10:01:22 AM

> -Denial: "I know what signed overflow does on my machine."

Or you just don't skip the introductory pages that tell you what the language philosophy of C is, and why there is UB. Yes, UB can be a struggle, but the first four steps are entirely unnecessary. It means that you do not actually understand the core concepts of the very same language you are using, which is kinda stupid.

by 1718627440

5/20/2026 at 10:27:06 AM

I think the issue has been that the line between de-jure and de-facto behaviours has shifted over the years as compiler optimizations suddenly began relying on de-jure interpretations of UB to increase performance while ignoring de-facto usage of the language.

When that started happening people became alarmed (oMG UB iS TeH BAD!) and since some old UB machines still had industry support (of organisations that actually participated in ISO meetings instead of arguing online) there was never any movement on defining de-facto usage as de-jure and the alarmist position became the default.

Personally I think the industry would've benefited from a Boring C (as described by DJB) push by people that would've created a public parallel "de-jure" standard that would've had a chance to be adopted by compiler creators.

by whizzter

5/20/2026 at 10:32:44 AM

> I think the issue has been that the line between de-jure and de-facto behaviours has shifted over the years as compiler optimizations suddenly began relying on de-jure interpretations of UB to increase performance while ignoring de-facto usage of the language.

I guess I am too young, and also too much a purist, because I start from the impression of what the language is, not what the implementations happen to do.

> Personally I think the industry would've benefited from a Boring C (as described by DJB) push by people that would've created a public parallel "de-jure" standard that would've had a chance to be adopted by compiler creators.

-O0

by 1718627440

5/20/2026 at 7:24:58 AM

The examples aren't really undefined behavior. They are examples that could become UB based on input/circumstances. Which if you are going to be that generous, every function call is UB because it could exceed stack space. Which is basically true in any language (up to the equivalent def of UB in that language). I feel like C has enough actual rough edges that deserve attention that sensationalism like this muddies folks' attention (particularly novices) and can end up doing more harm than good.

by greysphere

5/20/2026 at 7:46:57 AM

Ada 83 has no UB on call stack overflow, from the reference manual :

http://archive.adaic.com/standards/83lrm/html/lrm-11-01.html

"STORAGE_ERROR This exception is raised in any of the following situations: (...) or during the execution of a subprogram call, if storage is not sufficient."

by guerby

5/20/2026 at 8:18:38 AM

So it's just as useful as when your stack area ends with a page that will segfault on access, or your CPU will raise an interrupt if stack pointer goes beyond a particular address?

It's not safe though because throwing an exception, panicking, etc, is still a denial of service. It's just more deterministic than silently overwriting the heap instead. If the program is critical then you need to be able to statically prove the full size of the stack, which you can do with C and C++ with the right tools and restrictions.

by veltas

5/20/2026 at 2:28:31 PM

You're mixing specification (a language reference manual) and implementation (a given compiler, target, options, ...).

The Ada language specification says the Ada programmer can expect any Ada compiler when used in fully compliant mode to properly raise STORAGE_ERROR when a stack overflow occurs.

Only the Ada compiler writer has to deal with this, not every single programmer on every single program and platform (the UB behaviour of some languages).

In the case of GCC/GNAT the compiler manual provides insight on how to be in compliant mode per target regarding stack overflow, and what the limitations are, if any. You have tools to monitor and analyze your Ada code in this respect too.

by guerby

5/20/2026 at 8:42:04 AM

Deterministic, well-defined behavior is inherently safer than undefined behavior. It allows you to diagnose the problem and fix it. UB emphatically does not, and I don't dare to think of how many millions of person-hours are wasted every year dealing with the results.

by simonask

5/20/2026 at 10:36:03 AM

A segfault is considered safe if you're talking about functional safety because it results in a return to a defined safe state (RTDSS).

If a segfault leads to some other state you do not deem "safe", such as a single program gating access to a valuable asset with a default fail state of "allow", you just have a fundamental design flaw in your system. The safety problem is you or your AI agent, not the segfault.

by bregma

5/20/2026 at 7:38:19 AM

That's not true at all.

First, you can define what happens when stack space is exceeded. Second not all programs need an arbitrary amount of stack space, some only need a constant amount that can be calculated ahead of time. (And some languages don't use a stack at all in their implementations.)

Your language could also offer tools to probe how much stack space you have left, and make guarantees based on that. Or they could let you install some handlers for what to do when you run out of stack space.

by eru

5/20/2026 at 7:39:37 AM

UB based on input can be an exploit vector.

by pjc50

5/20/2026 at 7:53:01 AM

Unvalidated input can always be an exploit vector.

by layer8

5/20/2026 at 7:58:18 AM

Except in C, validation of user input can in itself be an exploit vector.

by Ygg2

5/20/2026 at 8:24:47 AM

That’s true in other languages as well. Any programmatic task can end up being an exploit vector.

by layer8

5/20/2026 at 8:52:35 AM

No? That's the whole point of formal verification?

You can even kind of retrofit this to C. The classic example is "sel4". You just need a set of proofs that the code doesn't trigger UB. This ends up being much larger and more complicated than the C itself.

by pjc50

5/20/2026 at 11:34:27 AM

You can fail to verify something which you actually wanted to verify (i.e you made a proof of something else instead of the thing that mattered). See WPA2 KRACK as an example.

by rocketrascal

5/20/2026 at 3:11:05 PM

Yeah, but only in C* can those errors end up as more UB.

* terms and limits may apply.

by Ygg2

5/20/2026 at 8:04:23 AM

Turtles all the way down.

by greybeard69

5/20/2026 at 7:36:09 AM

The examples are unequivocally UB. Full stop.

How to think of this properly is that when you have UB, you are no longer under the auspices of a language standard. Things may work fine for a time, indefinitely even. But what happens instead is you unknowingly become subject to whimsies of your toolchain (swap/upgrade compilers), architecture, or runtime (libc version differences).

You end up building a foundation on quicksand. That's the danger of UB.

by stevenhuang

5/20/2026 at 7:46:12 AM

> The examples are unequivocally UB. Full stop.

Tbh, already the first example (unaligned pointer access) is bogus and the C standard should be fixed (in the end the list of UB in the C standard is entirely "made up" and should be adapted to modern hardware, a lot of UB was important 30 years ago to allow optimizations on ancient CPUs, but a lot of those hardware restrictions are long gone).

In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not. On most modern CPUs unaligned load/stores are no problem at all (not even a performance penalty unless you straddle a cache line). There's no point in restricting the entire C standard because of the behaviour of a few esoteric CPUs that are stuck in the past.

PS: we also need to stop with the "what if there is a CPU that..." discussions. The C standard should follow the current hardware, and not care about 40 year old CPUs or theoretical future CPU architectures. If esoteric CPUs need to be supported, compilers can do that with non-standard extensions.

by flohofwoe

5/20/2026 at 9:00:05 AM

Not having unaligned access in the language allows the compiler to assume that, for basic types where the alignment is at least the size, if two addresses are different then they don't alias and writes to one can't change the result of reads from the other. That's a very useful assumption to be able to make for optimization - much more useful than yolocasting pointers in a way that could get you unaligned ones.

by account42

5/20/2026 at 11:06:13 AM

> if two addresses are different ...

Eh, if the compiler knows that two addresses are different at compile time, it also knows how big the difference is.

by flohofwoe

5/20/2026 at 11:31:17 AM

Usually this is not the case.

by saagarjha

5/20/2026 at 2:00:31 PM

Indeed one of the fun LLVM bugs is that it can arrive at a situation in which it believes pointer A and pointer B are definitely not equal (weird given what's about to happen but OK that's potentially fine...) then we ask for their addresses† as integers X and Y, LLVM insists those integers aren't equal either because the pointers weren't (which as we're about to see is wrong) and then we subtract X - Y or Y - X and the answer either way is zero. Awkward. The integers were definitely equal.

† Although on a real modern CPU the pointer "is" just an address, notionally it has three components, the address, an address space (modern machines typically only have one) and a "provenance".

by tialaramex

5/20/2026 at 8:54:51 AM

Undefined means that the ISO C doesn't define the behavior. An implementation is free to do so.

by leni536

5/20/2026 at 9:04:05 AM

If they do, that is no longer an implementation of C. It is a dialect of C, and there are many (GNU C being the most popular), but there are real drawbacks to using dialects.

This is in contrast to the other category that exists, which is "implementation-defined".

by simonask

5/20/2026 at 11:08:50 AM

The thing is that the actual compiler behaviour matters more for real-world projects than what the C standard says. E.g. the C standard was always retroactive, it merely tried to rein in wildly different compiler behaviour at the time when the standard was new. It mostly succeeded, but still the most useful C and C++ compiler features are living in non-standard extensions.

by flohofwoe

5/20/2026 at 1:17:22 PM

Unaligned access being fine in one architecture, but not in others would create separate dialects, regardless of being blessed by ISO C.

Just don't do unaligned access, it's a dialect that doesn't exist currently, and should never exist.

by leni536

5/20/2026 at 9:55:15 AM

> If they do, that is no longer an implementation of C.

This is plain wrong. Undefined behaviour, means the C standard specifies no restriction on the behaviour of the program, which is what the implementation chooses to emit. An implementation can very well choose to emit any program it pleases, including programs that encrypt your harddisk, but also programs that stick to well defined rules.

by 1718627440

5/20/2026 at 10:22:50 AM

Sure, but the point is that code written against such a compiler is not C and is not portable. It is written in a dialect of C, and that comes with drawbacks.

Writing C (or any language) means adhering to the standard, because that's the definition of the language.

by simonask

5/20/2026 at 11:52:58 AM

You can't make any useful software in "Portable C" - or any portable language for that matter.

Side effects matter, and they are always non-portable/implementation defined/dependent on the hardware.

What printf() actually does is implementation defined - what does "printing" mean, does a console even exist? Maybe a user expects it to show graphical ascii/utf8 glyphs on an LCD display? Well, not every computer has that, so now what?

by rocketrascal

5/20/2026 at 1:09:24 PM

I agree, that most practical programs will rely on unportable behaviour, but

> What printf() actually does is implementation defined - what does "printing" mean, does a console even exist? Maybe a user expects it to show graphical ascii/utf8 glyphs on an LCD display? Well, not every computer has that, so now what?

You can very well write a program that doesn't make an assumption about any of those things. In fact you should, because the user is to be the arbiter of what environment your program gets invoked in and what it gets connected to. Writing a program that makes assumptions about the specific behaviour of stdout is going to be highly impractical and annoying and also violates the abstraction and interface that stdout is. This consideration isn't just valid for stdout, but also for any other interface your program naturally interfaces with.

> Well, not every computer has that, so now what?

If stdout is not available or can't process your data, printf() is going to return a negative value (and on POSIX systems set errno), and then you can deal with that.
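
A small sketch of dealing with that, using a hypothetical helper `emit_line`. It treats any negative return from fprintf() as failure and also checks fflush(), since with a buffered stream the write error may only surface when the buffer is flushed.

```c
#include <stdio.h>

/* Writes msg plus a newline to out. Returns 0 on success, -1 on failure. */
static int emit_line(FILE *out, const char *msg)
{
    if (fprintf(out, "%s\n", msg) < 0)
        return -1;
    if (fflush(out) == EOF)   /* buffered errors can show up only here */
        return -1;
    return 0;
}
```

The function makes no assumption about what `out` is connected to; it only reports whether the stream accepted the data.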

by 1718627440

5/20/2026 at 11:58:04 AM

Maybe it’s a generation thing. Languages like ML and Lisp have many implementations, while newer languages like Perl and Python are steered by a single organization. It’s way easier for the latter to have a single source of truth.

The C standard reminds me of Posix. You have a rough guideline if you ever wanted to port a program, but you actually have to learn the new compiler and its actual behavior before doing so.

by skydhash

5/20/2026 at 8:08:49 AM

I agree. I meant to elaborate more on how to think of UB.

For most C software on x86_64, UB is "fine" with very strong bunny ears. But it is preferable for one to, shall we say, write UB intentionally rather than accidentally and unknowingly. Having an awareness of all the minefields makes for more respect for the dangers of C code, it makes one question literally everything, and that would hopefully result in more correct code, more often.

On that note, on some RISC-V cores unaligned access can turn a single load into hundreds of instructions.

I think the problem is just that C is underspecified for what we expect a language to provide in the modern age. It is still a great language, but the edges are sharp.

by stevenhuang

5/20/2026 at 7:57:11 AM

There are still modern CPUs that don't support misaligned access. It would be insane for C to mandate that misaligned accesses are supported.

However I do agree that just saying "the behaviour is undefined" is an unhelpful cop-out. They could easily say something like "non-atomic misaligned accesses either succeed or trap" or something like that.

> In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not.

Not just the CPU - memory decides as well. MMIO devices often don't support misaligned accesses.

by IshKebab

5/20/2026 at 9:57:04 AM

> They could easily say something like "non-atomic misaligned accesses either succeed or trap" or something like that.

That means that the compiler must emit the read, even if the value is already known or never used, as it might trap. There is a reason for the UB!

by 1718627440

5/20/2026 at 11:03:11 AM

No it doesn't. Compilers are only required to emit the read for volatile types. If the type is non-volatile, misaligned, and can be optimised out then it would be perfectly fine to omit it (that would be the "succeed" option).

by IshKebab

5/20/2026 at 11:34:41 AM

If a trap is observable behaviour, then the compiler either needs to add code, that checks for the condition and then traps explicitly or it needs to actually perform the read. Currently it can be optimized out, because it is UB.

by 1718627440

5/20/2026 at 1:10:03 PM

I think you misunderstood my suggestion. It isn't that misaligned accesses must either all succeed or all fail. That's not possible in general because of MMIO devices.

The suggestion is that each individual access must either succeed or trap. Those are the only possible outcomes, but different accesses can result in different outcomes.

by IshKebab

5/20/2026 at 12:36:47 PM

You're merely attacking his particular suggestion and using this as an argument to defend UB, when those are completely independent concerns.

What people want is for a compiler that assumes that all pointers are aligned to use an aligned store or load instruction whenever the compiler wants to issue such an instruction. There is no need for UB here.

In other words, they want the compiler to stick with the decision it made and not randomly say "I can't do the thing I've been doing correctly for decades, because that's UB, my hands are tied, I must ruin the code, there's no other way."

by imtringued

5/20/2026 at 8:08:26 AM

On hardware that doesn't support it, misaligned loads could be compiled to multiple loads and shifts. Probably not great for performance, and it doesn't work if you need it to be atomic, but it isn't impossible.
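For what it's worth, the "multiple loads and shifts" approach can also be written by hand in portable C, which is roughly what a compiler would have to emit. A sketch, assuming a little-endian wire format; `load_u32_le` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from any address.
 * Only byte-sized accesses are performed, so alignment never matters. */
uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

As noted above, this is not atomic: another thread or a device could change the bytes between the four reads.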

by thayne

5/20/2026 at 8:33:09 AM

That still requires detecting when a misaligned load happens.

by gizmo686

5/20/2026 at 11:05:44 AM

That is only really possible if you know the pointer is misaligned at compile time (which does happen, e.g. for packed structs). The examples in the article are for runtime misalignment. It would be crazy to generate code so that every function checked if every access was aligned at runtime.

(Note the normal way to handle that if the hardware doesn't actually support it is for the access to trap and then the OS or firmware emulates it.)

by IshKebab

5/20/2026 at 9:02:38 AM

For x86 SSE there are aligned instructions that will trap on unaligned access.

by account42

5/20/2026 at 2:25:20 PM

The first example is dereferencing an integer pointer. That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

An honest discussion would be something more like 'dereferencing pointers can lead to UB on invalid pointers. Here are N examples of that. Maybe avoid using pointers. Maybe consider how other languages avoid pointers. Maybe these shouldn't be UB and instead some other class of error.' And then even more honest discussion would present the upsides of having pointers and the upsides of having these errors be UB.

Instead, the article (and your comment) take this valid operation and present it as invalid. Imagine you're a new programmer, you are just starting to wrap your head around pointers and you stumble across this article. You see the first example and it looks exactly like what you would expect a dereference to look like. But the article claims it's wrong, and now you're confused. So you dig into the article more closely and are exposed to all these terms like UB, alignment, type coercion etc and come away more confused and scared and disinclined to understand pointers. This is classic FUD. This is a technique to manipulate, not educate.

Pointers have pros and cons. UB has pros and cons. Let's try to educate people about them.

by greysphere

5/20/2026 at 4:19:04 PM

There is an important distinction here to the technical meaning of UB that is lost to many.

UB simply means the operation you are intending to perform has no defined semantics under the ISO C specification. That is all. Understand what this means but do not read further into it. It is easy to read further into this as you have and many do, and come to incorrect conclusions, and think this MUST result in incorrect behaviour, but this is not the claim. The claim is rather that once you write UB, you are no longer writing C the language with a defined spec, and that any manner of degrees of freedom (architecture, toolchain, etc) can now cause your code that was once behaving correctly to now behave incorrectly. That is the danger.

> That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

This is incorrect. The moment you express this in source code, it is already UB wrt the C abstract machine.

6.3.2.3. 755 If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

https://c0x.shape-of-code.com/6.3.2.3.html

The important distinction is to KNOW this is still UB; whether the operation yields the expected behaviour on your platform and architecture is completely a separate question.

The reason this is of utmost importance is that the C compiler operates on the C abstract machine.

If you violate language invariants, the compiler can--keyword can--emit WRONG code and it will be CORRECT to do so because C unfortunately allows it to. When this happens it's silent and deadly and it's a pain to debug. The point of all this seeming language lawyering is not FUD, it is genuine frustration with these footguns of the language that we are trying to share with others. Understanding UB correctly really is what separates those that know C and those that "know" C.

Things will work and then they won't. This can be fine for most cases but not fine for others. If you use C in 2026 you need to understand this.

> come away more confused and scared

This is the correct take. One ought to be more confused and scared after learning about UB; the language simply leaves things under-specified and it is up to the developer to understand they are engaging in UB.

Once UB is acknowledged, one ought to impress upon themselves that the software they build is dependent ever more on the whims of their particular compiler (clang/gcc), compiler flags (optimizations), architecture, and runtime environment.

by stevenhuang

5/20/2026 at 6:28:50 PM

Maybe I'm misunderstanding. Here is what I'm trying to say.

"Accessing an object which is not correctly aligned" - this is UB

"As an example of this, take this code: ..." - this (code) is not UB.

Is this incorrect somehow?

You could interpret the second sentence as 'under the assumption of an unaligned pointer, let's look at what this seemingly innocuous (and correct) code does.'

But that's not what they did. They presented that code as if it's incorrect (following the whole premise of the article 'Everything in c is UB'). That's what the whole article does, they take a topic with real concerns, then present 'normal' code, and then imply the code is the issue (and therefore the language), not the premise.

You know what would be better, show an example that clearly shows the complete path for the premise to the issue. Ie show some code that generates an unaligned pointer and then uses it. Why did the author not do that? Surprise, because it's actually pretty hard to write code that's 'guaranteed' unaligned behavior.

    int foo[10];
    int *bar = (int *)(((int)&foo) + 1);

Is this unaligned access? You don't know because you don't know the size of int. (Not to mention it looks ridiculous. By only showing 'reasonable' code as the example, the article suppresses the common 'uh just don't do that' criticism.)
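One way to make the "you don't know" part concrete is to ask the implementation at run time. A rough sketch, assuming the platform provides `uintptr_t` (it is optional in ISO C) and that the pointer-to-integer mapping is the usual flat one, which is implementation-defined. `is_aligned_for_int` is a made-up name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is `p` suitably aligned to be used as an int pointer?
 * Relies on the implementation-defined pointer-to-integer conversion. */
bool is_aligned_for_int(const void *p)
{
    return ((uintptr_t)p % _Alignof(int)) == 0;
}
```

On a typical target where `_Alignof(int)` is 4, `is_aligned_for_int((const char *)foo + 1)` is false for the `foo` array above. On a target where it is 1, the same call is true, which is exactly the ambiguity being described.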

And in fact the ambiguity of alignments and sizes is the whole point - they are given the privilege/footgun of being undefined in c so that compilers are easier to write. It's very debatable if this was/is a good idea, but that's where the debate should be, not illusorily ascribed to derefing pointers.

If I'm misunderstanding, please let me know. Specifically, if you're claiming (1) either the literal code in the first box of the article is UB, or (2) please write some literal code that is UB in the vein of the first claim of the article. I think that would help me bridge the gap that we seem to be having.

by greysphere

5/20/2026 at 7:59:08 PM

Edit: I think one part of the confusion is we were addressing different parts of the first example of the article. You were referencing the int foo(..) snippet (which I agree has no UB), but I was referencing the parse_packet() snippet (which has UB by construction), which was also part of the first example :).

You are beginning to understand. Yes, surprisingly, it is (1) that is being claimed.

The mere expression is alone UB. Yes, you read that right. In source code, it's already UB. Why? Because the ISO spec defined UB that way. But you see, what this means in practice ie whether "it works" is an entirely separate question and would be specific to toolchain, hardware, runtime, the alignment of the pointer in question, blah blah.

There is nuance here, and that's why this topic is debated to death, because it's hard to explain and it is genuinely complex.

When people say something is UB, they mean to say that the behaviour is undefined wrt ISO C.

The behaviour that actually matters IS defined wrt toolchain, hardware, runtime, alignment of pointer in question.

But that's exactly it--the latter is not what we mean when we say something is UB, when we say something is UB we are talking about the ISO C spec. The important follow up question then, when knowingly invoking UB, is to ensure your environment is "correct", because you have now crossed into realms entirely out of the auspices of the ISO C spec. Ergo, you are now in UB land; what you thought was the foundation of your codebase, the ISO C spec, has now turned into quicksand.

It is this implied undocumented dependence on factors external to the source code that is a huge source of bugs and surprisal.

So take this example from the article. Yes, it is UB by construction.

(edit: i copied the wrong fragment initially -- if you were talking about the int foo(const int* p) fragment, yes that block is not by construction UB)

    bool parse_packet(const uint8_t* bytes) {
            const int* magic_intp = (const int*)bytes;   // UB!
            int magic_raw = foo(magic_intp);  // Probably crashes on SPARC.
            int magic = ntohl(magic_raw); // this is fine, at least.
            […]
    }

Why?

> Because the compiler is not obligated to generate assembly instructions that work on unaligned pointers. Because it’s UB.

Does it actually work though? It might and it might not: there is simply no guarantee from the language. But that's all it says. It may very well work on your arch and platform and toolchain, indefinitely. But again circling back, for code written like this to be so brittle, that is why UB is to be avoided.
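For comparison, a commonly used way to read such a field without the pointer cast is to copy the bytes out with `memcpy`. This is only a sketch, not the article's code: `read_magic` is a hypothetical name, it assumes a 32-bit big-endian magic value at the start of the packet, and it uses POSIX `ntohl` from `<arpa/inet.h>`.

```c
#include <arpa/inet.h>  /* ntohl (POSIX, not ISO C) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit network-order value from `bytes`.
 * The caller must guarantee that at least 4 bytes are readable. */
uint32_t read_magic(const uint8_t *bytes)
{
    uint32_t raw;
    /* memcpy has no alignment requirement and no aliasing problem. */
    memcpy(&raw, bytes, sizeof raw);
    return ntohl(raw);
}
```

Mainstream compilers usually turn the `memcpy` into a single load on targets where unaligned loads are fine, so the defined version need not cost anything there.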

And to your point:

> that's where the debate should be, not illusorily ascribed to derefing pointers.

But that is where the debate is. People just do not understand what UB actually means. The article is correct: everything in C is UB. The takeaway is not that, therefore all C code is irredeemably broken (well, to some people it does mean that, anywho..). The takeaway is that most C code IS in fact more delicate than one may originally believe, because of the fact ISO C is under-specified, to allow for specialization dependent on toolchain/arch/hardware/what have you etc.

So it is incumbent on the developer when writing C to correctly acknowledge when they are invoking UB, and to do so intentionally with the awareness that things may just randomly break one day.

by stevenhuang

5/20/2026 at 9:05:11 PM

Thanks, yes I think we had some confusion on the foo() vs parse() and I was referring to foo().

But even for the parse() example, the issue is the aliasing rules (and not the alignment - though that could still be an issue depending on input!) Aliasing isn't even mentioned in the article. Instead the example presents this thing 'people do all the time' and identifies it 'UB' without even identifying the actual issue.

On its own I could forgive the former, making a precise example is tricky (particularly with alignment issues). But this is repeated: milliseconds() is characterized 'UB' because its inputs could be outside the representable range. Again the function is not UB, the inputs can potentially trigger UB.

Then the function pointer example obfuscates the assignment (fine) with the call (ub). Despite the red herring statement 'NULL compares unequal to any object or function' as the example assigns a function _pointer_ which can be NULL. The honest example of the statement is:

    void foo() = NULL;

which won't even compile because it violates the thing that was just said (among other reasons). The UB is the call below and has nothing to do with equality with NULL.

The repeated pattern of say one thing, show an example that's 'reasonable' and implies that it's related to what was just said, and from that invalid relation conclude that everything is UB in C feels dishonest (particularly when it's so easy to talk about UB in C honestly because there is so much to be legitimately concerned about!)

I appreciate you and your advocacy about UB in C, and I think I agree with most of your points about it and that they are worth discussing. That's why the article itself is frustrating to me, we don't need to be tricksy when talking about UB, it's already tricky enough!

by greysphere

5/20/2026 at 10:06:57 PM

Yes, I sympathize with you that it's tricky enough. I didn't find the first example with foo() as the author trying deliberately to be tricksy (ie saying the snippet itself is UB by construction), but I certainly see how it can be read that way. It again lends to how hard all of this is to explain!

One thing though, for parse() not only is it violating strict aliasing, it is also indeed violating alignment requirements.

To quote the standard again:

> 6.3.2.3. 755 If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

by stevenhuang

5/20/2026 at 8:54:29 AM

Yes, this article is pretty much the definition of FUD.

by account42

5/20/2026 at 6:46:28 AM

The problem of UB is not really that it may crash in some architecture. The real problem is that the compiler expects UB code to NOT happen, so if you write UB code anyway the compiler (and especially the optimizer) is allowed to translate that to anything that's convenient for its happy path. And sometimes that "anything" can be really unexpected (like removing big chunks of code).

by bestouff

5/20/2026 at 7:27:15 AM

One example along this path is that every function must either terminate or have a side effect. I don't think this one has bitten me yet, but I could completely see how you accidentally write some kind of infinite loop or recursion and the function gets deleted. Also, bonus points for tail recursion, so this bug might only show up with a higher optimization level if nothing hit the infinite loop during debugging.

by inkysigma

5/20/2026 at 5:04:32 PM

There is that famous example where when you write an infinite loop last thing in your main, a function that you never called runs instead.

by marcosdumay

5/20/2026 at 9:09:59 AM

Infinite loop without side effects == program stuck and not responding on user input and not outputting anything. That's not something a useful program will ever want to do.

by account42

5/20/2026 at 9:30:34 AM

Not true, C++ made it so trivial infinite loops are not UB because it turns out they do have legitimate uses.

https://lists.isocpp.org/std-proposals/2020/05/1322.php

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p28...

by Certhas

5/20/2026 at 9:37:37 AM

Yes, the C++ committee has been making some stupid decisions lately. This is not the only one.

Low level platform-specific code that needs to hot spin until an interrupt happens can use assembly for that part which it will need to do for the interrupt handler anyway.

by account42

5/20/2026 at 7:25:38 PM

You don't even need to use assembly for this, the wait for interrupt typically involves side effects.

by TuxSH

5/20/2026 at 9:28:48 AM

The problem is when you accidentally write an infinite loop. In a different language, you run the code, see that it gets stuck and fix it. In C, the compiler may delete the function, making it hard to realize what is happening.

by xigoi

5/20/2026 at 9:38:56 AM

This is not a problem that C or C++ programmers actually encounter, ever.

by account42

5/20/2026 at 2:51:05 PM

I actually encountered it a couple weeks ago.

Can you spot the infinite loop in this function?

  char* strcpy(char* restrict d, const char* restrict s) {
    stpcpy(d, s);
    return d;
  }

I'll help. A call to `stpcpy` that ignores the return value can be swapped with a call to the (more likely to be optimized) `strcpy`. Since that's infinite recursion, and there is no forward progress, it's undefined behavior and anything goes.

This isn't just theory, it actually broke things in practice for me.

by ncruces

5/21/2026 at 1:35:23 PM

Naming an externally linked function with the prefix "str" is by itself UB since that prefix is reserved for <string.h>.

by rssoconnor

5/21/2026 at 2:04:54 PM

Right. I am doing a standard library replacement, and forgot I needed to compile freestanding. Oops. My bad.

So `str` and `mem` are reserved. But then so are `to` and `is` (by <ctype.h>). Just forget about having a function named `is_valid_user`.

And so are `mtx_`, `cnd_`, `thrd_`, `atomic_`, `memory_`...

Which is why... everything in C is UB.

by ncruces

5/20/2026 at 10:04:31 AM

Note, that this is not true for C.

by 1718627440

5/20/2026 at 9:33:18 AM

https://9p.io/sources/plan9/sys/src/libc/9sys/abort.c

by zarzavat

5/20/2026 at 9:40:05 AM

This is already UB without an infinite loop.

by account42

5/20/2026 at 10:04:59 AM

That's only true in C++ though, not in C.

by 1718627440

5/20/2026 at 10:21:36 AM

C does allow unconditional infinite loops (e.g. "while (1) { }" isn't UB) but still is UB if the controlling expression isn't constant (e.g. "while (two < 10) { }" is UB if two is a variable less than 10)
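To illustrate the rule (C11 6.8.5p6, as I read it): a loop with a non-constant controlling expression may be assumed to terminate only if it does no I/O, touches no volatile objects and performs no atomic or synchronization operations. So a spin loop on a volatile flag is outside that assumption. A sketch, with made-up names `ready` and `spin_until_ready`:

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical flag, e.g. set from a signal handler. */
volatile sig_atomic_t ready = 0;

/* Each iteration reads a volatile object, which counts as a side effect,
 * so the compiler may not assume this loop terminates and delete it. */
void spin_until_ready(void)
{
    while (!ready) {
        /* empty body: the volatile read in the condition is the point */
    }
}
```

Dropping the `volatile` would put the loop back into the "may be assumed to terminate" category.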

by dzaima

5/20/2026 at 7:39:32 AM

Yes, a crash is about the most benign UB: at least it's highly visible.

In worse scenarios, your programme will silently continue with garbage, or format your hard disk or give attackers the key to the kingdom.

by eru

5/20/2026 at 10:11:37 AM

Yes, that is a problem, but this is also the most useful feature of and reason for UB. People who suggest just defining it or making it unspecified miss that the compiler being able to remove whole parts of a program is the point. When I write code that is UB for certain inputs, it is because I do not intend the program to have any behaviour for these inputs. I do want the compiler to optimize those away or do anything that follows from the behaviour of the other defined cases. It is deeply satisfying to add some conditions triggering log strings and see that they do not occur in the binary, because they can only be reached via UB.

by 1718627440

5/20/2026 at 8:31:16 AM

The point in the article that 'It's not about optimisations' really got my attention. I've previously done some work where we wrote an analysis pass under the assumption that it executed last in the transformation pipeline and this was needed for correctness. The assumption was that since no further optimisations happened it was safe. Now I'm not so sure...

by rando1234

5/20/2026 at 9:04:13 AM

That's a feature, not a problem.

by account42

5/20/2026 at 7:00:20 AM

Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).

Another commenter suggested using LLMs, but I disagree. Having clangd emit warning squiggles for unchecked operations (like signed addition) would be a good start.

by anilakar

5/20/2026 at 7:30:00 AM

> Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).

Dead code elimination is essential for performance, especially when using templates (this is basically what enables the fabled "zero cost abstraction" because complex template code may generate a lot of 'inactive' code which needs to be removed by the optimizer).

The actual issue is that the compiler is free to eliminate code paths after UB, but that's also not trivial to fix (and some optimizations are actually enabled by manually injecting UB, like `__builtin_unreachable()`, which can make a measurable difference in the right places).
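A small sketch of what "manually injecting UB" looks like in practice, assuming GCC or Clang (`__builtin_unreachable()` is a compiler extension, not ISO C). The `enum op` and `exec` names are illustrative only:

```c
#include <assert.h>

enum op { OP_ADD, OP_MUL };

/* Hypothetical dispatcher. The programmer promises that `op` is always one
 * of the two enumerators; any other value is undefined behaviour. */
int exec(enum op op, int a, int b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_MUL: return a * b;
    }
    /* Tells the optimizer this point is never reached, so it does not need
     * to emit a fallback path or a range check for `op`. */
    __builtin_unreachable();
}
```

The trade-off is exactly the one discussed here: the hint removes code, and if the promise is ever broken the result is UB rather than a diagnosable error.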

by flohofwoe

5/20/2026 at 10:13:44 AM

> The actual issue is that the compiler is free to eliminate code paths after UB

Note that the compiler can also eliminate code paths before UB, as UB is a property of the whole program, not just of a single statement.

by 1718627440

5/20/2026 at 12:01:57 PM

> free to eliminate code paths after UB

before.

by peterfirefly

5/20/2026 at 7:19:50 AM

Dead code elimination is run multiple times, including after other optimizations. So code that is not initially dead may become dead after propagating other information. Converting dead code into an error condition would make most generic code that is specialized for a particular context illegal.

by amoss

5/20/2026 at 9:27:08 AM

Consider:

   enum op_t{ add, mul };
   int exec(op_t op, int a, int b) {
       if(op == add) { return a+b; }
       if(op == mul) { return a*b; }
   }

   c = exec(add, a,b);

Should the compiler be prevented from inlining exec, constant-propagating op and removing the mul branch? What about if a and b are constants and the addition itself is optimized away?

by gpderetta

5/20/2026 at 7:10:15 AM

This is trickier than it initially seems. Using preprocessor directives to include or exclude swaths of code is a very common thing, and implementing a compiler error as you described would break the building of countless C codebases.

by 4gotunameagain

5/20/2026 at 9:02:53 AM

I have never in my 20 years of writing C heard so much about undefined behavior as I have in the past 6 months on Hacker News. It has never entered the conversation. You write the code. If it doesn't work, you debug it and apply a fix or a workaround. Why does the idea of undefined behavior in C get to the front page so consistently?

by parasti

5/20/2026 at 10:27:39 AM

Hacker News is still skewed towards people interested in programming languages (as opposed to actually programming). Probably some sort of Y-combinator Lisp heritage. There's also a persistent minority of CS grads who think that developing / using new programming languages is the most fascinating thing in the world, and some of them hold on to that thought.

It's reasonable that such people would also be interested in design aspects of languages, and UB in C is in that field. Though I would argue that a lot of it was originally accommodating old CPU architectures without compromising performance too badly, and about as much a "design choice" as wheels being round...

by summa_tech

5/20/2026 at 3:36:10 PM

There was also a period around the mid-2010s where I had the strong impression that lots of younger ambitious devs were fanatically promoting rust against C's undefined behavior mostly because it gave them a way to differentiate themselves from older seniors within organizations. (And I say this not as an old C diehard, but as someone who watched more than one colleague position himself as the 'rust guy'.)

by defgeneric

5/20/2026 at 9:52:37 AM

Excuse me, what? I was writing both C and C++ 20 years ago, and UB was a huge part of the conversation (and the curriculum) back then as well.

There were a few high-profile "scandals" around GCC 3.2 (IIRC) because the compiler finally started much more aggressively using UB in optimizations, which was a reason that lots of people stayed on GCC 2.95 for a very long time. GCC 3.2 came out in 2002.

by simonask

5/20/2026 at 10:23:21 AM

Started in 2005. Never ever did anyone complain about UB in my years of writing C code and patching other people's C code. I knew it exists - as a spec quirk. (Admittedly, never wrote a compiler and never used anything except gcc and clang.)

by parasti

5/21/2026 at 1:04:50 AM

“More aggressively using UB” isn’t the right way to think about it.

In the C ecosystem, the compiler gets to define what UB means. They broke compatibility with their previous UB semantics, then blamed the language spec.

by hedora

5/21/2026 at 8:28:59 AM

> In the C ecosystem, the compiler gets to define what UB means.

It really doesn't though. The current revision of the ISO/IEC 9899 standards document gets to define it, nobody else.

by simonask

5/20/2026 at 9:43:08 AM

Computers used to be cool; now they're dangerous.

Every company keeps harping on about safety and being exposed (being in the news): so the narrative against 'unsafe' is up the wazoo.

The new world is basically a bunch of city dwellers who haven't seen raw nature and you show them a lawn mower, they freak out. Blades that spin?!?!?! Madness!!

by keyle

5/20/2026 at 9:58:20 AM

If everything is going to be dependent on computers, it's probably important that they work and remain under their owner's control rather than whichever NK or Chinese hacker group gets to them first.

Can't talk about C without CVE.

by pjc50

5/20/2026 at 10:03:02 AM

Yeah, npm, all the yaml state machines, & now MCP Gemini --yolo entered the chat.

If you think C is the problem, you'll come to the eventual conclusion that humans are the problems, and greed. Don't hate the player, hate the game etc.

C was invented so you don't have to write assembly. It wasn't invented to expose devices to billions of other devices.

by keyle

5/20/2026 at 9:08:40 AM

Because the production environment might be a completely different architecture, these details matter a lot. Works on my machine is not useful if your actual target is a small embedded system on top of a cell tower in the middle of nowhere. Granted, most people don't work on stuff like that, I imagine the vast majority of devs here are web developers, but even still it's an interesting discussion even if you haven't run into it yourself. Maybe even more so in that case.

by Etheryte

5/20/2026 at 9:56:02 AM

Um, as an embedded developer, you don't develop the code to run on your machine, you develop it to run on the same target as you expect to deploy to, sitting on your desk next to you.

I have lots of my code running day-in, day-out on literally hundreds of millions of machines. The approach to "getting it working" is exactly OP's.

I'll admit to being pretty defensive and anal in checking values and return-codes (more so than most, I suspect), and I'm a firm believer in KISS principles in software engineering ("solving hard problems with complicated code is easy, solving them with simple, understandable algorithms is the hard bit") but generally there's no real difference in approach to the code I write to work on my workstation, and the code I write to work in the field.

by spacedcowboy

5/20/2026 at 10:18:37 AM

Embedded developers often suffer under archaic toolchains. There's plenty of reasons for that, but one of them is UB: a newer version of the compiler can completely change an embedded program's behaviour.

by dmpk2k

5/20/2026 at 11:12:50 AM

Where I was it was quite the opposite. The bloody compiler guys kept on updating the compiler, and we were required to use the OS-delivered one. Since we were often using pre-release OS's, the toolchain could change every week.

It did make you write robust and defensive code, though...

by spacedcowboy

5/20/2026 at 10:50:45 AM

    There are more things in heaven and earth, Horatio,
    Than are dreamt of in your philosophy

You've probably been churning out possibly malformed code for years. Now you're becoming aware of your shortcomings. This is usually considered the transition from intermediate- to senior-level programmer.

by bregma

5/21/2026 at 1:00:51 AM

Exactly, you write for your target, not some imaginary spec. The spec is only as useful as to predict what your target roughly does, it's not normative.

Compilers might have bugs where the spec is supposed to work but it doesn't, and many extensions without standard equivalents, or implementation-specific behaviour where undefined things in the standard do get assigned a meaningful outcome.

by Pannoniae

5/20/2026 at 2:43:31 PM

I have the opposite experience, so many subtle bugs that bite you only on specific scenarios, so much that I can't count.

by SomeoneOnTheWeb

5/20/2026 at 7:07:11 PM

So, you never iterated past an array, you never used after a free(), you never tried doing i = ++i + ++i; ?

by AndriyKunitsyn

5/20/2026 at 9:49:54 AM

I wonder if it’s just the colorful metaphors and an opportunity to bring out examples of surprising behavior. Plus it’s a topic that can always stir up debates.

by sethev

5/20/2026 at 10:47:25 AM

If only it was that easy: https://silentsblog.com/2025/04/23/gta-san-andreas-win11-24h...

The real answer is that proponents of languages like C seem to completely disregard the dangers/difficulty of hitting/difficulty of fixing UB. Proponents of languages like Rust overstate it instead. Pointless wars/drama is fun to read and gets clicks.

by dminik

5/20/2026 at 10:14:40 AM

If there's no UBs then what will we programmers do, there won't be enough to debug and fix?

by aldanor

5/21/2026 at 1:02:45 AM

There was a similar rush of articles like this a few years ago.

tl;dr: C defines language semantics, and leaves some behavior undefined. Each system that C is ported to has the ability to define the behavior however it wants.

This blows the mind of PL folks every decade or so.

It’s cool that we have portable methods and formal language semantics for stuff like memory fences and atomics now, but that sort of thing worked fine in C back in 1970 (or else unix would not have worked). You just needed to read the target machine’s manual when porting stuff.

The modern version is arguably better, but also arguably worse. Does anyone else remember when the JVM got this stuff wrong, making safe multithreaded code impossible, and then later had to break compatibility with the language spec?

You could claim that we can’t trust hardware folks to get instruction semantics right (this is demonstrably true), but duplicating and slightly modifying the specs in your language spec doesn’t actually fix the underlying hardware bugs.

Yeah, getting old… I’ll go find a cloud to yell at.

by hedora

5/20/2026 at 10:55:49 AM

Because most of the people who post/write these articles do not actually know the C language specification nor understand its design.

Understanding three important concepts properly in C allows one to easily identify what can/cannot result in UB viz. 1) Expressions 2) Statements 3) Sequence Points and "Single Update Rule". It is not that hard at all.

I wrote about it here with links to further reading provided - https://news.ycombinator.com/item?id=48144734

by rramadass

5/20/2026 at 9:33:09 AM

I would guess that the continued success of Rust has shown that we don’t have to live with the user-hostility of C in order to write system programs. Therefore, people are understandably growing less and less patient with C and its unending bullshit.

Although I haven’t noticed a spike the last 6 months, just a slowly increasing realization that C isn’t fit for humans and should go the way of asbestos: Don’t use it for anything new, and remove it where it already exists, unless doing so would be too expensive or disruptive.

by jakobnissen

5/20/2026 at 10:19:46 AM

I don't think C is hostile. C has UB for good reason. The problem is UB has been hijacked by the compiler writers for performance gains.

Personally I like C because you should have a good idea of what it's going to do. Other languages feel like a black box, and I start having to fight them far too often. But I say that as a hacker of low level stuff, not as someone who's paid and working on higher level stuff, so that is probably a niche view.

by benj111

5/20/2026 at 10:10:20 AM

1. It's been talked about for much longer than that.

2. You don't really appreciate the issue. Signed integer overflow is undefined. If you check for that overflow after the fact the compiler can, and demonstrably has pretended that the overflow can't happen and optimised away your overflow check.

You may not even come across that failure mode to know to 'fix' it. And good luck finding the issue unless you know about UB and what the compiler can and will do in such situations.

by benj111

5/20/2026 at 9:21:09 AM

There are a lot of Rust/whatever hipsters here that have defined their whole identity around hating C and C++.

by account42

5/20/2026 at 10:39:34 AM

Like the author of the article, I've been writing C/C++ for 30 years. Mostly close-to-the-metal code around computer graphics. Actually: wrote.

After switching to Rust five years ago I agree with all the Rust hipsters as far as disliking those languages goes.

I just don't talk about it a lot. If every Rust person I know that was a C/C++ developer before was as outspoken about what they think of the latter, you'd see that these people are a majority.

We're just old hands who like to use stuff that works. And most of us don't get attached to code or languages.

It's also difficult to admit to yourself that you were never in command of a language as far as UB/other footguns go, as much as you thought. Or ever, for your entire career. For me that self-realization about C/C++ (enabled by Rust) was a turning point.

Lately you can read about the dichotomy re. AI use.

I.e. developers who define themselves through what they build/ideas are embracing LLMs for what they can do.

I.e.: I am what I build.

Whereas developers for whom software engineering is a craft that defines them hate them openly.

I.e.: I am how I build.

Now this seems to suggest to me that maybe Rust developers who openly hate C/C++ squarely belong to the latter group whereas the silent ones belong to the former. It's builders vs programmers. Just different world views.

Also, you can dislike something and still not speak about it. Because you decided to not care.

by virtualritz

5/20/2026 at 6:03:50 PM

As C++ hipster since 1992, the problem is really C and any language that includes its semantics as subsets.

Just like TypeScript can't get rid of JavaScript WATs.

by pjmlp

5/20/2026 at 9:44:55 AM

Ironically, by stereotyping ”Rust hipsters” you are painting yourself out as a stereotype as well. Knee-jerk comments like yours add nothing to the discussion. Rust exists for a reason, it solves real problems, but it’s not suitable for everything. These are indisputable facts and by discarding every mention of Rust as coming from ”hipsters” with no understanding, you are doing the exact same thing that you would accuse them of. ”Use Rust for everything” and ”Rust is useless for everything” are equally vapid and meaningless statements designed for nothing but trolling and showing ignorance.

by hnarn

5/20/2026 at 10:04:12 AM

[flagged]

by account42

5/20/2026 at 10:37:18 AM

After the rise of Rust, it has gained more visibility? But some people were interested in C in this way long ago too, I used to hang out in some godforsaken irc channel where people competed in out-pedanticing each other over the C standard.

I trust your historical C usage was more productive than that..

by kzrdude

5/20/2026 at 7:23:17 AM

As much as I agree with the intro, these examples aren't good and the overall article is just a veil for pushing LLM coding.

by debugnik

5/20/2026 at 9:44:38 AM

Agreed. One after another these are standard things you avoid when writing portable code (or don't need, like accessing the object at address 0). They come across like from someone who wants to write whatever they want and have it work the same on everything. To make it into a language that allows this would remove its advantage of being able to write to the platform when you want to.

by gblargg

5/20/2026 at 7:25:17 AM

Not good how? Are they TRUE? If so that's super bad.

by boxed

5/20/2026 at 8:44:01 AM

Some of the examples are somewhat formally true in theory and bullshit in practice; some are quite hallucinatory.

  - Creating a potentially troublesome misaligned int pointer is a precisely localized and completely explicit user mistake, not something that just happens because it's C.
  - Passing signed char to character classification functions that expect an unsigned char (disguised as an int) is a very specific dumb user error. The C standard could specify that all negative inputs, including EOF and invalid signed char values, are classified as not belonging to the character class, but I doubt the current undefined behaviour in isxdigit() etc. implementations ever went beyond accepting invalid inputs.
  - Casting floating point values to integer values in general requires taking care of whether the FP values are small enough to be represented and what to do with NaN and Inf values: not the language's responsibility. C offers a toolbox of tests, not ready-made application specific error handling.
  - Expecting C to handle "address zero" in physical memory in ways that conflict with NULL in source code denotes a complete lack of understanding of what a program is. Where stuff in an executable is loaded in memory, in the rare cases when it matters, can surely be affected with platform specific extensions, possibly at the level of linker commands with nothing appearing in the C source code.
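
To make the character classification bullet concrete, here is a minimal sketch of the usual defensive idiom: cast to unsigned char before calling anything from <ctype.h>. The helper name count_hex_digits is made up for illustration and does not come from the article.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the article): counts the hex digits in a
 * NUL-terminated string. The cast to unsigned char keeps the argument in
 * the range the <ctype.h> functions are defined for, even when plain char
 * is signed and the byte value is 0x80 or above. */
size_t count_hex_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isxdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, a byte like 0xFF in a signed char becomes a negative int after promotion, which is outside what isxdigit() accepts unless it happens to equal EOF.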

by HelloNurse

5/20/2026 at 9:18:39 AM

Author here.

So I see your counter points are all "so just don't do that, then".

And the point of my post is that this particular "just don't do that, then" has never been achieved by humans.

If there's no example of a program without these bugs in a language, then I do think it's fair to blame the language. A knife with 16 blades and no handle.

> Expecting C to handle "address zero" in physical memory in ways that conflict with NULL in source code denotes a complete lack of understanding of what a program is.

Like the post says, it's rare that programmers actually want a pointer to memory address zero. But in my experience most programmers who even encounter that have this "complete lack of understanding", as you put it.

by thomashabets2

5/20/2026 at 9:45:22 AM

"Just don't do that" is the correct approach to errors, even when they are easy to overlook and the programming language provides many opportunities for mistakes.

For example, you seem to underestimate how wrong placing negative values in a signed char is: ordinary character encodings do not use negative codes, so either those negative values are not characters and they have no business being treated as such, or something strange and experimental is going on.

by HelloNurse

5/20/2026 at 1:55:23 PM

> "Just don't do that" is the correct approach to errors

We have 54 years of empirical data that literally nobody can follow this approach and reach UB-freeness. To stick to the plan is more like the in-debt gambler who just needs to work their system for a little longer, and they'll become rich.

By this logic we don't need any traffic rules other than "just don't crash or hit anyone". And we can aspire to an absolute dictatorship, all we need to do is "just" choose the benevolent one.

Of course we should always try to not make mistakes. But given more than half a century of empirical data that nobody has been able to avoid UB, ever, it takes quite some hubris to say "but it might work for us".

> you seem to underestimate how wrong placing negative values in a signed char is

Shrug. You don't make that mistake. There are thousands of mistakes like it, especially in C or C++.

Of course "don't do that". That is not the same as "So just don't do that!". The former is good advice. The latter is one of a million rules, and to expect even experts (see OpenBSD) to never make a mistake is unrealistic to say the least.

You may even have spotted the UB in https://pooladkhay.com/posts/first-kernel-patch/. But you would not spot all of them. Nobody in history has.

by thomashabets2

5/20/2026 at 3:20:14 PM

While, for the purpose of avoiding gratuitous mistakes, C is a serious disadvantage compared to less low-level languages, your discussion of UB pitfalls in C is aimed at a strawman.

First of all, traffic rules are good, and similar to good C programming rules: check number value ranges when there is a chance of casting or overflow, check Inf and NaN floating point values, declare alignment strategically (e.g. in all memory allocations) to avoid misaligned pointers and variables, and so on. Such rules have alternatives and exceptions and must not be part of the language.
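
As one illustration of such a rule, a range check before a floating point to integer conversion could look roughly like this. It is a sketch, not a library API: double_to_int_checked is a hypothetical name, and the bounds assume that every int value is exactly representable as a double (true for the common 32-bit int).

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: converts d to int only when the truncated value is
 * representable. Returns false for NaN, infinities and out-of-range input,
 * for which the plain cast would be undefined behaviour.
 * Assumption: int is narrow enough (e.g. 32 bits) that INT_MIN - 1 and
 * INT_MAX + 1 are exact as doubles. */
bool double_to_int_checked(double d, int *out)
{
    if (isnan(d))
        return false;
    /* Truncation toward zero is valid for INT_MIN - 1 < d < INT_MAX + 1. */
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return false;
    *out = (int)d;
    return true;
}
```

What to do when the function returns false is still application specific, which is the point made above.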

Second, nobody needs perfection and "UB-freeness": it is reasonable to assume that many cases of UB won't be a problem, either because a library will be used correctly and they won't happen, or because the C implementation is neither weird nor hostile and they will be as benign as defined or implementation defined behaviour, or simply because we avoid doing something known to be inexact or hard to write correctly.

Practical programming requires knowing the relevant rules for what one is doing and learning new ones by making, diagnosing and overcoming mistakes; not omniscience, and definitely not the unfounded feeling of omniscience and unlimited resources that LLMs can give.

EDIT: I insist on the signed char example because it would be terribly wrong (processing who-knows-what as if it were a sequence of characters) even without undefined behaviour, even in different languages.

by HelloNurse

5/21/2026 at 7:17:02 AM

> Second, nobody needs perfection and "UB-freeness"

Sure. You only care about the ones that manifest security issues, stability issues, or other corruption. But of course those change over time as compilers change.

So while far from every instance of UB will manifest in a problem, every single one has the potential to, by a low percentage. They're all tiny liabilities that add up.

But which ones will? Reminds me of https://www.lesswrong.com/posts/ooypcn7qFzsMcy53R/infinite-c...

> because the C implementation is neither weird nor hostile

Some people definitely were screaming at GCC for being hostile when it removed the NULL check in the kernel:

    int foo = bar->baz;
    if (!bar) {
      return -EINVAL;
    }

> the unfounded feeling of omniscience and unlimited resources that LLMs can give.

I definitely don't have that. I'm not saying LLMs find all bugs (now or in the future), nor that they are an unlimited resource.

I'm just saying that for finding UB and subtle bugs, they find orders of magnitude more, especially in C and C++.

I am not saying they find a strict superset of bugs, compared to a human. But take me running this against cosmopolitan libc: https://news.ycombinator.com/item?id=48206377. It took me basically zero human time to spin it off, it took a couple of minutes (5.5 in xhigh effort) to run, and found 5-10 cases of UB, one of which I think is a user visible parsing error of SSH keys. Another is a set of double-free, which is definitely a thing that gets exploited over and over.

Would I have found these, in an unknown-to-me codebase no less, given manual source code reading all day? Of course not. Would I have found it with the likes of UBSAN? jart claims to have used it (https://news.ycombinator.com/item?id=48205545), and apparently didn't.

LLMs are just one of the tools to use. A tool that does better than any tool or human has done in the last half century.

> I insist on the signed char example because it would be terribly wrong

The char situation is terrible in C. It's perfectly safe to hold bytes in a char, signed char, and unsigned char, and convert between them. But then integer promotion rules combine with the historical choice of having isdigit take an int to break things.

If isdigit took a char, of any signedness, then there wouldn't be a problem. But that EOF ruins it.

> processing who-knows-what as if it were a sequence of characters

A "char" hasn't been "a character" in any meaningful sense in a long long time. Or rather, "a character" is not a code point or grapheme cluster. For byte processing, since they cast perfectly fine, it's fine. Or do you have some interesting example?

by thomashabets2

5/20/2026 at 10:54:01 AM

Just don't fall bro. It's that easy. No railings required.

by dminik

5/20/2026 at 7:59:47 AM

They are true but I agree it's not a great article. C has an unending list of UB and given the title I was expecting a more comprehensive survey, but they actually just picked a few that are both fairly well known and not very interesting.

by IshKebab

5/20/2026 at 9:20:03 AM

Author here.

As I stated:

It's about that point, not about how to avoid it. Because you can't.

by thomashabets2

5/20/2026 at 10:30:36 AM

Some of the C++ code in this article has not been idiomatic in over a decade, and would be considered a code smell today. The language has evolved into quite a different language than when it was first created. As soon as I saw all of those raw pointers and direct pointer access, it was clear that at least part of this article should be taken with a grain of salt.

The other obvious issue with the overall perspective is that C and C++ are being thrown together directly as if somehow they’re nearly the same language, but they are really very far apart nowadays.

by jb1991

5/20/2026 at 10:41:21 AM

I was about to call out that the code is supposed to be C and not C++, but I double checked and I realised it actually says std::atomic<int>, not atomic_int!

by debugnik

5/20/2026 at 10:45:55 AM

Exactly, this is very old C++ on display in this article. It’s certainly not as safe as a language like Rust, but quite a lot of undefined behavior and things that will shoot yourself in the foot have been changed over the last 10 years.

Most C++ today will be immediately obvious and not accidentally mixed up with C.

by jb1991

5/21/2026 at 11:52:18 AM

Unfortunately most C++ today keeps making use of C idioms, including at companies with a seat at the WG21 table.

by pjmlp

5/20/2026 at 8:52:11 AM

Is this a correct understanding of UB in C? A program P has a set of inputs A that do not trigger UB, and a complementary set of inputs B that do trigger UB. A correct compiler compiles P into an executable P'. For all inputs in A, P' should behave the same as P. However, for any input in B, there are absolutely no requirements on the behavior of P'.

by maple3142

5/20/2026 at 8:54:46 AM

Intuitively yes - the program will be compiled as if B-inputs are never passed to the program, and that can include eliminating code that tries to detect B-inputs.

by simonask

5/20/2026 at 10:15:05 AM

This is a description of an imaginary compiler, evoked by the ANSI/ISO standards documents, which has never existed and will never exist. To understand what the program will do, you just have to understand the compiler behavior on your target platforms. A helpful intuition pump is: imagine the ANSI/ISO specifications simply do not exist; now what? Well, you just continue your engineering practice, the way you would for any of the myriad languages that never even had a post hoc standards document.

by mbrock

5/20/2026 at 10:27:15 AM

> just

That word is carrying a lot of weight here. Compilers are unbelievably complex these days, and it's impossible for any one human to fully understand the entire compilation process, including the effects of any arbitrary combination of compiler flags.

Any assumptions you have about what the compiler does in the face of UB will collapse on the next patch release of that compiler, or the moment somebody changes the compiler flags, or the moment somebody tries to compile the code for a slightly different OS, not to mention architecture.

There is no other way to understand what C compilers do than reading the standard.

by simonask

5/20/2026 at 10:36:07 AM

Yet the standard does not tell you what the compilers do.

Linux works on a wide variety of platforms. It also relies on those platforms behaving predictably with respect to what the standard leaves undefined.

This description of ISO UB as a totally insane wonderland of random, malevolent semantics just doesn't describe reality.

by mbrock

5/20/2026 at 11:37:34 AM

Up until the compilers do something to your code that you don’t understand.

by saagarjha

5/20/2026 at 11:45:14 AM

yeah then I have to learn how it works and what it assumes and how I can control it and maybe switch to a more well behaved compiler if it's truly insane

by mbrock

5/20/2026 at 11:47:49 AM

Not imaginary. Eliding checks on nullptr and integer overflow were both implemented, shipped, miscompiled the linux kernel and grew flags to disable them. I expect there are more if one goes looking.

by JonChesterfield

5/20/2026 at 12:04:02 PM

Well yeah that just means some aspects of the imaginary compiler were in some configurations approximated by some historical compiler versions and were in some cases rejected by the community (which cares about sane semantics even for behavior left undefined by ANSI/ISO) and in some cases left in as defaults but made trivially configurable for anyone who wants to define the undefined behavior.

by mbrock

5/20/2026 at 7:41:46 PM

Every bug is the result of an imaginary computer that doesn't work exactly like my computer does and triggers a bug in my code. The code works on my machine, so this imaginary computer never existed and will never exist.

Signed vs unsigned chars, and the accompanying extension rules, have already bitten me switching between x86/ARM compilers. Confused the hell out of me when I was just starting out with C.

If you're going to interpret C as in "C on amd64, running on Linux 7.0 on an Arrow Lake Intel processor" then yes, you can get away with a lot of UB. That mitigates the problem but doesn't make it go away.

by jeroenhd

5/20/2026 at 11:09:43 AM

GCC -O1 and clang -O1 will both optimize this function under the assumption that inputs that cause signed integer overflow are never passed:

    int will_overflow(int a, int b) {
        int sum = a + b;
        if (b > 0 && sum < a)
            return 1;
        return 0;
    }

by Retr0id

5/20/2026 at 11:23:17 AM

Right, good example, and both GCC and Clang offer well understood parameters for deciding, per compilation unit, what behavior you want for signed overflow (-fwrapv, -fno-strict-overflow, etc), so in reality it's quite far from spooky arbitrary nasal demons.

by mbrock

5/20/2026 at 12:17:09 PM

Wouldn’t it be better to check both inputs before against the max value of that type instead of actually doing the overflow?

by skydhash

5/20/2026 at 12:18:58 PM

There are lots of better ways of doing this, but knowing why this one is bad/wrong requires the mental model described upthread.

(But also, what you describe would be incorrect, since two <MAX values can add to a value that is >MAX, and overflow)

by Retr0id

5/20/2026 at 12:35:08 PM

> But also, what you describe would be incorrect, since two <MAX values can add to a value that is >MAX, and overflow

I was maybe unclear. I meant, if you know a sum can introduce overflow (because you have a check right after), why not check the inputs before doing the sum, instead of checking the sum?

by skydhash

5/20/2026 at 12:53:28 PM

You can do something like

       (y > 0 && x > INT_MAX - y) 
    || (y < 0 && x < INT_MIN - y)

and hope the optimizer turns it back into just checking the result. Or you use -fwrapv to concretize the ISO ambiguity and specify the natural two's complement semantics, checking overflow with the classic Hacker's Delight formula (where s is the wrapped sum x + y):

    ((x ^ s) & (y ^ s)) < 0

But the best way is to use the intrinsic __builtin_add_overflow or, depending on compiler support, its C23 standardization via <stdckdint.h> and ckd_add etc.
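
For reference, a minimal sketch of the two approaches side by side, assuming GCC or Clang for the builtin. The function names add_would_overflow and add_checked are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Portable pre-check: the overflowing addition is never evaluated, so
 * there is no UB for the optimizer to reason about. */
bool add_would_overflow(int x, int y)
{
    return (y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y);
}

/* Assumes GCC or Clang: __builtin_add_overflow stores the wrapped result
 * in *sum and returns true if the mathematical result did not fit. */
bool add_checked(int x, int y, int *sum)
{
    return __builtin_add_overflow(x, y, sum);
}
```

With C23 support, ckd_add from <stdckdint.h> should be usable in place of the builtin.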

by mbrock

5/20/2026 at 10:15:28 AM

Yes, that's a good summary.

by 1718627440

5/20/2026 at 9:07:59 AM

A concrete example of undefined behavior caused by an unaligned pointer: https://pzemtsov.github.io/2016/11/06/bug-story-alignment-on...
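
For contrast, the usual way to read a possibly misaligned value without UB is to go through memcpy. A minimal sketch, with load_u32 as a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value in host byte order from any
 * byte offset. memcpy places no alignment requirement on its arguments,
 * unlike dereferencing a pointer cast to uint32_t *. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Mainstream compilers typically turn this into a single load on targets that allow unaligned access, so it usually costs nothing.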

by rom1v

5/20/2026 at 10:06:08 AM

Specifically on x86 where it's assumed that won't cause problems.

by gblargg

5/20/2026 at 5:11:48 PM

The problem is incorrectly assuming that the spec is meaningful in some kind of rigorous way.

It’s not. All that matters is what C compilers actually do and what real C programs expect.

This is a good thing. It creates a culture where the two sides meet each other where they’re at

by pizlonator

5/20/2026 at 6:55:58 PM

We also have a very limited number of compilers and a small number of prevalent architectures today. As long as you know the behavior of the target compiler and architecture, the behavior is defined, it's just not specified.

by BearOso

5/20/2026 at 10:01:48 PM

This is true.

But what I’m saying has always been true. What has changed is that the effective portability of C and C++ code has increased due to the reduction in number of compilers and arches

by pizlonator

5/20/2026 at 11:45:35 AM

Well, you can't write malloc in conforming C, which hurts rather more than remembering to write bitcast as memcpy on char pointers.

Doesn't matter though because you aren't writing standards conforming C. You're writing whatever dialect your compilers support, and that's probably (modulo bugs) much better behaved than the spec suggests.

Or you're writing C++ and way more exposed to the adversarial-and-benevolent compiler experience.

The type aliasing rules are the only ones that routinely cause me much annoyance in C and there's always a workaround, whether it's the launder intrinsic used to implement C++, the may_alias attribute or in extremis dropping into asm. So they're a nuisance not a blocker.
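
For the simple bitcast case, the memcpy workaround looks roughly like this. It is a sketch that assumes float is 32 bits wide, and float_bits is a name made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this target. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: returns the object representation of f. Copying
 * bytes with memcpy sidesteps the aliasing rules that a cast like
 * *(uint32_t *)&f would violate. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```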

by JonChesterfield

5/20/2026 at 6:51:42 AM

> A problem with this is that in order to confirm the findings, you’ll need an expert human. But generally expert humans are busy doing other things.

The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.

LLM generated code will eventually contain UB.

EDIT: added "eventually"

by __0x01

5/20/2026 at 7:37:16 AM

It would already help a lot if the C and C++ standards started to clean up the list of Undefined Behaviour (e.g. there's a lot of nonsense UB currently in the C standard which could easily become Defined Behaviour - like the "file doesn't end in a new-line character" thing):

https://gist.github.com/Earnestly/7c903f481ff9d29a3dd1

by flohofwoe

5/20/2026 at 2:04:38 PM

The C committee is cleaning up a lot of UB (check https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_lo... for paper titles like "slaying earthly demons").

But don't misunderstand the goal of that: C and C++ will never get rid of UB. The result of dereferencing an invalid pointer is UB, will always remain UB, and really cannot be anything other than UB.

by jcranmer

5/20/2026 at 7:59:34 AM

The easy cases like you cite are also those that don’t cause problems in practice. I’m not sure that would help all that much, other than to slightly reduce internet criticism.

by layer8

5/20/2026 at 8:27:38 AM

Fixing easy cases makes the list shorter, so enables more focus on harder cases.

And it also signals that you actually do want to improve, just a little bit of boy scout rule goes a long way.

by talkin

5/20/2026 at 9:31:11 AM

The issue is that the list is infinite (anything not specified is UB), so actually removing any finite amount of UB from the list won't make it shorter.

(only slightly tongue-in-cheek, I do believe that removing silly things is worthwhile).

by gpderetta

5/20/2026 at 10:18:53 AM

The list of UB categories and rules is not infinite. The list of UB programs is, as is the list of all non UB programs.

by 1718627440

5/20/2026 at 11:36:04 AM

It is not obvious to me that the list of categories is not infinite (unless the final category is "everything else" of course)

by gpderetta

5/20/2026 at 12:55:02 PM

To be undefined behaviour, it must at least be valid syntax. The syntax is described in a finite document. Also it only gets executed by a finite machine, that has a finite number of finite descriptive documents.

by 1718627440

5/20/2026 at 2:17:23 PM

The list of unspecified behaviour is infinite, but the list of undefined behaviour is well defined and finite ;)

by flohofwoe

5/20/2026 at 9:25:32 AM

Author here.

> The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.

Yup. But the point of the article is that even expert humans cannot do this alone. And as I wrote, LLM+junior won't suffice either. We need LLM+senior experts.

And it's a problem that we have way more existing UB than expert capacity.

Now, will LLMs and experts both miss UB in some cases? Of course. There's no 100% solution. But LLMs, I claim, will find orders of magnitude more, with a low false positive rate, than any expert. Even if these expert humans (like in the OpenBSD case for the two bugs I found, one of which was UB) are given more than three decades to do it.

I didn't even use the best model, complex code target, or time. I just wanted to choose a target that has a high chance of having very good experts already having audited it.

by thomashabets2

5/20/2026 at 7:41:24 AM

Our LLM powered coding assistants are pretty good at doing lots of busywork that doesn't require all that much smarts. So they can supervise running our UB checks, like Valgrind, and making the linters happy.

by eru

5/20/2026 at 7:24:05 AM

> LLM generated code will eventually contain UB.

Yes.

Even in languages other than C (i.e. you will get behaviour that nothing in the input specified).

When LLMs generate code, all languages have UB.

by lelanthran

5/20/2026 at 7:42:37 AM

That's a bit silly.

UB means literally no restrictions. So if your standard says 'you have to crash with an error message' that's already no longer UB.

by eru

5/20/2026 at 8:06:12 AM

> So if your standard says 'you have to crash with an error message' that's already no longer UB.

Sure. For crashes. But when you instruct an LLM to do something, the output is probabilistic, so you may get behaviour that is unexpected and/or unwanted.

Like storing security tokens in code. Or nuking the production database.

by lelanthran

5/21/2026 at 3:56:34 AM

If you fix the random seed you use for sampling, your LLM is perfectly deterministic.

And there's no requirement for C compilers' UB to be deterministic either.

by eru

5/20/2026 at 7:42:20 AM

Very bad advice. Of course good new LLMs know about UB, but you still need to use ubsan (i.e. -fsanitize=undefined), and not your LLM.

by rurban

5/20/2026 at 8:13:33 AM

Coding agents write unsound Rust any day, too. unsafe impl Send … is much easier than fixing a bad design and it might even work momentarily.

by formerly_proven

5/20/2026 at 9:02:59 AM

Integer promotion seems to be the source of many signed integer overflow UB. Why does C have it? Does integer promotion ever have a good part?

by mjs01

5/20/2026 at 11:39:56 AM

Yes, it simplifies a lot of code that would otherwise be littered with casts.

by saagarjha

5/20/2026 at 12:18:23 PM

Could be fixed by having a nicer casting syntax (like Rust) or by not having so damn many scalar types that are used in practice.

"Explicit casts only" worked fine in Modula-2, which doesn't have as many scalar types.

by peterfirefly

5/21/2026 at 7:29:58 AM

Yeah most modern languages settle on having fewer types that are used most of the time and don't require as many casts.

by saagarjha

5/20/2026 at 6:42:42 AM

A fun one that'd fit the list would be sequence point violations like

    i = i++

by weinzierl

5/20/2026 at 6:54:36 AM

Fun, sure, but also GCC and Clang will both warn with -Wall (-Wsequence-point / -Wunsequenced).

by radiospiel

5/20/2026 at 9:18:36 AM

This would also be a code smell even if it was well defined.

by account42

5/20/2026 at 5:12:38 PM

Yes, it should be an explicit error. Not undefined.

by marcosdumay

5/20/2026 at 8:59:33 AM

Only in C, that one is defined in C++.

edit: I'm not sure it's even undefined in C.

by leni536

5/20/2026 at 6:32:25 PM

A lot of this stems from trying to insist that char just means "small" and not "8 bits" and that int means "bigger than that" and not "32 bits". In fairness, K&R dealt with an era where 9 bit architectures existed, but char is 8 bits now. Everywhere.

by commandlinefan

5/20/2026 at 7:46:20 PM

In the world of microcontrollers, CHAR_BIT can be 16 or some other funky number. char is usually 8 bits in size, though.

by jeroenhd

5/20/2026 at 3:45:34 PM

> the OpenBSD project has not been very receptive in the past for bug reports, my sense of “this is probably fine, in practice”, and that if OpenBSD wants to weed out UB from their code base, then that’s a major project that should be done in a better way than me just being the middle man between the LLM and them for a patch here and there.

Part of the reason for all the UB in OpenBSD is that UBSan doesn't run on that platform. When I ported OpenBSD's httpd to Linux, I found that UBSan tripped before the server even came up because the config flag parsing shifts into the MSB of a signed integer.

I tried to contribute back a patch (just make the flag bitfield unsigned), but it was ignored. I think if UBSan ran natively on OpenBSD, then there would be a lot more of these patches, and the maintainers would have to take an official stance on whether they think these bugs matter.
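
To illustrate the kind of fix I mean (this is a sketch, not the actual httpd code, and the flag names are made up): with a 32-bit int, 1 << 31 shifts into the sign bit, which is UB in C, while the same shift on an unsigned operand is well defined.

```c
#include <stdint.h>

/* Hypothetical flag names, not taken from httpd. */
#define FLAG_LOW  (UINT32_C(1) << 0)
#define FLAG_HIGH (UINT32_C(1) << 31)  /* unsigned shift: well defined */
/* By contrast, (1 << 31) with a 32-bit int cannot represent the result,
 * which is what UBSan reports. */

/* Sets a flag in an unsigned bitfield. */
uint32_t set_flag(uint32_t flags, uint32_t flag)
{
    return flags | flag;
}
```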

by bkallus

5/20/2026 at 6:54:37 PM

I like the ideas of this article but would not use SPARC as a main badguy in my examples. A naive and probably popular takeaway would be, "Thank goodness I am not writing for SPARC and don't need to worry about these SPARC architectural concerns!"

by psim1

5/20/2026 at 8:50:13 AM

> The compiler, and really the underlying hardware too, is playing a game of telephone with your UB intentions.

The part about hardware is wrong BTW. In all the cases about null pointers and out-of-bounds access and integer overflow and whatnot, the hardware semantics are clearly defined, and the assembler code does exactly what is written. The way modern compilers act on your code makes C less safe than assembler in that sense.

by codeflo

5/20/2026 at 9:32:49 AM

Author here

> The part about hardware is wrong BTW

Could you be more specific? I think by "wrong" you may mean "not actually relevant to UB", and you're right about that. If that's what you mean then that part is not for you. It's for the "but it's demonstrably fine" crowd.

> the hardware semantics are clearly defined

Yup. The article means to dive from the C abstract machine to illustrate how your defined intentions (in your head), written as UB C, get translated into defined hardware behavior that you did not intend.

I'm not saying the CPU has UB, and I wonder what part made you think I did.

That's what I mean by game of telephone. The UB parts get interpreted as real instructions by the hardware, and it will definitely do those things. But what are those things? It's not the things you intended, and any "common sense" reading of the C code is irrelevant, because the C representation of your intentions was UB.

by thomashabets2

5/20/2026 at 4:03:33 PM

It seems like I simply misunderstood the point of the "game of telephone" metaphor. To be honest, even with your added explanation, I don't fully get why you express it that way. But I think we're in agreement on the substance, and I shouldn't have worded my response so harshly.

by codeflo

5/20/2026 at 12:57:16 PM

For a deep dive on UB with printf, see https://srs.fyi/see-conversions/

> When programming in C, to avoid unexpected pitfalls, one must be acutely aware of a whole slew of implicit behaviors (some of which are implementation-defined or even undefined).

by sltr

5/20/2026 at 9:10:36 PM

What all these C programmers are pointing out is twofold:

- Making a Turing machine have deterministic and predictable results is hard.

- Modern hardware is complex and getting all hardware to behave the same way requires a strong mathematical abstraction.

C was never intended to be a fully defined mathematical abstraction. It was a language which was easy to write a compiler for. That's its original strength. Trying to make it something it isn't is the problem. Either choose a language which does have such abstractions or understand the drawbacks of the tool you are using.

Right tool for the right job.

by hunterpayne

5/20/2026 at 9:10:51 AM

I read through this in detail... Is it just me, or are these things that are invoked by intentionally bypassing the typing?

I mean, you have to go out of your way and use a cast to get the UB in the first example.

For the `isxdigit` implementation, using a parameter to index into an array without a length check is pretty suspect already. I don't think any of my code actually indexes an array without checking the length in some way.

For the float -> int conversion, converting a float to an int without picking a conversion does not make sense in the first place - math.h has rounding and ceiling functions.
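
One caveat worth sketching: even after rounding with a math.h function, the conversion itself is undefined when the value does not fit in the target type. The helper below is hypothetical and assumes a two's complement `int`, so that `INT_MIN` converts to `float` exactly.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical checked conversion. (float)INT_MIN is exactly -2^(N-1),
   so the accepted range is [-2^(N-1), 2^(N-1)). NaN fails both
   comparisons and is rejected as well. */
static bool float_to_int_checked(float f, int *out)
{
    const float lo = (float)INT_MIN;   /* exact power of two */
    const float hi = -lo;              /* one past INT_MAX */

    if (!(f >= lo && f < hi))
        return false;                  /* out of range or NaN */

    *out = (int)f;                     /* truncates toward zero */
    return true;
}
```

Comparing against `(float)INT_MAX` instead would be subtly wrong, because that value typically rounds up to one past `INT_MAX`.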

> For all you know the compiler has no internal way to even express your intention here.

I'm human, not a compiler, and even I cannot tell what the intention is behind trying to call NULL as a function. What exactly is expected to happen?

> Because the argument needs to be a pointer, and the NULL macro may be misinterpreted as an integer zero.

I don't think this is true for C. The NULL macro is defined to be a pointer in the C standard, AFAIK. Just because comparisons with zero are allowed, does not imply that the standard implicitly promotes NULL to `int`.

I think only the final one is of note (the 24-bit shift assigned to a uint64_t).

by lelanthran

5/20/2026 at 9:24:43 AM

> I don't think this is true for C. The NULL macro is defined to be a pointer in the C standard, AFAIK. Just because comparisons with zero are allowed, does not imply that the standard implicitly promotes NULL to `int`.

Probably confusion with C++ where NULL is 0 which is a special case that can be implicitly cast to both integers and pointers, unlike non-zero constants. C doesn't need this because it doesn't require explicit casts from void pointers to others.
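
For context on the variadic case the article refers to: the C standard lets `NULL` expand to either an integer constant `0` or `(void *)0`, and variadic arguments get no conversion to a pointer type. A small sketch with a hypothetical helper shows the usual defensive idiom of casting the sentinel.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: counts string arguments up to a null-pointer
   sentinel. The sentinel is read back as char *, so the caller has to
   pass an actual char * null pointer, not a bare 0. */
static int count_strings(const char *first, ...)
{
    va_list ap;
    int n = 0;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

A call such as `count_strings("a", "b", (char *)NULL)` is portable, whereas a bare `NULL` in the variadic position depends on how the implementation defines the macro.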

by account42

5/20/2026 at 8:30:29 AM

C is still, by far, the simplest language that we have.

Although many newer languages are safer (with the exclusion of Rust, primarily by being slower) the same kinds of issues that are there in C are there in these languages, their effects are just harder to see.

People complain about C as though they know how to fix it.

by akiarie

5/20/2026 at 8:47:38 AM

C is not a simple language in the sense that writing software in C is simple, and I think that's the only useful way to understand the word "simple" in this context.

Brainfuck is "simple" by any other definition as well, but that's not a useful quality.

by simonask

5/20/2026 at 10:07:02 AM

C is a far simpler language than, for example, Swift. Its cognitive load in order to actually write something is pretty small - even the authors state that their book about C is intentionally slim because the concepts to understand are not that many.

That doesn't mean C is a safer language than Swift, or a less-capable language than Swift. But in terms of "easy to understand along the happy-path", it's a lot easier to get going in C.

Swift, for example, bakes a whole load of CS-degree-level ideas and concepts into the basic language with its optionals, unwrapping, type-inference, async/await, existential types, ... ... ... . C doesn't do any of that. There are (many!) more footguns in C, but the language is less complex as a result.

Brainfuck is not at all simple, from that point of view. This is a valid Brainfuck program:

>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.[-]>++++++++[<++++>-]<. >+++++++++++[<+++++>-]<.>++++++++[<+++>-]<.+++.------.--------.[-]>++++++++[<++++ >-]<+.[-]++++++++++.

This is the equivalent C program

#include <stdio.h> int main() { printf("Hello world!\n"); }

One of these is far simpler than the other.

[edit: changed to make the examples do the same thing]

by spacedcowboy

5/20/2026 at 10:35:44 AM

The point I'm getting at is that your definition of "simple" (a word that should be banned among programmers) is not useful, if it is even meaningful.

The brainfuck example is "simpler": Only 8 kinds of tokens! Not really useful, though.

The cognitive load of _actually delivering software_ written in C is immensely greater than doing so with Swift, or Rust, or Python, or Java, even Zig, despite all of those leveraging much heavier machinery in order to deliver a friendlier abstract model for you to program against.

The tragedy of C is that, in addition to only delivering very baseline abstraction tools, it also adds its own set of seemingly arbitrary rules and requirements that come from nowhere but the C standard. Fictitious limitations to suit a bygone era. The abstract model of C is fine in some places, but definitely not fine in other places, and my hypothesis is that most UB in practice comes from a mismatch between programmer intuitions and C's idiosyncrasies.

by simonask

5/20/2026 at 11:07:25 AM

Calling something "simple" to use and learn is a valid use of the word, sorry. Not going to stop doing that.

> The cognitive load of _actually delivering software_ written in C is immensely greater than doing so with Swift, or Rust, or Python, or Java, even Zig, despite all of those leveraging much heavier machinery in order to deliver a friendlier abstract model for you to program against

Sorry, I couldn't disagree more.

I find the simplicity of C to be elegant. You know the rules; it's like the entire C language is the 1-page summary of the encyclopaedia of C++ or Swift or Java, or (insert more-modern language here). The key to working well in C is in defining modular code with well-understood interfaces. I've got 40 years of programming in C so far, and the nightmare stories ran out after the first few years. Programming discipline is a thing.

Similarly, ObjC is a far superior, much simpler, object-oriented language than C++, there's about 15 different things over C, and you know the language. Template metaprogramming. Phooey! You'll still have to learn object-orientated programming semantics, but it's a "simple" language.

BTW: If you think the brainfuck language example is in any way easier to understand than the C one, I think you might need medication. /j

by spacedcowboy

5/20/2026 at 11:43:06 AM

> I've got 40 years of programming in C so far, and the nightmare stories ran out after the first few years.

You need to find something more interesting to do ;)

by saagarjha

5/20/2026 at 12:37:16 PM

Oh, I do. I'm building a two-story 1000 sq.ft garage right now - more workshop than garage tbh [1], I've just built a roll-off-roof observatory [2], the currently empty pad behind it is for a radio-telescope, last set up in London [3], still needs to be assembled in the new house. Right now I'm into the fun stuff of automating everything in the observatory. I've also recently taken up archery, and I'm enjoying that. I've written (well Claude has) an optimising compiler for a memory-managing language for the 6502 [4], but I'm just instrumenting (this bit is me) the IR so it can also target the M chip on my Mac. Eventually it'll also target m68k so I can bring up the Atari ST on the FPGA that is currently just emulating the atari 8-bit (I have a 120MHz 6502 at the moment :). The 'x' in 'xt' is from 'atari Xl' and the 't' is from 'atari sT'. The compiler is called 'xtc'. Both will run MiNT and the blitter on the FPGA is designed to integrate well with GEM, the graphics environment on the ST - even the XL version will have a graphical UI running at 1080p :)

So I have a few things to keep me busy right now.

1: http://0x0000ff.co.uk/img/garage/garage-layout.png

2: http://0x0000ff.co.uk/img/observatory/roll-off-roof-observat...

3: http://0x0000ff.co.uk/img/dish/dish.jpg

4: http://atari-xt.com/

by spacedcowboy

5/21/2026 at 7:29:21 AM

Ooh, fun! Good luck!

by saagarjha

5/21/2026 at 7:54:24 AM

Another useful sense is how easy it is to read the code and understand what the programmer meant to write (excluding mad code with macros etc., of course).

Brainfuck is absolutely not simple in that sense.

by feelamee

5/20/2026 at 7:55:27 PM

C sits right in the middle between assembly and BASIC in terms of simplicity. You can't do a simple popcnt, but you can implement jump tables.

It's slower than Fortran and, depending on the platform, COBOL. It's a bigger minefield than any language that came after it barring C++.

The only real advantage I can ascribe to C is that it's actually still being used after all these years, and it mostly works similarly on most hardware, like a Java for people who enjoy the casino.

Fixing C without breaking existing C code is pretty much impossible. You can start by defining warnings for UB, but then you will break any of the more trivial examples in the article. You can also start by simply killing off weird platforms (force a specific amount of bits for instance, screw the weird 16 bit char chips). Making casts explicit would probably fix a lot of problems too, though you'd need better syntax for that.

There is no fixing C without changing what C really is.

by jeroenhd

5/20/2026 at 9:12:30 AM

Can you elaborate on what you think C has in terms of simplicity that Zig doesn't, and which "same kinds of issues" you think it has?

I'm not an expert in either language but my anecdotal experience disagrees with this - writing Zig has been far simpler and less error-prone than writing C.

by dns_snek

5/20/2026 at 12:05:59 PM

Can anyone explain why this is undefined behaviour? UBSan calls it "indirect call of a function through a function pointer of the wrong type"

    struct foo {int i;};
    int func(struct foo *x) {return x->i;}
    int main() {
        int (*funcptr)(void*) = (int (*)(void*)) &func;
        struct foo foo = { 42 };
        return funcptr(&foo);
    }

While this is all kosher per the language lawyers:

    struct foo {int i;};
    int func(void *x) {return ((struct foo *)x)->i;}
    int main() {
        int (*funcptr)(void*) = &func;
        struct foo foo = { 42 };
        return funcptr(&foo);
    }

by amiga386

5/20/2026 at 1:44:59 PM

C23 §6.5.2.2p7

> If the function is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function, the behavior is undefined.

Compatible types requires integrating texts from several different paragraphs, but the general notion is "identical type, in a frontend sense", not "same ABI." This means that "const void *" and "void *" are not compatible types, much less "void *" and "struct foo *".

by jcranmer

5/20/2026 at 9:31:47 PM

I get that it's defined that way, but I'd really like to know why.

I can see the value in saying that struct x* isn't compatible with struct y*, because they could have different alignment or packing rules. But struct x* and void*, which is already special-cased to allow assignment without a cast? Why aren't these considered compatible in function pointer parameter definitions?

Is there any work involved in casting void* to struct* (on any architecture) that a plain function pointer would miss out?

by amiga386

5/21/2026 at 6:04:38 AM

Yes, some restrictions seem arbitrary. Just like why these two types are not compatible:

  struct a {int data;};
  struct b {int data;};

I know, I know, changing it would break existing code, etc.

by teo_zero

5/21/2026 at 10:29:19 AM

You jest, but a central feature of C is that void* is a universal pointer, and can be used as the basis of polymorphism and context passing. That's why you can do things like:

    void qsort(void *base, size_t nmemb, size_t size,
               int (*compar)(const void *, const void *));

It would be nice to write:

    int compare_foo(const struct foo *a, const struct foo *b) {
        ...
    }

But instead we have to write this to avoid being declared "undefined behaviour":

    int compare_foo(const void *a, const void *b) {
        const struct foo *real_a = (struct foo *)a;
        const struct foo *real_b = (struct foo *)b;
        ...
    }

So what is the upside of having this rule, given the same thing is going to happen anyway, just more verbose?
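
For reference, the conforming pattern filled in as a complete sketch. The struct layout and the `sort_foos` wrapper are assumptions made for illustration.

```c
#include <stdlib.h>

struct foo { int i; };   /* assumed layout, for illustration */

/* The comparator's type matches what qsort calls through, so the call is
   well defined. void * converts implicitly to const struct foo *. */
static int compare_foo(const void *a, const void *b)
{
    const struct foo *fa = a;
    const struct foo *fb = b;

    /* Avoids fa->i - fb->i, which could overflow a signed int. */
    return (fa->i > fb->i) - (fa->i < fb->i);
}

/* Hypothetical wrapper around qsort. */
static void sort_foos(struct foo *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_foo);
}
```

No cast of the function pointer is needed here, which is exactly what keeps the call inside the "compatible type" rule.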

by amiga386

5/20/2026 at 1:39:15 PM

It's undefined behavior due to the "strict aliasing" rule. You're simply not allowed to cast one pointer type to another (ever!) except for the following exceptions:

- casting an object pointer to or from void*

- casting an object pointer to or from char*

You're not doing either of those things. A function pointer is not an object pointer (the standard does not guarantee that the two kinds of pointer even have the same size/representation, and in fact on some esoteric hardware they don't), and even if it were, you aren't casting to or from void* or char*. So it's UB for two separate reasons.

by wavemode

5/20/2026 at 1:48:38 PM

Sorry, this explanation is plain wrong.

You can cast between pointer types freely so long as they can be representable in one another (some casts are undefined because the address would be unaligned in the target pointer type, and there's actually no guarantee that pointers to objects and pointers to functions have the same representation).

Strict aliasing rules don't kick in at pointer type casting, but rather kick in at lvalue access--when you dereference a pointer, in other words--and you've also given the list of strict aliasing rules completely incorrectly.
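
To illustrate the "at lvalue access" point, here is a sketch of the usual workaround. It assumes `float` is a 32-bit IEEE 754 type, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Creating a uint32_t * from a float * is just a cast. The problem would
   be the access *(uint32_t *)&f, which reads a float object through an
   incompatible lvalue type. Copying the bytes avoids that access. */
static uint32_t float_bits(float f)
{
    uint32_t bits;

    _Static_assert(sizeof bits == sizeof f, "assumes a 32-bit float");
    memcpy(&bits, &f, sizeof bits);   /* byte-wise copy, no aliasing access */
    return bits;
}
```

Mainstream optimizing compilers typically turn the `memcpy` into a single register move, so the defined version usually costs nothing.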

by jcranmer

5/20/2026 at 12:21:47 PM

Whether two function pointer types are compatible in practice depends on the machine-specific calling convention.

I guess enumerating all the possibilities just doesn't look right? It would make the standard too long and complex?

by j16sdiz

5/20/2026 at 12:09:52 PM

Casting to a pointer of incompatible type is UB. The exception is casting to char*.

by tomp

5/20/2026 at 12:53:07 PM

Tell me why struct* is incompatible with void* when it's such a standard case in C that you don't need a cast:

    struct foo *x = malloc(sizeof(struct foo)); /* malloc returns void* */

Or rather, tell me why the C11 standards committee decided to declare that struct* is incompatible with a void*

by amiga386

5/20/2026 at 1:05:51 PM

ok so Claude says I was wrong, it's more subtle.

(1) you can cast between any pointer types (no UB - assuming they're aligned), but accessing memory through a wrongly-typed pointer is UB

(2) the only exception is char*, which allows you a "byte view of memory"

(3) calling a function through a pointer requires the parameter pointer types to be compatible, and none of these are: int*, struct foo *, void*, char*

by tomp

5/20/2026 at 9:46:24 AM

When talking UB, putting C and C++ in the same basket is basically like comparing drunk driving a car and riding a bicycle sober... Both means of transport, very different experience.

by keyle

5/20/2026 at 11:42:05 AM

Maybe we should criminalize writing articles about Undefined Behavior that have a "So what do we do now?" subheader but omit any mention of UBSan.

by wyldfire

5/20/2026 at 11:06:14 AM

The scariest part is how many production systems rely on undefined behavior without anyone knowing until a compiler update breaks everything.

by danborn26

5/20/2026 at 7:11:47 PM

Where does the scary part come from? Do they run on planes?

by coolThingsFirst

5/20/2026 at 3:38:28 PM

I fear I will be downvoted into oblivion but I also want to learn from this.

First let me state the case for C. It’s meant to be used as a systems language that’s as close to assembly as possible while remaining portable (compared to assembly). As such it’s the first high-level language developed for any new processor.

Given the above predicate: Isn’t everything described in the article as it should be?

Add too much to the language and it becomes less possible to implement on new architectures, right? Because the undefined behavior lets implementors stand up new compilers fairly quickly.

For less undefined behavior isn’t it better to use languages that have that in their DNA? D, Zig, Go, Java, etc?

by tomcam

5/20/2026 at 4:19:03 PM

> Given the above predicate: Isn’t everything described in the article as it should be?

I think the real trick question is "as it should be for whom?".

Reading the comments I think people underestimate the complex interaction between:

- engineers that design hardware (they don't care much about the compiler, except when it has to fix their mistakes)

- engineers that do the compiler (they have to struggle with all quirks of the new architecture and all of the complaints of the users)

- users of the new system (hardware + compiler) that just want to take their 100k lines of code (libraries) and just use it on the new system with better performance (as that's what the hardware people promised!)

- users working on one architecture all their lives

For the compiler people, yes, probably most of what is described is as it should be. For the users (that care about performance and not making porting efforts), probably no.

Now, even when I was doing compiler work we had a hard time explaining to our users why we couldn't do some of the things they wanted (while also improving performance and not changing code that was already written), so explaining that on the internet seems to me a lost battle.

I am sure there are things that can be improved, and standards evolve. But the problem is very complex given the sheer amount of code written and the strange architectures out there.

by vladms

5/20/2026 at 1:30:03 PM

"My point is that ALL nontrivial C and C++ code has UB."

Is "nontrivial" defined

How would one identify "nontrivial" C code

Is there an objective measure (defined)

Or is it a matter of personal opinion that could vary from person to person (undefined)

by 1vuio0pswjnm7

5/20/2026 at 9:16:49 AM

I really like Zig's approach to UB, especially that alignment is part of the type, and all those wordy builtins for conversions. Staring at them makes you wonder what you are doing wrong with your data model when a cast now requires three lines of expressions.

by bvrmn

5/20/2026 at 1:57:41 PM

C does not abstract differences in underlying hardware well. Systems programmers know if they have an architecture that can't handle unaligned accesses or that the address they are doing load/stores from is a mmio register. Systems programmers know the difference between a virtual address and a physical address and have debugged MPU faults or MMU table walks and page faults more times than they want to think about.

C is horrible for trying to write a portable user-mode program in 2026. There are lots of better options.

C is great for writing low-level system code where you need to optimize performance down to the last cycle. It not abstracting away the hardware is super important for some use cases. A classic example is all of the platform-specific flavors of memcpy in the Linux kernel that are C/assembly hybrids hand-optimized for the SIMD pipelines of some CPUs.

C is a tool, Rust is a tool, Java is a tool, Python is a tool. Use the right tool for the job ¯\_(ツ)_/¯.

by QuiEgo

5/20/2026 at 10:30:17 AM

Is there a way to avoid undefined behavior in C then? Could we write a new C compiler that adds some checks and fixes (e.g. raise documented exceptions) to each undefined behavior?

by elnatro

5/20/2026 at 10:47:30 AM

That post is just a hyperbolic rhetorical piece, not even a good technical shade. There are plenty of tools that restrict C to a subset with defined behavior. HN is just not aware of them. NASA, aerospace, and the car industry are big customers of such static analyzers and compilers.

Good open source ones:

Frama-C

IKOS (from NASA)

by u1hcw9nx

5/20/2026 at 12:01:16 PM

It’s been a while since I programmed in C. Thank you for these resources.

by elnatro

5/20/2026 at 11:40:59 AM

Not all of them but there are many tools that can try to define behavior for this code to help shake them out of your codebase.

by saagarjha

5/20/2026 at 12:19:49 PM

ubsan.

Doesn't catch all of it.

by peterfirefly

5/21/2026 at 8:50:33 AM

Will take a look thanks!

by elnatro

5/20/2026 at 6:43:33 AM

From the ANSI C standard:

  3.16 undefined behavior: Behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately valued objects, for which this International Standard imposes no requirements.  Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message).

Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph? The intent here is extremely clear, that undefined behavior means you're doing something not intended or specified by the language, but that the consequence of this should be somewhat bounded or as expected for the target machine. This is closer to our old school understanding of UB.

By 'bounded', this obviously ignores the security consequences of e.g. buffer overflows, but just because UB can be exploited doesn't mean it's appropriate for e.g. the compiler to exploit it too, that clearly violates the intent of this paragraph.

by veltas

5/20/2026 at 6:46:00 AM

> but that the consequence of this should be somewhat bounded or as expected for the target machine.

Aren't "unpredictable results" and "no requirements" contrary to the idea that the behavior would be "somewhat bounded"?

by dataflow

5/20/2026 at 6:55:36 AM

Notice though "ignoring the situation" thru "documented manner characteristic of the environment". Even though truly you can read this in an uncharitable way, you could also try and understand the intent of this paragraph, and I think reading it for its intents is always the best way to interpret a language standard when the wording is ambiguous or soft, especially if you're writing a compiler.

I don't think you could sincerely argue that this definition intends to allow the compiler to totally rewrite your code because of one guaranteed UB detected on line 5, just that it would be good to print a diagnostic if it can be detected, and if not to do what's "characteristic of the environment". Does that make sense?

by veltas

5/20/2026 at 7:16:11 AM

Ex falso quodlibet.

Bounding UB would be a nice idea, or at least prohibiting time-traveling UB (and there is an effort in that direction). But properly specifying it is actually hard.

by gpderetta

5/20/2026 at 9:29:45 AM

Prohibiting "time-travelling" UB would be horrible as that's a very important mechanism for dead code elimination.

by account42

5/20/2026 at 10:45:41 AM

Even if you forbid "time travel", you can still technically optimize many things as if time travel happened anyway - e.g. want to time-travel back to before some memory store? just pretend that the store happened, but then afterwards the previous value was stored back (and no other threads happen to see the intermediate value)!

Only things you need to worry about then are things with actual observable side-effects - volatile, printf and similar - and C23 does note that all observable behavior should happen even if UB follows, and compilers can't generally optimize function calls anyway (e.g. on systems on which you can define custom printf callbacks, you could put an exit(0) in such, and thus make it incorrect to optimize out a printf ever).

by dzaima

5/20/2026 at 7:26:00 AM

Reading for intent is pragmatic.

Reading adversarially is what people do who are looking for ways that something can be abused, from an offensive or defensive position.

Personally I am tired of the entire topic.

by cracki

5/20/2026 at 7:47:52 AM

What's bad is when your compiler writers and most of the people involved in standardisation are reading it adversarially.

by veltas

5/20/2026 at 9:31:27 AM

It's bad when compiler writers want to optimize correct code as much as possible, which is something their actual customers keep asking for?

by account42

5/20/2026 at 10:53:08 AM

When would optimizing correct code be harmed by not abusing UB (beyond its original intent, e.g. array access should be without overhead of checking for overflow)?

by veltas

5/20/2026 at 2:28:17 PM

> Notice though "ignoring the situation" thru "documented manner characteristic of the environment".

I noticed that. Those are 100% consistent & implied by the parts of the standard I quoted that you are ignoring, though.

What you're doing is:

- Arguing that those phrases describe the totality of the implications, rather than mere examples, without providing anything to base this method of argumentation on.

- Completely ignoring the other phrases I quoted, which (taken at face value) contradict your reading.

- Claiming that anyone who disagrees is being insincere(?) and reading the standard uncharitably.

- Not even attempting to support this line of reasoning through other arguments.

So you're not only asking people to read contradictions into the standard, but also insinuating that people who don't are not arguing in good faith. That... honestly isn't a winning strategy.

Note that I'm not even saying your conclusion regarding their intent is necessarily wrong. I'm just saying your argument is bad. And that there is a difference between what the rules are and what some people believe their authors intended them to be.

If I wanted to argue your position, I would look for other parts of the standard where they do what you're claiming. That is, where the literal meaning of the wording would be crazy, and which would clearly contradict what everyone believes the authors of the standard intended it to mean. Then you would at least have some basis for extrapolating that line of reasoning to this paragraph. At that point you might at least get an acknowledgment from the other side that the standard is unclear and/or has a defect, even if they didn't agree with your take on what requirements it imposes as-written.

> I don't think you could sincerely argue that this definition intends to allow the compiler to totally rewrite your code because of one guaranteed UB detected on line 5,

I'm not sure if you're exaggerating ("totally"?), being sloppy, or misunderstanding, or if you actually mean this literally, but I already don't believe it does that, and I have never seen any compiler interpret it that way either. Sorry, but you're going to have to be more precise and pedantic here so you actually have something realistic to argue against. Right now it looks like you have an impression of UB that doesn't match reality.

by dataflow

5/21/2026 at 7:54:25 AM

> Completely ignoring the other phrases I quoted, which (taken at face value) contradict your reading.

You are taking them out of context (literally this is what you describe here, taking at face value a smaller quote).

I think your approach to interpreting the spec is not correct. This isn't code, it's a spec: it needs to be read in full context (even though a good spec would certainly be written in a less context-sensitive way, this is not a perfect spec -- have you ever seen one?). You're not a computer or a machine, you need to read it more like a human, even though we're all trained on the concrete mechanics of computer programming. Yes, even though it's describing a programming language, believe it or not. All specs have flaws and need nuance in those situations or you will either (for language specs) write code that doesn't work anywhere, or you will write a compiler that breaks code matching what the authors of the spec intended to allow.

> Right now it looks like you have an impression of UB that doesn't match reality.

I have an impression of UB that is not the convention, my post is criticising the convention. I am trying to give context and nuance where it is unfortunately lacking and now apparently quite relevant to lots of people. This can't change reality of current compilers, but maybe it can serve as a lesson in history.

by veltas

5/20/2026 at 9:38:50 AM

Author here.

I touched on this in the "it's not about optimizations" section. It's not that the compiler is out to get you. It's that you told it to do something it cannot express.

It's like if you slipped in a word in French, and not being programmed for French, it misheard the word as a false friend in English. The compiler had no way to represent the French word in its parse tree.

So no, it's not overly legalistic. Like if the compiler knows that this hardware can do unaligned memory access, but not atomic unaligned access, should it check for alignment in std::atomic<int> ptr but not in int ptr? Probably not, right?

by thomashabets2

5/20/2026 at 10:46:30 AM

It's not that your article specifically discusses this aspect, but I think it's an important part of the conversation that's being overlooked by commentators, that we've twisted the original intent of UB and made unnecessary work for ourselves. There's been too much scaremongering about UB that's gone beyond the real concerns. If you only fear UB and don't understand it then you are worse off for trying to write safe C or C++.

by veltas

5/20/2026 at 10:24:53 AM

The behaviour is bounded by the capability of your machine. It is unlikely that your desktop computer launches a nuclear missile, unless you worked for it to be able to do that.

by 1718627440

5/20/2026 at 7:31:21 AM

> Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph?

I've (fruitlessly) had this discussion on HN before - super-aggressive optimisations for diminishing rewards are the norm in modern compilers.

In old C compilers, dereferencing NULL was reliable - the code that dereferenced NULL will always be emitted. Now, dereferencing NULL is not reliable, because the compiler may remove that and the program may fail in ways not anticipated (i.e, no access is attempted to memory location 0).

The compiler authors are on the standard, and they tend to push for more cases of UB being added rather than removing what UB there is right now (for example, by replacing with Implementation Defined Behaviour).

by lelanthran

5/20/2026 at 9:19:19 AM

Is comparing a signed integer with an unsigned integer UB? I recently wrote some code and compiled it with gcc for x86_64 (without optimization), and it returned an incorrect answer.

by fjfaase

5/20/2026 at 9:35:53 AM

No UB, but the integer promotion rules and the usual arithmetic conversions apply.

When comparing signed and unsigned integers of the same size, the signed one is converted to unsigned. In a reasonably configured project the compiler will warn about it.

For integers smaller than int, promotion to int happens first.

For signed and unsigned integers of different sizes, the smaller one is converted to the bigger one.
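A minimal sketch of those three cases, assuming a mainstream target where int is 32 bits and long long is wider than unsigned int (the function names are made up for illustration):

```c
#include <stdbool.h>

/* Same size: the signed operand is converted to unsigned, so -1 becomes
   UINT_MAX and the comparison is false. */
bool minus_one_less_than_one_u(void) {
    int s = -1;
    unsigned int u = 1u;
    return s < u;   /* -Wsign-compare flags this line */
}

/* Narrower than int: both operands are promoted to int first, so the
   comparison is done on signed values. */
bool minus_one_less_than_one_uchar(void) {
    signed char s = -1;
    unsigned char u = 1;
    return s < u;   /* true */
}

/* Different sizes: the narrower unsigned operand is converted to the wider
   signed type, which can represent all of its values. */
bool minus_one_less_than_one_wide(void) {
    long long s = -1;
    unsigned int u = 1u;
    return s < u;   /* true */
}
```

Only the first function gives the "incorrect" answer, and it is fully defined behaviour, just surprising.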

by Karliss

5/20/2026 at 9:35:53 AM

It's not UB. Integer promotion applies, the signed int is implicitly coerced to unsigned (or the other way around - don't remember which.)

by benchloftbrunch

5/20/2026 at 6:49:11 AM

In C / C++ there are two kinds of undefined behaviour. One is where the standard explicitly says that something is UB. The other is everything else that the standard does not define.

by raluk

5/20/2026 at 7:05:04 AM

https://en.wikipedia.org/wiki/There_are_unknown_unknowns

by wiseowise

5/20/2026 at 6:58:42 AM

Technically, that's only one kind, because it's written in the standard that anything not mentioned in the standard is undefined behavior.

by thaumasiotes

5/20/2026 at 7:02:38 AM

One kind, but two different classes of undefined behaviour.

by cepepe

5/20/2026 at 1:58:03 PM

I want a language that is a group of bit (0,1) and the xor operator. Everything else is built on top of that.

by kajaktum

5/20/2026 at 10:02:39 AM

shameless plug, it's part of the Nerd Encyclopedia: it's also called "nasal demons".

https://nickyreinert.de/2023/2023-05-16-nerd-enzyklop%C3%A4d...

by y42

5/20/2026 at 9:58:03 AM

The art is actually making sure it all stays defined behavior

by justmarc

5/20/2026 at 12:51:52 PM

Very interesting article. I'm in love with C++, and I cannot say that I'm a good developer, but interesting to discover where UB can be. (Sorry I'm not a good english speaker)

by DostLeFan

5/20/2026 at 9:35:37 AM

Isn't the article mostly saying that SPARC sucks?

by alper

5/20/2026 at 7:43:20 PM

Life is undefined behaviour.

by 0x20cowboy

5/21/2026 at 12:38:05 AM

> We need some way of fixing UB at scale, without committing AI slop nor overwhelming human reviewers.

Write a compiler which defines all this behavior. People usually forget that UB exists only in the standard. In practice it is always defined.

P.S. Of course, only while your hardware + firmware stay unchanged.

P.S. Not always defined in documentation - I mean defined in e.g. code.

by feelamee

5/20/2026 at 7:16:02 AM

Hello, it's me. I'm not afraid of UB.

by my-next-account

5/20/2026 at 7:25:32 AM

To be honest, miscompilations because of UB are exceedingly rare, and we do a lot of weird shit in our code.

by my-next-account

5/20/2026 at 11:45:25 AM

You should be!

by saagarjha

5/20/2026 at 3:59:14 PM

> probably meaning on an address that’s a multiple of sizeof(int), but who knows

Sigh. s/sizeof(int)/_Alignof(int)/.

There are good reasons for an implementation to have sizeof(int) = _Alignof(int) and not a mere multiple of it, but if you are going to discuss subtle points and UB, just stick to the language guarantees.

> But let’s say you have a modern machine, where NULL is a pointer to address zero, and you actually have an object there.

You don't program in C on such a machine. Or maybe memory is virtualized, and it does not matter that your object lives at physical address zero, as long as you can map a non-zero virtual address to it.

> So how do you print an uid_t?

    if ((uid_t)-1 < (uid_t)0) {
        // uid_t is signed
        printf("%" PRIdMAX, (intmax_t)id);
    } else {
        // uid_t is unsigned
        printf("%" PRIuMAX, (uintmax_t)id);
    }

> It’s not rare for the denominator to come from untrusted input.

It's not rare for the array index to come from untrusted input.

It's not rare for the supposedly valid UTF-8 string to come from untrusted input.

...

Why single out division? This problem affects every partially defined operation. In the case of division at least, everyone learned in school that thou shalt not divide by zero. Adding two untrusted integers and forgetting that signed overflow is UB, not defined as a modulo? Your average programmer is much less likely to see that coming.
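For the signed addition case, a minimal sketch of a check that never overflows, since the comparison is done before the addition (`checked_add` is a hypothetical helper name, not a standard function):

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false if the sum would
   not fit in an int. The range checks run before the addition, so the
   addition itself can never overflow. */
bool checked_add(int a, int b, int *out) {
    if (b > 0 && a > INT_MAX - b) return false;  /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false;  /* would go below INT_MIN */
    *out = a + b;
    return true;
}
```

Where available, `ckd_add` from C23's `<stdckdint.h>` or the GCC/Clang `__builtin_add_overflow` builtin do the same job with less room for error.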

    > unsigned char a = 0xff;
    > unsigned char b = 1;
    > unsigned char zero = 0;
    > bool overflowed = (a + b) == zero;
    >
    > unsigned char a = 0x80;
    > uint64_t b = a << 24;

Please. Convert your operands to wide enough types before the operation. Convert your results back to narrow enough types to compensate for integer promotion to wider types than you would have liked. Do that consistently, and you're good.

Here:

    unsigned char a = 0xff;
    unsigned char b = 1;
    unsigned char zero = 0;
    bool overflowed = (unsigned char)(a + b) == zero;

    unsigned char a = 0x80;
    uint64_t b = (uint32_t)a << 24;

by el_pollo_diablo

5/20/2026 at 6:41:32 AM

I stopped reading about here:

    > bool parse_packet(const uint8_t* bytes) {
    >   const int* magic_intp = (const int*)bytes;   // UB!

Author, if you are reading this, please cite the spec section explaining that this is UB. Dereferencing the produced pointer may be UB, but casting itself is not, since uint8_t is ~ char and char* can be cast to and from any type.

You might try to argue that uint8_t is not necessarily char. It is true that implementations of C can exist where CHAR_BIT > 8, but those do not have uint8_t defined (as per spec), so if you have uint8_t, then it is "unsigned char", which makes this cast perfectly safe and defined as far as I can tell. Of course CHAR_BIT is required to be >= 8, so if it is not > 8, it is exactly 8. (In any case, whether uint8_t is literally a typedef of unsigned char is implementation-defined and not actually relevant to whether the cast itself is valid -- it is)

by dmitrygr

5/20/2026 at 7:00:23 AM

The issue is not type punning (itself a very common source of UB), but the fact that the `bytes` pointer might not be int-aligned. The spec is clear that the creation (not just the dereferencing) of an unaligned pointer is UB, see 6.3.2.3 paragraph 7 of the C11 (draft) spec.

Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.
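For what it's worth, the usual way to sidestep the alignment question entirely is to copy the bytes out instead of converting the pointer. A minimal sketch, where `PACKET_MAGIC` is a made-up value and `uint32_t` stands in for the article's `int`:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value, chosen only for this example. */
#define PACKET_MAGIC 0x12345678u

bool parse_packet(const uint8_t *bytes) {
    uint32_t magic;
    /* memcpy places no alignment requirement on its arguments, so no
       misaligned uint32_t pointer is ever created. */
    memcpy(&magic, bytes, sizeof magic);
    return magic == PACKET_MAGIC;   /* compared in host byte order */
}
```

This assumes the caller guarantees at least four readable bytes; a real parser would also take a length and pick an explicit byte order.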

by raphlinus

5/20/2026 at 8:05:42 AM

> Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.

A "world-class expert in low level programming" knows that unaligned memory accesses are no problem anymore on most modern CPUs, and that this particular UB in the C standard is bogus and needs to be fixed ;)

by flohofwoe

5/20/2026 at 8:12:23 AM

… it’s only UB if the pointer is actually misaligned. It’s not possible to tell from these two lines whether that’s the case.

by formerly_proven

5/20/2026 at 7:09:59 AM

C of course is ancient. It remembers the Cambrian explosion of CPU architectures, twelve-bit bytes and everything like that. I wonder if it is possible to codify some pragmatic subset of it that works nicely on currently available CPUs. Cause the author of the piece goes back in time to prove his point (SPARCs and Alphas).

by gritzko

5/20/2026 at 7:12:08 AM

Fun story: even the latest C spec doesn’t require CHAR_BIT == 8, but it does now codify 2s complement int representation. (IIRC)

by dmitrygr

5/20/2026 at 7:45:01 AM

For unsigned ints, or also for signed ints?

by eru

5/20/2026 at 9:53:52 AM

Two's complement is a representation specifically for signed integers.

by account42

5/20/2026 at 7:48:07 AM

For signed. Unsigned overflow was defined for a while now.

by dmitrygr

5/20/2026 at 10:12:53 AM

And unsigned negation is two's complement negation as well (-u = 0-u).
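A small sketch of that identity (`negation_matches` is just an illustrative name):

```c
#include <limits.h>
#include <stdbool.h>

/* For unsigned int, -u is computed modulo UINT_MAX + 1, which gives the
   same value as 0 - u and as ~u + 1. No UB is involved for any input. */
bool negation_matches(unsigned int u) {
    return -u == 0u - u && -u == ~u + 1u;
}
```

This holds for every value of u, including 0 and UINT_MAX.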

by gblargg

5/20/2026 at 7:03:52 AM

That cast is valid. Spec does not guarantee same bit sequence for resulting pointer and source pointer. But as the cast is explicitly allowed, it is not UB. Compiler is free to round the pointer down. Or up. Or even sideways. All ok. Dereferencing it — indeed not ok. But the cast is explicitly allowed and not UB.

Pointer casts changing pointer bit sequences are common on weird platforms (eg: some TI DSPs, PIC, and aarch64+PAC). And it is valid as per spec. Pointer assignment is not required to be the same as memcpy-ing the pointer into a pointer to another type.

You misunderstood the spec. No promises are made that that cast copies the pointer bit for bit (and thus creates an invalid pointer). Therefore, your objection to invalid pointers is null and void. :)

by dmitrygr

5/20/2026 at 7:17:43 AM

I'm not assuming anything about bit representations. In this case, the spec language is quite clear and unambiguous.

6.3.2.3 paragraph 7: A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned[footnote 68]) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

This is a subsection of section 6.3, which describes conversions, both implicit ones and those from a cast operation. This language is not saying anything about bit representations or dereferencing.

I happen to be wearing my undefined behavior shirt at the moment, which lends me an extra layer of authority. I'm at RustWeek in Utrecht, and it's one of my favorite shirts to wear at Rust conferences. But let's say for the sake of argument that you are right and I am indeed misunderstanding the spec. Then the logical conclusion is that it's very difficult for even experienced programmers to agree on basic interpretations of what is and what isn't UB in C.

by raphlinus

5/20/2026 at 7:44:39 AM

I do not see there a promise that the cast will produce an invalid pointer, nor anything prohibiting the compiler from rounding the pointer down, thus producing a valid one. “Converted” does not require bit copy. I don’t see how this interpretation is against any section of the spec.

by dmitrygr

5/20/2026 at 8:47:24 AM

I also do not see any requirement in the quoted text that the casted pointer be dereferenced before noting "the behavior is undefined".

In practice performing a cast doesn't really do much until you dereference, but without a carve out in the spec, it does really mean "the behavior is undefined".

by dwattttt

5/20/2026 at 8:20:38 AM

> Otherwise, when converted back again, the result shall compare equal to the original pointer.

Doesn't this part exclude the possibility of rounding down?

by cyclopeanutopia

5/20/2026 at 1:41:31 PM

No, because that requires initial alignment.

by dmitrygr

5/20/2026 at 9:11:54 AM

> rounding the pointer down, thus producing a valid one

A "valid" pointer to the wrong object?

by pjc50

5/20/2026 at 1:41:45 PM

Which is ok since it is UB to deref

by dmitrygr

5/20/2026 at 9:52:46 AM

Author here.

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned71) for the referenced type, the behavior is undefined.

C23 6.3.2.3p7.

by thomashabets2

5/20/2026 at 7:22:18 AM

Byte and int have different alignment requirements. It is UB the moment you make such a ptr.

Great way to demonstrate the point of the article.

by stevenhuang

5/20/2026 at 7:40:35 AM

That better be marked "historical". At least, Lemire says:

On recent Intel and 64-bit ARM processors, data alignment does not make processing a lot faster. It is a micro-optimization. Data alignment for speed is a myth. // https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...

(while in the olden days, a program may crash on unaligned access, esp on RISC)

by gritzko

5/20/2026 at 7:45:56 AM

Don't mix up what processors do with what the C standard allows you to get away with.

by eru

5/20/2026 at 8:09:39 AM

...and don't mix up the C standard with what actually existing compilers allow you to get away with ;) In the end the standard is merely a set of guidelines. What matters is how compiler toolchains behave in the real world, and breaking code which does unaligned memory accesses by 'UB exploitation' would be quite insane.

by flohofwoe

5/21/2026 at 3:55:55 AM

Sanity ain't a hard requirement for C compilers.

by eru

5/20/2026 at 7:50:07 AM

Without memcpy there is no guarantee that that line produces an invalid pointer

I don’t see what spec part would prohibit that cast from validly compiling to

   BIC r3, r0, #3

Spec only guaranteed round-trip through char* of properly aligned for type pointers. This doesn’t break that.

by dmitrygr

5/20/2026 at 11:32:07 PM

And that's a good thing. UB is another mechanism to speed up the development of compilers, many other languages fall trap to over defining while we lack the methods to solve such problems cleanly (believe me, the modern c++ people have tried). Usually this is the case because they believe strongly that their methods work despite evidence.

As for UB, the compiler has the final say. Nobody should write nontrivial c without understanding their compiler, the same as nobody should write c without understanding their text editor.

Code in other languages breaks between versions, in c there are projects with code from every version at once!

Looking at it another way, work put into a c compiler enables you to write nontrivial code.

by casey2

5/20/2026 at 5:51:25 PM

Probably not "everything" the vast vast vast majority of everything you are looking at on your screen right now is written in C.

by saltyoldman

5/20/2026 at 1:41:47 PM

U just need to read the title and 5 lines to know this must be a rust guy.

by up2isomorphism

5/20/2026 at 12:31:09 PM

How can it be valid implementation of isxdigit?

```
int isxdigit(int c) {
    if (c == EOF) {
        return false;
    }
    return some_array[c];
}
```

If you write code like this, then everything in programming is UB.
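The reason a libc may get away with that is that the standard puts the burden on the caller: the argument to the `<ctype.h>` functions must be representable as `unsigned char` or be equal to `EOF`, and anything else is UB. A minimal sketch of the caller-side fix (`is_hex_char` is a hypothetical wrapper name):

```c
#include <ctype.h>
#include <stdbool.h>

/* Plain char may be signed, so a byte like 0xE9 can arrive as a negative
   value. Converting to unsigned char first keeps the argument inside the
   range the ctype functions are specified for. */
bool is_hex_char(char ch) {
    return isxdigit((unsigned char)ch) != 0;
}
```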

by stackedinserter

5/20/2026 at 7:14:47 AM

UB can also have impact in logical cohesion of codebase.

by fithisux

5/20/2026 at 11:10:58 AM

if c is more ub unsafe than it seems,what is the solution here

by synergy20

5/20/2026 at 7:10:21 AM

We know. This is not news.

by cracki

5/20/2026 at 7:26:06 AM

It seems to be to many many programmers who keep using C++

by boxed

5/20/2026 at 8:47:43 AM

I used to teach C programming and one time I got anonymous feedback: "when this instructor doesn't know the answer he says "it's compiler dependent.""

Shrug.

by SanjayMehta

5/20/2026 at 7:44:49 AM

Wait until he discovers PowerShell ;D

by VimEscapeArtist

5/20/2026 at 11:26:20 AM

feels like https://xkcd.com/1499/

the only people complaining about being able to do awful things are people that do awful things

by NooneAtAll3

5/20/2026 at 11:28:55 AM

- a metal bar always sinks

- unless you are trying to sink it in mercury. then it floats

- unless it is an uranium bar

- go sink uranium bars in mercury yourself

by gritzko

5/20/2026 at 6:51:47 AM

Yet another push to use LLMs after casting fear. Now it should be illegal not to use LLMs. A good start of the day.

(I hope casting fear is not UB)

by jraph

5/20/2026 at 6:53:16 AM

The irony is unmistakable.

by wg0

5/20/2026 at 7:27:48 AM

There is nothing ironic in letting an llm have a pass at identifying potential UB and other correctness issues in C code.

I say this as an experienced C developer.

by stevenhuang

5/20/2026 at 7:59:00 AM

It is ironic because the behaviour of an LLM itself is UB. Guaranteed.

by wg0

5/20/2026 at 6:58:53 AM

> (I hope casting fear is not UB)

I'm sure that's UB in C

In C++ just use <reinterpret_cast>

by raverbashing

5/20/2026 at 5:23:00 PM

"not correctly aligned (probably meaning on an address that’s a multiple of sizeof(int), but who knows)"

I stopped reading there. If you have decades of experience in C/C++ and don't know what that means (and that it's arch specific), I'll assume those decades were mostly the same year over and over.

C/C++ are horrible languages, but they deserve better opponents than that.

by groby_b

5/20/2026 at 3:01:08 PM

It's also worth highlighting that C is perhaps the most officially standardized programming language in history.

What a contradiction. Strong evidence that standard-driven programming language development is much worse than implementation-driven development. Standards should be used for data types and external interfaces/protocols, not programming languages.

by pphysch

5/20/2026 at 6:44:18 AM

Anyone who uses the construction "C/C++" doesn't write modern C++, and probably isn't very familiar with the recent revisions despite TFA's claims of writing it every day for decades.

Far from being just "C with classes", modern C++ is very different than C. The language is huge and complex, for sure, but nobody is forced to use all of it.

No HN comment can possibly cover all the use cases of C++ but in general, unless you have a very good reason not to:

- eschewing boomer loops in favor of ranges

- using RAII with smart pointers

- move semantics

- using STL containers instead of raw arrays

- borrowing using spans and string views

These things go a long way towards, shall we say, "safe-ish" code without UB. It is not memory-safe enforced at the language level, like Rust, but the upshot is you never need to deal with the Rust community :^)

by stackghost

5/20/2026 at 6:51:31 AM

Although some people, like Bjarne Stroustrup, object to the term C/C++, it's a bit like Richard Stallman objecting to the term "Linux". The fact is it can mean "C or C++", and I wouldn't assume the author thinks they're the same, but they're talking about both of them together in the same sentence. This seems reasonable given this is about undefined behavior, and it's trivial to accidentally write UB-inducing code in C++ even with modern style (although I'd say you should catch most trivial cases with e.g. ubsan, and a lot of bad cases would be avoided with e.g. ranges, so I think the article is exaggerating the issue).

by veltas

5/20/2026 at 6:56:43 AM

Well, the author explicitly refers to "C/C++" as one language:

>After all, C/C++ is not a memory safe language.

by stackghost

5/20/2026 at 9:43:51 AM

That is a typo, that I think I introduced when I went back to clarify that it applies to C++ too.

Will fix it.

by thomashabets2

5/20/2026 at 9:42:18 AM

Author here.

In the context of UB discussion, the arguments apply equally to C and C++.

How would you write that?

I entirely agree with all your points that C and C++ are completely different languages at this point. And yet I wanted to write this post about something that is true for both.

by thomashabets2

5/21/2026 at 4:38:03 AM

You can write C++ in a way that's similar to C if you want and run into some of the same UB. Normally I don't like the "C/C++" thing, but in this context it makes sense.

by jim33442

5/20/2026 at 6:48:21 AM

> the upshot is you never need to deal with the Rust community

In the end, everything comes down to culture war.

by rectang

5/20/2026 at 6:49:52 AM

Perhaps we should rewrite our culture in Rust.

by stackghost

5/20/2026 at 7:05:36 AM

I totally agree that modern c++ is pretty robust if you are both a well seasoned developer and only stick to a very blessed subset of its features and avoid the historical baggage.

However, that's obviously not the point? Ignoring the idea that people can/should just "git gud" and write perfect code in a language with lots of old traps, you can't control how everyone else writes their code, even on your own team once it gets big enough. And there will always be junior devs stumbling into the bear traps of c/c++ (even if the rest of the codebase is all modern c++). So no matter how many great new features get added to C++, until (never) they start taking away the bad ones, the danger inherent to writing in that language doesn't go away.

Also, safe != non-UB. TFA isn't so much about memory safety anyway.

by SpaceNugget

5/20/2026 at 7:21:12 AM

C/C++ is a perfectly fine term for C or C-style C++. The languages can be very close, and personally I prefer C-style C++ miles over some of the half-baked modern nonsense. I mean, I do use C++23 since it has some great additions, but I'm ditching like 90% of the stuff that only adds complexity without much benefit.

by m-schuetz

5/20/2026 at 6:53:16 AM

"C/C++" is still a useful term for the common C/C++ subset :)

As far as stdlib usage is concerned: that's just your opinion. The stdlib has a lot of footguns and terrible design decisions too, e.g. std::vector pulling in 20k lines of code into each compilation unit is simply bizarre.

Also:

- eschewing boomer loops in favor of ranges

Those "boomer loops" compile infinitely faster than the new ranges stuff (and they are arguably more readable too): https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/

- borrowing using spans and string views

Those are just as unsafe as raw pointers. It's not really "borrowing" when the referenced data can disappear while the "borrow" is active.

by flohofwoe

5/20/2026 at 3:03:29 PM

a good case can be made that use of C++ is a SOX violation

So Linus was right? But for a second reason too:

C++ is a horrible language. It’s made more horrible by the fact that a lot of substandard programmers use it, to the point where it’s much, much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do _nothing_ but keep the C++ programmers out, that in itself would be a huge reason to use C.

That is, accepting C++ code from programmers who use C++ could be a SOX violation ;-)

by EGreg

5/20/2026 at 4:33:06 PM

[flagged]

by creatorsstack

5/20/2026 at 8:35:07 AM

[dead]

by ivandotcodes

5/20/2026 at 6:56:53 AM

[dead]

by jdw64

5/20/2026 at 8:13:41 PM

[flagged]

by JayJSpringpeace

5/20/2026 at 7:28:05 PM

[dead]

by jim33442

5/20/2026 at 10:57:59 AM

[flagged]

by tenego

5/21/2026 at 11:43:01 AM

[dead]

by UltraViolence

5/20/2026 at 7:39:57 AM

[dead]

by rahadbhuiya

5/20/2026 at 7:39:59 AM

[dead]

by black_13

5/20/2026 at 7:13:46 AM

[dead]

by nurettin

5/20/2026 at 6:50:51 AM

[flagged]

by llggbbtt

5/20/2026 at 6:42:58 AM

Ok, and?

by nokeya

5/20/2026 at 6:52:45 AM

"Rewrite everything in Rust. OMG universe is written in Rust so memory safe with zero allocations"

by wg0

5/20/2026 at 9:52:01 AM

The issue for me with posts like this is that it misses the issue.

Unaligned pointer accesses are UB because different systems handle it differently. This 'should' be to allow the program to be portable by doing what the system normally does.

Instead it's been highjacked by compiler writers, with the logic that "X is UB, therefore can't happen, therefore can be optimised away."

    int c = abs(a) + abs(b);
    if (a > c) // overflow

Is UB because some system might do overflow differently. In practice every system wraps around.

That should be a valid check, instead it gets optimised away because it 'can't' happen.
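A sketch of a version that the optimiser cannot remove, because it tests the operands before adding instead of testing the result afterwards (`abs_sum` is a made-up helper name):

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Computes abs(a) + abs(b) into *out. Returns false if the result does
   not fit in an int. */
bool abs_sum(int a, int b, int *out) {
    /* abs(INT_MIN) is itself undefined, so reject it first. */
    if (a == INT_MIN || b == INT_MIN) return false;
    int x = abs(a);
    int y = abs(b);
    if (x > INT_MAX - y) return false;  /* the addition would overflow */
    *out = x + y;
    return true;
}
```

If you really want the wrap-around semantics, GCC and Clang accept `-fwrapv`, which makes signed overflow wrap, at the cost of relying on a compiler flag rather than the language.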

C gives you enough rope to hang yourself. The compiler writers don't trust you to use the rope properly.

by benj111

5/20/2026 at 1:04:03 PM

maybe rewrite this in go?)

by Webhix

5/20/2026 at 7:10:15 AM

The concept of undefined behaviour is also a very useful lens for understanding LLM-based coding. Anything you don't explicitly specify is undefined behavior, so if you don't want the LLM to potentially pick a ridiculous implementation for some aspect of an application, make sure to explicitly specify how it should be implemented.

by logicchains

5/20/2026 at 8:42:57 AM

Rust.

by reinhash

5/20/2026 at 7:51:15 AM

most languages don't even HAVE a specification so in most languages literally EVERYTHING is undefined behavior

by mbrock

5/20/2026 at 8:03:55 AM

UB doesn't mean that it is not specified (actually it is often very well specified), it means that compilers can and do assume that such code patterns will not be present. Those cases may not be considered and can lead to unexpected behaviour.

Additionally, some (most?) UB is intentionally UB so that optimisers are free to do fancy tricks assuming that certain cases will never happen. Indeed, this is required for high performance. If they do happen, again, it can lead to unexpected behaviour.

PS: Most languages that don't have a specification declare their primary implementation to be specification-as-code. Rust is an example of that, and it does still have UB: the cases that the compiler assumes will not happen.

by oersted

5/20/2026 at 9:15:13 AM

undefined behavior is the behavior of code patterns "for which this International Standard imposes no requirements" and the behavior is in fact almost always predictable and agreed upon by compiler vendors and the users of the language, which is why you are able to use programs that rely on undefined behavior probably every single second you are using the computer

edit: for example I'm typing this into Safari which means probably every key press and event is going through JSC JIT compiled functions—which have, structurally and necessarily and intentionally, COMPLETELY undefined behavior according to the spec—and yet it miraculously works, perfectly, because the spec doesn't really matter

by mbrock

5/20/2026 at 11:44:45 AM

It matters when your JSC JIT is full of security holes

by saagarjha

5/20/2026 at 11:45:37 AM

ok what's the alternative?

by mbrock

5/20/2026 at 11:48:00 AM

Removing the undefined behavior

by saagarjha

5/20/2026 at 12:00:19 PM

you mean removing the JIT?

by mbrock

5/20/2026 at 12:16:22 PM

by saagarjha

5/20/2026 at 12:31:11 PM

okey dokey

by mbrock

5/20/2026 at 7:44:53 AM

Use Rust!

by grougnax

5/20/2026 at 7:10:43 AM

When using C, keep using char* and don't mess with int*.

by liamd1988

5/20/2026 at 7:03:38 AM

Debugging in C is soooo hard. When I was writing Malloc Lab in system course, there were uncountable undefined and out of range :(

by momo26

5/20/2026 at 7:20:37 AM

Yet, debugging memory corruption issues in C and C++ code with modern compiler toolchains and memory debugging tools is infinitely easier than 25 years ago.

(e.g. just compiling with address sanitizer and using static analyzers catch pretty much all of the 'trivial' memory corruption issues).

by flohofwoe

5/21/2026 at 7:43:32 AM

I think the opposite - C is so simple that debugging it is just a pleasant walk.

Especially compared with modern languages with lambdas/exceptions/virtual functions and so on.

The one thing I see can make it harder is function pointers.

by feelamee

5/20/2026 at 11:03:19 AM

Everything in Java is defined behaviour, you need a VM with GC to remain sane.

Everything else is a waste of time!

by bullen

5/20/2026 at 12:09:18 PM

Excellent post. But it's addressed to the wrong people.

The problem lies with compilers, not with the language and its specification, or with the creators of the C programming language.

Anyone can write a compiler that transforms all undefined behaviors (UB) into defined behaviors (DB). And your compiler will be used by people, including me.

by nullpwr

5/20/2026 at 12:39:52 PM

I'd say the unaligned pointer one is the language's fault. The language should not let you create an invalid pointer, or at least warn you when you are doing so.

OTOH one could argue that creating truly portable programs is not possible since a programming language is a leaky abstraction - different machines have different endianness, different alignment requirements, different amounts of memory, etc. One could argue therefore that the language should not make any assumptions about the alignment restrictions, or lack of them, on the machine you are compiling for. Just document that "manually created" pointers may be unaligned and have machine-dependent behavior. A nice compiler could still generate a warning or error if you create a pointer that doesn't meet the alignment requirements of the target you are compiling for.

C/C++'s provision of type casts reflects that the language has made the design decision to not restrict the user, and let them step outside the bounds of any guarantees the language provides if they want to. Unions are also a form of type cast.

by HarHarVeryFunny

5/20/2026 at 12:50:03 PM

> The language should not let you create an invalid pointer, or at least warn you when you are doing so

completely agree!

by nullpwr

5/21/2026 at 12:08:18 AM

That's a nonsensical statement, a language cannot warn you, only a compiler can (-Wcast-align). The compiler can also decide what is and isn't an invalid pointer, this way the language avoids leaky abstractions.

by casey2

5/21/2026 at 1:52:45 PM

Different hardware targets obviously differ in many ways, not just alignment requirements, but things like endianness, integer representation, etc. C23 has finally said that signed integers must be in 2's complement representation, but this is only practically possible since all modern CPUs have also done that. Maybe endianness will be mandated at some point since modern processors have also rallied around little endian ordering.

However, C has since day one chosen to define a leaky abstraction by putting efficiency above all else and having the language/implementation adapt to the hardware rather than vice versa (having the implementation hide hardware differences to implement some language-defined standard). The most obvious case of this is the size of the integer types short, int, and long, where the standard only says that "short <= int <= long" and specifies minimum value ranges that each type must be able to hold. The spec basically says that ints must be at least 16-bit, while on most targets they are in fact going to be 32-bit. Of course you can use int32_t etc for better portability, but nonetheless the language provides this "int" type whose size and endianness are unspecified.
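To make the split between "guaranteed" and "merely common" concrete, here is a small sketch. The static assertions hold on any conforming implementation, while `int_is_32_bits` (a made-up name) only reports what this particular target happens to do:

```c
#include <limits.h>
#include <stdbool.h>

/* Guarantees the standard does give, checked at compile time. */
_Static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
_Static_assert(SHRT_MAX >= 32767, "short holds at least 16-bit values");
_Static_assert(INT_MAX >= 32767, "int holds at least 16-bit values");
_Static_assert(LONG_MAX >= 2147483647L, "long holds at least 32-bit values");
_Static_assert(SHRT_MAX <= INT_MAX && INT_MAX <= LONG_MAX, "ranges are ordered");

/* Not guaranteed, merely common on mainstream desktop and server targets. */
bool int_is_32_bits(void) {
    return sizeof(int) * CHAR_BIT == 32;
}
```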

C's unions could also be considered as providing a leaky abstraction, since it exposes differences between implementations - endianness, alignment/packing.

Then you get to things like size of address space, pointer sizes and representations, even amount of memory that a program may be able to allocate.

The only way a language could totally abstract away the underlying hardware would be to base it on some lowest common denominator of hardware (or virtual machines) that it is willing to support, which would be totally impractical. The alternative is that you just accept the leaky abstraction and say that many things are implementation defined, not language defined.

by HarHarVeryFunny

5/21/2026 at 1:04:15 PM

It's obvious at the language level if you are creating, for example, an int pointer that is not guaranteed to be int-aligned. The only way to guarantee that an int pointer is int-aligned is if it comes from malloc* or new (C++), or if you take the address of an integer variable.

Any time you need to use a type cast to assign an int pointer, other than casting the output of malloc, then you potentially have a unaligned pointer, although only the compiler, knowing the alignment requirements of the target, would know if that is creating an invalid pointer or not.

* The C/C++ language (standard library) spec says that malloc must return memory that is aligned to meet the (target) requirements of all built in types, which is obviously a bit wasteful, as well as not helpful for things like SIMD types that may be supported by libraries rather than built-in.
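A rough sketch of checking that malloc guarantee, assuming a C11 compiler and a flat-memory target where converting a pointer to `uintptr_t` yields its address (that conversion is implementation-defined, and both function names are made up):

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is aligned strictly enough for any built-in type. */
bool is_max_aligned(const void *p) {
    return (uintptr_t)p % alignof(max_align_t) == 0;
}

/* Allocates a block large enough to hold any built-in type and reports
   whether malloc returned it suitably aligned. */
bool malloc_is_max_aligned(void) {
    void *p = malloc(64);
    bool ok = p != NULL && is_max_aligned(p);
    free(p);
    return ok;
}
```

For stricter alignment than that, such as SIMD types, C11 added `aligned_alloc`, which takes the alignment as an explicit argument.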

by HarHarVeryFunny

5/21/2026 at 10:17:56 AM

Thanks for the clarification. Yes, that's right – that's the job of the compiler/parser/etc.

I like the C programming language. To be honest, I regret not having the knowledge (low skills) to write my own compiler, or at least a standard library. I have some rough sketches on paper of what it should look like, but I can't implement it.

by nullpwr

5/20/2026 at 10:23:15 AM

I’ve been heavily invested in https://c3-lang.org/ the past couple months. How does it look from this perspective to someone with C experience?

by ricardobeat