I haven't had the time to read the paper yet, but I believe I'd emerge with more or less the same opinion I've had before: nobody's forcing you to pass -O2 or -O3. It's stupid to ask the compiler to optimize and then whine that it optimizes. I'm usually OK with the compiler optimizing, hence I ask it to do so. I'm glad that others who disagree can selectively enable or disable only the optimizations they're concerned about. Most of the optimizations people whine about seem quite sane to me. Of course, sometimes you find real bugs in the optimizer (yesterday someone on libera/#c posted a snippet where gcc with -O2 (-fdelete-null-pointer-checks) removes completely legitimate checks).

I’m amused by the people who ask for optimization and then complain about it. Or the “it did what I said, not what I meant” crowd.

But, officially, undefined behavior is always undefined, not just at higher optimization levels.

Alternatively: if my compiler has a "run faster" flag, I would not expect it to change the "semantics" of my code in the process.

Additionally, in C, UB is often neither intentional nor trivial to detect, since it may arise from the interaction of two pieces of code that are nowhere near each other in your codebase. There comes a point where faster-but-broken code isn't better; it's just broken.

"Run faster" involves things like register allocation. How can you claim that register allocation doesn't change the semantics of code? It absolutely does! If you smash the stack, variables on the stack will be smashed, and variables in registers will not. That is a change of semantics.

Compilers justify register allocation by assuming the stack can't be smashed, contrary to hardware reality. Because smashing the stack is UB. That's what C's UB is for.

C exposes a (fairly low-level) abstract machine with its own semantics, encapsulating the operational semantics of the CPU. Optimization flags are expected to affect the uarch-operational semantics, but not the semantics of the C abstract machine.

Think of it like trading one cycle-accurate emulator for a better, more tightly-coded cycle-accurate emulator. Or a plain loop-and-switch bytecode interpreter for a threaded-code bytecode interpreter. The way your code is running on the underlying machine changes, but the semantics of your code relative to the abstract machine the code itself interacts with should not change.

> Compilers justify register allocation by assuming stack can't be smashed, contrary to hardware reality.

...which should be entirely fine, as the existence of a stack isn't part of the exposed semantics of the C abstract machine. Values can be "on the stack" — and you can get pointers to them — but nothing in the C abstract machine says that you should be able to smash the stack. There's nothing in the C standard itself describing a physically-laid-out stack in memory. (There are ABI calling conventions defined in the standard, but these are annotations for target-uarch codegen, not facts about the C abstract machine.) There could just as well be a uarch with only the ability to allocate things "on the stack" by putting them in a heap — in fact, this is basically how you'd have to do things if you wrote a C compiler to target the JVM as a uarch — and a C compiler written for such a uarch would still be perfectly compliant with the C standard.

If C were an interpreted language, the fact that the stack can be smashed would be referred to as a "flaw in the implementation of the runtime" — the runtime allowing the semantics of the underlying uarch to leak through to the abstract-machine abstraction — rather than "a flaw in the interpreter" per se; and you'd then expect a better runtime implementation to fix the problem, by e.g. making all pointers to stack values actually be hardware capabilities, or making each stack frame into its own memory segment, or something crazy like that.

As a compiled language — and especially one that has encapsulation-breaking holes in its abstract machine, like the C inline-assembly syntax — you can't exactly expect a runtime shim to fix up the underlying uarch to conform to the C abstract machine. But you could at least expect the C compiler to not let you do anything that invokes those cases where it's exposing potentially-divergent uarch semantics rather than a normalized C-abstract-machine semantics. As if, where there should be a shim "fixing up" a badly-behaved uarch, you'd instead hit a NotImplementedException in the compiler, resulting in a compilation abort.

If you wanted to prevent undefined behavior at compile time by having the compiler throw an exception, you'd actually need to prevent potentially undefined behavior, for the simple reason that at compile time you don't know what your runtime inputs will be. You could come up with a subset of C, but it would be a restrictive and probably useless subset. C strings go out the window, for instance, because you don't know at compile time whether a given char * is indeed a NUL-terminated string. You also wouldn't be able to store that pointer anywhere, because it could be freed as soon as you return from the function - and that's assuming we disallow threads and signals in our C subset, or else you just couldn't use the pointer at all. (This, by the way, argues that it's a flaw in the language definition, not in the implementation of anything.)

There are a huge number of languages where you pass strings along with their length. There are also plenty of languages where you know (either via compile-time analysis and an insistence on writing code in an analyzable way, as in Rust, or via mild runtime overhead, as in Swift or Haskell or Java) that the memory is still valid when you reference it. But those languages are not a subset of C. They either need more information about a program than a C program contains, or they disallow operations that the C abstract machine can perform.

You're right that you can't define a safe subset of C without making it impractical. MISRA C defines a C subset intended to help avoid C's footguns, but it still isn't actually a safe language. There are alternative approaches, though:

1. Compile a safe language to C (whether a new language or an existing one)

2. Formal analysis of C, or of some practical subset of C, to prove the absence of undefined behaviour

Work has been done on both approaches.

ZZ compiles to C. [0] Dafny can compile to C++, but it seems that's not its primary target. [1][2]

There are several projects on formal analysis of C. [3][4][5][6]

[0] https://github.com/zetzit/zz

[1] https://github.com/dafny-lang/dafny

[2] https://dafny-lang.github.io/dafny/

[3] https://frama-c.com/

[4] https://www.microsoft.com/en-us/research/project/vcc-a-verif...

[5] https://www.eschertech.com/products/ecv.php

[6] https://trust-in-soft.com/