There are, of course, a couple of notes on combating this attack. First, most compilers of today don’t actually produce exactly the same code if you run them twice. In broad strokes they do, but randomness often creeps in, such as different build hashes or the iteration order of associative containers. To get truly bit-for-bit identical output you may need to do some extra work, or perhaps run yet another step in a controlled environment to protect against this.

Second, and more importantly, many people carry a copy of a trusted compiler around, though it’s rarely mentioned in attacks like these: their head. In a pinch people can do spot checks to verify codegen to see whether it looks correct, unless the backdoor is incredibly subtle. But experience shows us that the more complex and hidden a backdoor is, the more likely it is to break when subjected to unfamiliar examination.

most compilers of today don’t actually produce exactly the same code if you run them twice.

Timestamps are one of the biggest offenders, but this is also why reproducible builds are important. Nondeterministic codegen is just scary.
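To make the "bit-for-bit identical" check concrete, here is a minimal sketch in Python. It assumes a build can be wrapped in a callable that returns the output bytes; `stable_build` and `stamped_build` are hypothetical toy stand-ins, not real compilers, and the random bytes merely imitate a per-run value such as a timestamp or build ID.

```python
import hashlib
import os


def digest(data: bytes) -> str:
    # SHA-256 of the build output; any strong hash would do for comparison.
    return hashlib.sha256(data).hexdigest()


def is_reproducible(build, runs: int = 2) -> bool:
    """Run the (hypothetical) build callable several times and compare digests."""
    digests = {digest(build()) for _ in range(runs)}
    return len(digests) == 1


def stable_build() -> bytes:
    # Toy stand-in: output depends only on the (fixed) input.
    return b"\x7fELF" + b"code"


def stamped_build() -> bytes:
    # Toy stand-in: embeds a per-run value, like a timestamp or build hash.
    return b"\x7fELF" + os.urandom(8) + b"code"


if __name__ == "__main__":
    print("stable build reproducible: ", is_reproducible(stable_build))
    print("stamped build reproducible:", is_reproducible(stamped_build))
```

With a real toolchain the callable would invoke the compiler in a controlled environment and read back the artifact; the comparison step stays the same.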

Second, and more importantly, many people carry a copy of a trusted compiler around, though it’s rarely mentioned in attacks like these: their head.

This is also why I'm against inefficient bloated software in general: the bigger a binary is, the easier it is to hide something in it.

Along the same lines, a third idea I have for defending against such attacks is better decompilers --- ideally, repeatedly decompiling and recompiling should converge to a fixed point, whereas a backdoor in a compiler should cause it to decompile into source that's noticeably divergent.
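A rough sketch of the fixed-point idea, in Python. Everything here is a toy assumption: the "compilers" just normalize whitespace, the "decompiler" just decodes bytes, and `accept_magic_password()` is a made-up payload. The point is only the shape of the loop: an honest pair settles after a round or two, while a compiler that injects something on every pass keeps diverging.

```python
def converges(source, compile_fn, decompile_fn, max_rounds=5):
    """Return the round at which decompile(compile(src)) == src, or None."""
    for round_no in range(1, max_rounds + 1):
        recovered = decompile_fn(compile_fn(source))
        if recovered == source:
            return round_no
        source = recovered
    return None


def honest_compile(src: str) -> bytes:
    # Toy "compiler": normalizes whitespace and nothing else.
    return " ".join(src.split()).encode()


def backdoored_compile(src: str) -> bytes:
    # Toy backdoor: appends a payload whenever it sees a login routine.
    out = honest_compile(src)
    if b"login" in out:
        out += b" accept_magic_password()"
    return out


def decompile(binary: bytes) -> str:
    # Toy "decompiler": the binary is just encoded text.
    return binary.decode()


if __name__ == "__main__":
    src = "int  login( ) { check(); }"
    print("honest:    ", converges(src, honest_compile, decompile))
    print("backdoored:", converges(src, backdoored_compile, decompile))
```

A real decompiler is nowhere near this lossless, so in practice "converges" would have to mean something weaker than exact string equality.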

Mike Pall somewhat famously wrote a Lua interpreter in assembler. Assuming that you can write a compiler for your language in Lua, you don't really have to trust trust, but you do have to trust Mike Pall. I'm not aware of any other raw assembly implementations of modern programming languages, but I suspect there are other examples. The overall scheme (^,~) could probably be replicated by a sufficiently dedicated team if there was interest.

Not exactly easy, but probably easier than a decompiler that produces human-equivalent source code.

The C4 compiler [https://github.com/rswier/c4] is a self-hosting compiler for a subset of the C programming language that produces executable x86 code. You can understand and audit this code in a couple of hours (it's 528 lines).

It could be an interesting exercise to bootstrap up from something like this to a working Linux environment based solely on source code compilation: no binary inputs. Of course a full Linux environment has way too much source code for one person or team to audit, but at least it rules out RoTT-style binary compiler contamination.