Have there been attempts to measure how far modern compilers' output for large programs falls short of "perfect" machine code? The few times I've poked through demo scene binaries, I've felt a sense of awe at the complexity described by just a few instructions, but demos aren't starting from a specification and trying to find a compressed representation; they're trying to create small binaries that expand into something complex and interesting. Clearly this problem is NP-hard as hell (and most software doesn't need to be optimized instruction by instruction), but it would be incredibly neat to have an estimate of _how much worse_ our compilers might be.

I'm reminded of the old story about someone who tried to evolve an FPGA configuration to perform signal processing, and the final generation exploited imperfections in the chip to use fewer gates.

"Optimal" compilation is actually harder than NP-complete. Because program equivalence is undecidable, finding a canonical "optimal" representation for a program can't be done in general.

What about superoptimization? For instance, Stoke (http://stoke.stanford.edu/) or Souper (https://github.com/google/souper). They won't find every optimization, of course, and program equivalence really is undecidable in general, but for short, loop-free instruction sequences they're about as close a shot at optimal compilation as we have, I would say.
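
To give a feel for the brute-force end of that idea, here's a toy superoptimizer sketch. The two-register 8-bit ISA, the op list, and the target function are all invented for illustration; real tools like Stoke use stochastic search plus an SMT-based equivalence check rather than exhaustive enumeration and testing.

    from itertools import product

    MASK = 0xFF  # 8-bit registers

    # Each op is (name, function) mapping (r0, r1) -> (r0, r1).
    OPS = [
        ("mov  r1, r0", lambda r0, r1: (r0, r0)),
        ("mov  r0, r1", lambda r0, r1: (r1, r1)),
        ("dec  r0",     lambda r0, r1: ((r0 - 1) & MASK, r1)),
        ("dec  r1",     lambda r0, r1: (r0, (r1 - 1) & MASK)),
        ("inc  r0",     lambda r0, r1: ((r0 + 1) & MASK, r1)),
        ("not  r0",     lambda r0, r1: (r0 ^ MASK, r1)),
        ("and  r0, r1", lambda r0, r1: (r0 & r1, r1)),
        ("or   r0, r1", lambda r0, r1: (r0 | r1, r1)),
        ("xor  r0, r1", lambda r0, r1: (r0 ^ r1, r1)),
        ("sub  r0, r1", lambda r0, r1: ((r0 - r1) & MASK, r1)),
    ]

    def run(program, x):
        """Execute a sequence of ops with r0 = x, r1 = 0; the result is r0."""
        r0, r1 = x, 0
        for _, fn in program:
            r0, r1 = fn(r0, r1)
        return r0

    def superoptimize(target, max_len=3):
        """Return the shortest op sequence matching `target` on all 256 inputs.

        Exhaustive testing doubles as the equivalence check here, which only
        works because the code is loop-free and the input space is finite.
        """
        for length in range(1, max_len + 1):
            for program in product(OPS, repeat=length):
                if all(run(program, x) == target(x) for x in range(256)):
                    return program
        return None

    if __name__ == "__main__":
        # Target: clear the lowest set bit, i.e. x & (x - 1).
        best = superoptimize(lambda x: x & ((x - 1) & MASK))
        for name, _ in best:
            print(name)

On this target it recovers a three-instruction mov/dec/and sequence. The reason exhaustive testing is even an option is that loop-free code over 8-bit inputs has finitely many behaviours to check, which is exactly the regime where the undecidability argument above doesn't bite.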